Compare commits

...

2067 Commits

Author SHA1 Message Date
5e6eb7e19d Merge pull request #4788 from gyuho/release_doc
Documentation: release with project key
2016-03-18 09:49:54 -07:00
3c1dbdef64 Merge pull request #4801 from gyuho/version_2_3
version: bump to 2.3
2016-03-18 09:46:07 -07:00
7d5cedbf34 version: bump to 2.3 2016-03-18 09:44:54 -07:00
a57712f105 Merge pull request #4800 from gyuho/version
Documentation: update ROADMAP with version 2.3
2016-03-18 09:43:39 -07:00
165c164780 Documentation: update ROADMAP with version 2.3 2016-03-18 09:41:16 -07:00
7b77984c99 Merge pull request #4782 from gyuho/upgrade_to_2.3
Documentation: upgrade to 2.3
2016-03-18 09:38:54 -07:00
f59a866fcb Merge pull request #4795 from xiang90/table
etcdctlv3: make member list beautiful
2016-03-17 16:10:35 -07:00
56d7899c20 etcdctlv3: member list now supports json 2016-03-17 15:54:44 -07:00
c37cd1f0a1 etcdctlv3: make member list beautiful 2016-03-17 14:49:53 -07:00
58c8521920 Merge pull request #4792 from heyitsanthony/snip-snip
clientv3: break etcdserver dependency
2016-03-17 12:24:54 -07:00
a001651bc1 clientv3: remove dependency on lease package 2016-03-17 11:52:34 -07:00
44753594ec v3rpc: move errors to v3rpc/rpctypes
Fixes #4771
2016-03-17 11:52:34 -07:00
ef135d52ad Merge pull request #4793 from gyuho/proxy_doc
Documentation: discovery-fallback flag to proxy.md
2016-03-17 11:37:25 -07:00
ded41fa16e Documentation: discovery-fallback flag to proxy.md 2016-03-17 11:33:49 -07:00
f491110c5b test: check clientv3 has no dependency on etcdserver or storage packages 2016-03-17 11:31:14 -07:00
3e6678a1f6 Documentation: release doc using etcd project key 2016-03-17 11:22:09 -07:00
179dc72fa7 Merge pull request #4791 from xiang90/l
integration: add TestV3LeaseFailover test
2016-03-17 10:47:32 -07:00
e6c39108a7 integration: add TestV3LeaseFailover test 2016-03-17 10:17:51 -07:00
cc9aabda05 Merge pull request #4783 from xiang90/fix_4753
raft: add optimization notes
2016-03-17 09:58:31 -07:00
f5e60c0e18 raft: add optimization notes 2016-03-17 09:53:50 -07:00
5d256b7b86 rafthttp: pause peer should also pause its reader 2016-03-17 09:43:02 -07:00
0207f3986e Merge pull request #4787 from coreos/xiang90-patch-1
doc: remove exp note for auth api
2016-03-16 19:04:44 -07:00
c73e02354d doc: remove exp note for auth api 2016-03-16 18:51:14 -07:00
e97bb3f3a1 Documentation: upgrade to 2.3 2016-03-16 12:34:33 -07:00
6b65c0f786 Merge pull request #4760 from heyitsanthony/clientv3-stm
clientv3/concurrency: STM
2016-03-16 11:44:12 -07:00
bc37a32062 clientv3/concurrency: software transactional memory
Repeatable read and serialized read STM implementations.
2016-03-16 11:23:06 -07:00
724bab110d Merge pull request #4781 from gyuho/csv
bench/cmd: print csv-format timeseries
2016-03-16 11:14:39 -07:00
58792ca59b bench/cmd: print csv-format timeseries 2016-03-16 11:06:36 -07:00
a2569c77de Merge pull request #4779 from xiang90/cq
etcdserver: leader should stepdown when lose quorum for v3
2016-03-16 09:09:05 -07:00
4610faf15e Merge pull request #4777 from mqliang/member-leaderinfo
show leader/member information when run 'etcdctl member list' command
2016-03-15 23:26:46 -07:00
2a28ac7ad4 etcdserver: leader should stepdown when lose quorum for v3 2016-03-15 23:23:26 -07:00
a7f6dc6c0d Merge pull request #4778 from xiang90/lease_promote
*: refresh the lease TTL correctly when a leader is elected.
2016-03-15 22:52:33 -07:00
d8c315ab55 etcdctl: show leader/member information when run 'etcdctl member list' command
leader/follower information is very helpful for debugging. User must get this information through http endpoint
before this patch, which is inconvenient.
2016-03-16 13:48:37 +08:00
e9a0a103e5 *: refresh the lease TTL correctly when a leader is elected.
The new leader needs to refresh with an extened TTL to gracefully handle
the potential concurrent leader issue. Clients might still send keep alive
to old leader until the old leader itself gives up leadership at most after
an election timeout.
2016-03-15 22:40:03 -07:00
c7a7dfdc9a Merge pull request #4772 from xiang90/bk
storage: support seq put to make db more compact
2016-03-15 16:48:12 -07:00
d883d5acd3 storage: support seq put to make db more compact 2016-03-15 16:36:36 -07:00
b56c2e017c Merge pull request #4770 from hongchaodeng/master
refactor recorder-based mock objects into separate packages
2016-03-15 15:53:36 -07:00
dcaf5ef586 move store recorder to 'mock/mockstore' 2016-03-15 15:41:07 -07:00
bae053f57f Merge pull request #4768 from xiang90/ctl
etcdctlv3: get support fromkey
2016-03-15 12:28:34 -07:00
a24aade667 etcdctlv3: add from-key flag for get command 2016-03-15 12:08:54 -07:00
6a6fe452ee Merge pull request #4766 from gyuho/sample
benchmark: move sample flag to root command
2016-03-15 11:09:00 -07:00
c9e4e2b6dc benchmark: move sample flag to root command
Sample is configuration for reports. This should be
flag at top command.
2016-03-15 10:36:27 -07:00
323687e1f1 Merge pull request #4763 from gyuho/real_latency
benchmark: printSecondSample with time series
2016-03-15 08:39:38 -07:00
5eefff12e1 benchmark: printSecondSample with time series 2016-03-15 08:35:03 -07:00
1d4febd16e Merge pull request #4764 from siddontang/master
etcdctlv3/command: add with-prefix flag for get
2016-03-14 22:22:41 -07:00
683274b201 etcdctlv3/command: add prefix flag for get 2016-03-15 12:54:13 +08:00
2aa635ea39 Merge pull request #4761 from heyitsanthony/txn-noninteractive
etcdctlv3: define non-interactive txn format to match interactive input
2016-03-14 14:04:18 -07:00
d1ece7d621 Merge pull request #4762 from gyuho/typo
benchmark: minor typos
2016-03-14 13:56:32 -07:00
a932674a5b benchmark: minor typos 2016-03-14 13:45:08 -07:00
4bbbb52892 etcdctlv3: define non-interactive txn format to match interactive input
Fixes #4559
2016-03-14 13:36:18 -07:00
58c92ee41b Merge pull request #4758 from xiang90/sync
wal: do not call fsync when it is not necessary
2016-03-14 11:58:16 -07:00
53e7ddbc66 wal: do not call fsync when it is not necessary 2016-03-14 11:52:06 -07:00
3b1aa1c93d Merge pull request #4714 from mitake/v3-auth-enable
etcdserver, storage: new storage interface AuthStore
2016-03-13 22:11:04 -07:00
4eb1cfd658 etcdserver, auth: new package auth for the auth feature
This commit adds a new package auth. Its role is persisting auth
related metadata. This commit also connects its main interface
AuthStore and v3 server.
2016-03-14 13:57:41 +09:00
de03dbf632 Merge pull request #4757 from xiang90/mm
clientv3: fix mirror and add integration test
2016-03-13 18:42:19 -07:00
fc49eeccae Merge pull request #4756 from coreos/xiang90-patch-1
Readme: adding gocard
2016-03-13 18:41:04 -07:00
22065fe334 clientv3: fix mirror and add integration test 2016-03-13 18:27:45 -07:00
25612169dd Readme: adding gocard
Gocard includes most of the go style checking tools that we care about. This is a good indicator for us.
2016-03-13 18:14:00 -07:00
a343b800b5 Merge pull request #4754 from xiang90/fdatasync
wal: support fadatasync on linux
2016-03-13 18:07:43 -07:00
e59efe45a1 wal: support fadatasync on linux 2016-03-13 17:22:53 -07:00
bab8f43cb0 Merge pull request #4726 from ajityagaty/mk_in_order
etcdctl: Add command to create in-order keys.
2016-03-13 10:15:42 -07:00
7a1fbefb2c Merge pull request #4753 from xiang90/leader_par
etcdserver: leader latency optimization
2016-03-13 09:55:40 -07:00
0f9d04237c etcdserver: leader latency optimization 2016-03-12 22:51:13 -08:00
c0a424c1d0 Merge pull request #4752 from aboyett/auth-doc-fix
authentication.md: fix formatting of fenced code block
2016-03-11 22:34:48 -08:00
3a3ddaa195 authentication.md: fix formatting of fenced code block
Add a missing blank line to one of the fenced code blocks in the
"Working with roles" section to conform with the formatting best
practice recommended in the GitHub guide "Creating and highlighting code
blocks"[1]:

> We recommend placing a blank line before and after code blocks to make
> the raw formatting easier to read.

The missing blank line prior to the code block causes incorrect HTML
rendering of this section within the CoreOS etcd Authentication
Guide[2]. This commit fixes the problem, but the underlying issue seems
to be a quirk of the markdown render used for the CoreOS documentation
pages, as the same markdown source does not exhibit the issue when
viewed on GitHub[3] or viewed through the python markdown module.

[1] https://help.github.com/articles/creating-and-highlighting-code-blocks/
[2] https://coreos.com/etcd/docs/2.2.5/authentication.html
[3] https://github.com/coreos/etcd/blob/v2.2.5/Documentation/authentication.md
2016-03-11 18:44:39 -08:00
d0a780f248 Merge pull request #4751 from xiang90/doc
doc: fix refresh section in api.md
2016-03-11 09:51:35 -08:00
84d2c1e7b0 doc: fix refresh section in api.md 2016-03-11 09:49:25 -08:00
ef0c8e5c18 etcdctl: Add an option to mk command to create in-order keys.
Adding a new option to the mk command to create in-order keys
under a given directory, identified by <key> argument. Optionally
TTL can also be set on the newly created in-order key.
2016-03-10 20:28:28 -08:00
9f2061ce71 Merge pull request #4748 from xiang90/t
etcdserver: remove todo
2016-03-10 19:41:02 -08:00
c6192d1d7a etcdserver: remove todo 2016-03-10 19:10:20 -08:00
47848bc2aa Merge pull request #4747 from xiang90/re
*: recover lessor when applying snapshot
2016-03-10 19:04:04 -08:00
adcba975cb *: recover lessor when applying snapshot 2016-03-10 17:06:58 -08:00
ae7e2c3203 Merge pull request #4743 from xiang90/doc
doc: add v3 data model
2016-03-10 15:36:33 -08:00
6a3e8d4bde doc: add v3 data model 2016-03-10 15:16:05 -08:00
cd6c1190b5 Merge pull request #4739 from heyitsanthony/e2e-tempdir
e2e: put etcd datadir in golang tempdir
2016-03-10 11:58:03 -08:00
bb45bb84c4 Merge pull request #4742 from xiang90/update_b
godeps: update boltdb dependency
2016-03-10 11:55:11 -08:00
b353372090 godeps: update boltdb dependency 2016-03-10 11:45:04 -08:00
4a6c06db13 e2e: put etcd datadir in golang tempdir
The command "TMPDIR=/mnt/myramdisk/etcd go test -v" was making data
directories in pwd instead of the tmpdir.
2016-03-10 11:12:08 -08:00
534a16f2a2 Merge pull request #4741 from xiang90/defrag_c
*: add client.defrag and defrag cmd for etcdctl
2016-03-10 10:29:01 -08:00
7c377fa703 *: add client.defrag and defrag cmd for etcdctl 2016-03-10 09:30:41 -08:00
0c41371064 Merge pull request #4738 from xiang90/shrink_rpc
etcdserver: add maintain service to support defrag
2016-03-09 22:50:59 -08:00
2f12ea893b etcdserver: add maintain service to support defrag 2016-03-09 22:29:21 -08:00
a243f80c80 Merge pull request #4736 from heyitsanthony/fix-setdir-help
etcdctl: clarify setdir purpose in help message
2016-03-09 14:52:40 -08:00
edafbe9555 etcdctl: clarify setdir purpose in help message
Fixes #4640
2016-03-09 14:38:57 -08:00
6343410e24 Merge pull request #4735 from xiang90/fix_write
clientv3/integration: fix TestTxnWriteFail
2016-03-09 14:30:30 -08:00
3886129f52 clientv3/integration: fix TestTxnWriteFail
It might take client request more than dialtimeout to fail when
we kill the connection when the client is sending request.
2016-03-09 14:03:51 -08:00
6650db53a4 Merge pull request #4733 from heyitsanthony/backend-alignment
storage/backend: align fields used for atomic ops
2016-03-09 11:42:55 -08:00
dd01ab6dc0 storage/backend: align fields used for atomic ops
Fixes crashes on 32-bit tests.
2016-03-09 11:17:27 -08:00
19e39a36f7 Merge pull request #4732 from xiang90/i_future_watch
integration: fix TestV3WatchFutureRevision
2016-03-09 10:20:52 -08:00
73c8dbd459 Merge pull request #4731 from xiang90/backend
backend: fix TestBackendClose by giving more time to wait for io
2016-03-09 10:12:20 -08:00
39d307572e integration: fix TestV3WatchFutureRevision
Fix https://github.com/coreos/etcd/issues/4730.

Previously we put keys async and there might be a race when
the watch triggers before the put receives the response. When that
happens, put might fails to get the response since we shutdown the server
when watch triggers.
2016-03-09 09:55:52 -08:00
86e43b4173 Merge pull request #4729 from xiang90/raft_fix
raft: remove unnecessary waitSchedule in test
2016-03-09 09:45:20 -08:00
6cf4c4b3fd backend: fix TestBackendClose by giving more time to wait for io 2016-03-09 09:44:16 -08:00
f72be5487c Merge pull request #4728 from gyuho/util
pkg/fileutil: clean up interface, comments
2016-03-09 10:35:50 -07:00
22a8bbd3b1 pkg/fileutil: clean up interface, comments 2016-03-09 09:19:56 -08:00
aa59e7518e raft: remove unnecessary waitSchedule in test 2016-03-09 09:18:49 -08:00
a4624666fe Merge pull request #4720 from heyitsanthony/fix-testremovemember
clientv3/integration: do not remove client member in TestMemberRemove
2016-03-08 15:22:33 -08:00
6dd53ec699 Merge pull request #4721 from heyitsanthony/build-scary-archs
*: build phony etcd server binary for unsupported architectures
2016-03-08 14:04:02 -08:00
56fcee7ab0 Merge pull request #4718 from xiang90/v3_api
doc: add v3 api doc
2016-03-08 13:55:36 -08:00
3f83739a51 doc: add v3 api doc 2016-03-08 13:15:06 -08:00
5f304b4dee *: build phony etcd server binary for unsupported architectures
We don't qualify etcdserver for anything other than amd64, so don't
build binaries that are untested and might be unreliable.
2016-03-08 13:12:11 -08:00
e026b79c87 clientv3/integration: do not remove client member in TestMemberRemove
Fixes #4716
2016-03-08 11:55:26 -08:00
f74d732d77 Merge pull request #4717 from joshix/portsnote
README: Add explanation about official vs legacy ports
2016-03-08 10:57:29 -08:00
02975afe37 README: Add explanation about official vs legacy ports
Add a note about 2379/2380 vs 4001/7001. I believe this is as
far as we can go just yet, because we have not fully deprecated
the old ports, and still want to configure them in some instances.

Related #3480.
2016-03-08 10:33:11 -08:00
570775fe61 Merge pull request #4710 from heyitsanthony/clientv3-tlsconfig
clientv3: use tls.Config in clientv3.Config
2016-03-07 16:37:24 -08:00
78132c9b5b clientv3: use tls.Config in clientv3.Config
Fixes #4648
2016-03-07 16:08:40 -08:00
8e203b7bd5 Merge pull request #4711 from heyitsanthony/nuke-invfuturerev-test
clientv3/integration: remove invalid future revision test
2016-03-07 16:07:53 -08:00
969e42c3fa Merge pull request #4704 from gyuho/print_rate
benchmark: change complete notifier first
2016-03-07 16:00:48 -08:00
4eeea5ccda clientv3/integration: remove invalid future revision test
Future revisions are now supported, so test is outdated.
2016-03-07 15:52:34 -08:00
b8912c9fb1 benchmark: change complete notifier first
Fix https://github.com/coreos/etcd/issues/4708.
2016-03-07 14:54:11 -08:00
e29ed898d4 Merge pull request #4707 from heyitsanthony/fix-newmember-comment
etcdserver: update NewMember comment to reflect how id is computed
2016-03-07 14:20:52 -08:00
c666a11f38 etcdserver: update NewMember comment to reflect how id is computed
Fixes #4573
2016-03-07 14:09:53 -08:00
632461cc50 Merge pull request #4706 from heyitsanthony/fix-client-close-deadlock
clientv3: don't deadlock on Close with broken connection
2016-03-07 14:02:12 -08:00
d21d2e6624 clientv3: don't deadlock on Close with broken connection
Fixes #4679
2016-03-07 13:46:54 -08:00
13f853f59b Merge pull request #4705 from xiang90/benchmark
doc: add benchmark result for 2.2
2016-03-07 12:10:52 -08:00
4e6289b829 doc: add benchmark result for 2.2 2016-03-07 12:00:42 -08:00
e9be0415fd Merge pull request #4703 from xiang90/future_watch
*: support watch from future revision
2016-03-07 11:31:12 -08:00
a5c83e4c3e Merge pull request #4696 from heyitsanthony/fix-e2e-quorum
e2e: quorum cleanup
2016-03-07 11:18:15 -08:00
036ed87c6d *: support watch from future revision 2016-03-07 10:57:30 -08:00
b9228180f1 Merge pull request #4701 from gyuho/typo
storage: fix minor typo
2016-03-06 17:07:02 -08:00
50fd9d3b57 storage: fix minor typos 2016-03-06 17:05:02 -08:00
d0f6f49085 e2e: quorum cleanup
If a test gets data without quorum, it should only have one node in the
test cluster to avoid reading stale data.

Fixes #4694
2016-03-05 20:16:07 -08:00
1d15c7dd86 Merge pull request #4693 from heyitsanthony/etcdctl-fix-lists
etcdctl: correctly unmarshal list users and list roles
2016-03-05 20:04:13 -08:00
9809da95da client: correctly unmarshal roles in ListRoles 2016-03-05 19:42:58 -08:00
2a0d64bb4a client: correctly unmarshal users in ListUsers 2016-03-05 19:42:58 -08:00
a31f84121b Merge pull request #4699 from heyitsanthony/fix-barrier
storage: use creation revision to compute txn event types
2016-03-05 19:17:56 -08:00
2868c5587a Merge pull request #4698 from xiang90/fix_w
*: fix watch full key range
2016-03-05 19:10:53 -08:00
713f7c056f storage: use creation revision to compute txn event types
Fixes #4688
2016-03-05 19:03:07 -08:00
633a0bdf55 integration: add test for full range watching 2016-03-05 18:52:41 -08:00
d84811aecf *: fix watch full key range 2016-03-05 14:45:43 -08:00
e93e41cd9a Merge pull request #4643 from gyuho/stress
functional-tester: less intensive stresser
2016-03-05 13:17:19 -08:00
b3f0bcbeb4 functional-tester: less intensive stresser 2016-03-05 13:16:26 -08:00
b83b5307d1 Merge pull request #4695 from gyuho/endpoints
e2e: use endpoints flag
2016-03-05 13:00:00 -08:00
07030d59dd e2e: use endpoints flag 2016-03-05 12:28:31 -08:00
ca05f55a21 Merge pull request #4690 from heyitsanthony/fix-etcdctl-fullroles
client: unmarshal user with full roles into user with role names
2016-03-05 01:31:28 -08:00
e708bc14d7 Merge pull request #4689 from hongchaodeng/master
watch.go: docs on WatchResponse.Canceled
2016-03-04 16:09:16 -08:00
0926a91179 e2e: test granting roles to users 2016-03-04 16:09:01 -08:00
868728e5b5 client: if User unmarshal fails, decode user using full roles
Fixes #3702
2016-03-04 16:02:35 -08:00
5e017e94f9 watch.go: docs on WatchResponse.Canceled 2016-03-04 15:51:00 -08:00
104a3bdc01 Merge pull request #4687 from gyuho/example
clientv3: add IsProgressNotify with example
2016-03-04 15:46:35 -08:00
02d9aa481b etcdctl: accept user:pass format for add user
Otherwise needed an interactive terminal to create a user.
2016-03-04 13:29:06 -08:00
450b586011 clientv3: add IsProgressNotify with example
This makes the test endpoints same as we have in
goreman proc file, and adds 'IsProgressNotify' method
and the 'WatchProgressNotify' code example to godoc.
2016-03-04 12:16:43 -08:00
b73c1223d8 Merge pull request #4638 from gyuho/compact
benchmark: add auto-compact flags
2016-03-04 10:15:02 -08:00
aca29605b5 Merge pull request #4675 from gyuho/bench_watch
benchmark: watch, key-space-size(max possible key)
2016-03-04 10:14:42 -08:00
4097a72c0b Merge pull request #4678 from gyuho/watch_notify_clientv3
clientv3: add WithProgressNotify
2016-03-04 10:08:25 -08:00
ead5d432a3 Merge pull request #4682 from heyitsanthony/clientv3-clientctx
clientv3: include a context in Client
2016-03-04 09:48:37 -08:00
27316196d8 clientv3: add WithProgressNotify
Client side for https://github.com/coreos/etcd/issues/4628.
2016-03-04 09:47:13 -08:00
da0f77dc14 benchmark: measure Put with auto-compact 2016-03-04 09:34:24 -08:00
06f950f614 Merge pull request #4683 from heyitsanthony/recipes-err-handling
contrib/recipes: fix nil dereferences on error paths
2016-03-04 09:32:27 -08:00
360aafec76 clientv3: include a context in Client
Useful for clean up tasks
2016-03-04 09:20:44 -08:00
39109b8b53 contrib/recipes: fix nil dereferences on error paths 2016-03-04 00:38:35 -08:00
3b185f130a Merge pull request #4680 from xiang90/test_l
integration: add TestV3PutOnNonExistLease
2016-03-03 21:31:51 -08:00
44151ba531 integration: add TestV3PutOnNonExistLease 2016-03-03 20:32:44 -08:00
b0a88ab287 Merge pull request #4677 from heyitsanthony/clientv3-wr-err
clientv3: add Err() to WatchResponse
2016-03-03 15:42:53 -08:00
1e16758029 clientv3: add Err() to WatchResponse
Checking for number of events as a failure condition was a kludge.
2016-03-03 15:21:04 -08:00
dc7f9a89b0 Merge pull request #4674 from xiang90/progress
v3api: support progress
2016-03-03 14:28:26 -08:00
3cda514edb Merge pull request #4676 from heyitsanthony/clientv3-fix-cancel-retry
clientv3: do not reconnect if request context is canceled
2016-03-03 14:24:50 -08:00
b1521570b6 v3api: support progress 2016-03-03 13:58:15 -08:00
536b028831 benchmark: watch, key-space-size(max possible key)
By specifying 'key-space-size', we can test min/max-key-range
of keys to watch.

For https://github.com/coreos/etcd/issues/3863.
2016-03-03 13:46:17 -08:00
16c35167df clientv3: do not reconnect on request context cancellation 2016-03-03 13:43:16 -08:00
6746d394bf Merge pull request #4669 from xiang90/rev
storage: implement requestProgress
2016-03-03 10:43:48 -08:00
adbc53e3f7 Merge pull request #4670 from gyuho/tc
pkg/netutil: add linux netem functions
2016-03-03 10:41:11 -08:00
863e354ec5 Merge pull request #4672 from heyitsanthony/fix-mutex-lostwaiter
clientv3/concurrency: don't poll in mutex lock on early prior key delete
2016-03-03 10:35:23 -08:00
4422db389a pkg/netutil: add linux netem functions
This is useful for simulating bad networks by introducing
latencies to wide area networks.

Reference:
http://www.linuxfoundation.org/collaborate/workgroups/networking/netem
2016-03-03 10:14:28 -08:00
6d3f172c6e clientv3/concurrency: don't poll in mutex lock on early prior key delete
Lock would get the prior key on retry using WithRev(myRev - 1) instead of
using the latest revision; the Watch() would return immediately and Lock
devolves into polling.
2016-03-03 10:04:57 -08:00
9143329c85 storage: implement requestProgress 2016-03-03 09:39:29 -08:00
e3b755e9e0 Merge pull request #4655 from heyitsanthony/etcdctl-election
etcdctlv3: election command
2016-03-03 00:55:38 -08:00
20d89bcf32 etcdctlv3: elect command 2016-03-03 00:22:19 -08:00
3327858a54 clientv3/concurrency: move election recipe into clientv3 2016-03-03 00:22:19 -08:00
0eeb663754 Merge pull request #4660 from xiang90/shrink_db
backend: support shrink db
2016-03-02 20:28:51 -08:00
c89d80cb11 Merge pull request #4664 from gyuho/alpha1
version: bump to apha1
2016-03-02 19:44:55 -08:00
679b40bc77 Merge pull request #4663 from gyuho/endpoints
etcdctlv3: use string slice for endpoints
2016-03-02 16:41:49 -08:00
c792885952 version: bump to apha1
For https://github.com/coreos/etcd/issues/4652.
2016-03-02 16:38:49 -08:00
13025679c3 etcdctlv3: use string slice for endpoints
This deprecates 'endpoint' flag to enable etcdctl to parse
multi-endpoints flag.
2016-03-02 16:34:58 -08:00
378949f97c Merge pull request #4658 from mitake/v3-auth-enable
add a stub of etcdctlv3 auth enable
2016-03-02 14:45:11 -08:00
558640d91e backend: support shrink db 2016-03-02 14:35:28 -08:00
3cbd54fb0b etcdctlv3: ignore the binary etcdctlv3 2016-03-02 15:32:35 +09:00
b9d77eaaf2 etcdctlv3: a new command auth
Currently a stub of subcommand "auth enable" is implemented.
2016-03-02 15:32:35 +09:00
379d04ea53 clientv3: a new interface Auth for auth related RPCs 2016-03-02 15:17:59 +09:00
7a78c1ef1d etcdserver: AuthServer for auth related RPCs
Currently AuthEnable() is connected to etcdserver for experimental
purpose.
2016-03-02 15:17:59 +09:00
c7657f60ff Merge pull request #4657 from heyitsanthony/v3-procfile-ports
Procfile: serve the default grpc port 2378
2016-03-01 20:28:33 -08:00
4bcdad3778 Procfile: serve the default grpc port 2378
so v3 commands work using the default endpoint
2016-03-01 20:04:26 -08:00
124a444153 Merge pull request #4654 from gyuho/govet_tip
client: fix go vet error at tip
2016-03-01 15:50:54 -08:00
4ffe823c67 client: fix go vet error at tip 2016-03-01 15:37:05 -08:00
23a8339dd7 Merge pull request #4649 from mitake/v3-auth-proto
etcdserver: update rpc.proto for v3 authentication
2016-03-01 09:47:09 -08:00
46a7ef922d Merge pull request #4650 from peterbourgon/fix-raft-node-config-docs
Fix raft node config docs
2016-03-01 07:48:38 -08:00
aedf2c5876 raft: Config: comment wrapping @ 80col 2016-03-01 09:54:58 +01:00
6c1b3a71db raft: clarify Heartbeat/ElectionTick comments
Avoid other, ambiguous interpretations.
2016-03-01 09:52:14 +01:00
286d96284e etcdserver: update rpc.proto for v3 authentication 2016-03-01 17:11:50 +09:00
f0dbd0b856 Merge pull request #4646 from xiang90/starvation
etcdserver: detect raft stravation
2016-02-29 19:58:17 -08:00
942ae63398 Merge pull request #4647 from heyitsanthony/nuke-timeutil
pkg/timeutil: remove
2016-02-29 17:19:09 -08:00
1d6ebdd35c pkg/timeutil: removal
Overkill of a package for three lines of code.
2016-02-29 17:07:24 -08:00
d6520303c6 etcdserver: detect raft starvation caused by contention 2016-02-29 17:06:57 -08:00
3a9d532140 Merge pull request #4614 from heyitsanthony/future-watch-rpc
etcdserver, storage, clientv3: watcher ranges
2016-02-29 15:59:18 -08:00
b73553db68 Merge pull request #4645 from gyuho/clientv3_README
clientv3: document error handling in README
2016-02-29 16:45:48 -07:00
4bf3756aa8 clientv3: document error handling in README 2016-02-29 15:41:52 -08:00
eb327c690b clientv3: support watcher ranges 2016-02-29 15:20:41 -08:00
c0eac7ab72 storage: support watch on ranges 2016-02-29 15:20:41 -08:00
c0b06a7a32 pkg/adt: interval tree 2016-02-29 15:20:30 -08:00
eb1211bced Merge pull request #4416 from endocode/kayrus/relink
docs: Relink and fix broken links
2016-02-29 14:37:23 -08:00
d0efb3e448 Merge pull request #4644 from gyuho/doc
clientv3: document error handling
2016-02-29 15:31:45 -07:00
8dbc6cfd43 etcdserver: ranges in watcher rpc protocol
protocol change so watch requests are ranges; server rejects non-prefix ranges
2016-02-29 14:03:27 -08:00
a857d8e27c clientv3: document error handling 2016-02-29 13:54:35 -08:00
4fb25d5f0e Merge pull request #4613 from heyitsanthony/clientv3-composite
clientv3: compose API interfaces into client struct
2016-02-29 11:23:34 -08:00
7bd850a65c Merge pull request #4639 from xiang90/grpc
*: update gRPC dependency
2016-02-27 21:47:15 -08:00
6a99d967fa *: update gRPC dependency 2016-02-27 21:19:40 -08:00
18a5b88a32 Merge pull request #4635 from gyuho/endpoints
benchmark: use endpoints for benchmark flag
2016-02-26 18:13:52 -07:00
64e276800f benchmark: use endpoints for benchmark flag 2016-02-26 16:55:49 -08:00
df93f5cb5d Merge pull request #4632 from gyuho/watchid
*: return -1 for canceled watch request
2016-02-26 15:36:28 -07:00
4a0a83380e *: return -1 for canceled watch request 2016-02-26 14:26:46 -08:00
2ec138b160 Merge pull request #4630 from heyitsanthony/clientv3-watcher-closecancel
clientv3: return closed channel on Watch() cancel
2016-02-26 13:39:17 -08:00
1528a66808 Merge pull request #4631 from msingle/patch-1
client: minor typo fix
2016-02-26 12:18:43 -08:00
d02b1c982f clientv3: return closed channel on Watch() cancel
was returning nil; difficult to use correctly

Fixes #4626
2016-02-26 12:16:41 -08:00
7b57550484 client: minor typo fix 2016-02-26 15:16:00 -05:00
9f92da1913 Merge pull request #4566 from xiang90/com
etcdctlv3: Compatibility Support
2016-02-26 11:56:40 -08:00
3d507419a2 etcdctlv3: add compatibility section 2016-02-26 11:03:40 -08:00
f8c3fa637f clientv3: use default client lease api 2016-02-25 18:13:26 -08:00
8f7d474a6b clientv3: use default client cluster 2016-02-25 18:13:26 -08:00
3e57bbf317 clientv3: use default client kv 2016-02-25 18:13:26 -08:00
d430c7baf7 clientv3: use default client watcher 2016-02-25 18:13:26 -08:00
298c1e2487 tools/benchmark: port to clientv3 API 2016-02-25 18:13:26 -08:00
5f62c05a6d clientv3: compose all clientv3 APIs into client struct 2016-02-25 18:13:26 -08:00
bfcd39335c Merge pull request #4619 from heyitsanthony/clientv3-do
clientv3: expose Do in KV
2016-02-25 18:11:29 -08:00
43689b9a32 clientv3: expose Do in KV
Do() makes it possible to pass Ops around and apply them later.

Txn().Then(op).Commit() isn't enough because it will wrap the op
in a txn. Likewise, rewriting single op txns into single op rpc's
precludes deliberately submitting a single op transaction.
2016-02-25 17:33:47 -08:00
21649afcd4 Merge pull request #4623 from heyitsanthony/clientv3-fix-lease-panic
clientv3: respect first stream error in lease recv loop
2016-02-25 17:27:40 -08:00
993fd76b19 clientv3: respect first stream error in lease recv loop
Fixes #4622
2016-02-25 16:59:08 -08:00
9e493ccb14 Merge pull request #4621 from xiang90/auto-compaction
*: support time based auto compaction.
2016-02-25 16:19:00 -08:00
d265fe000c *: support time based auto compaction.
Fix https://github.com/coreos/etcd/issues/3906.

We will have extensive doc to talk about what is compaction
and what is auto compaction soon.
2016-02-25 16:02:03 -08:00
762639c2a3 Merge pull request #4618 from hongchaodeng/master
kv.proto: docs fix on watch event of DELETE/EXPIRE
2016-02-25 14:46:24 -08:00
6d32e44b0c kv.proto: docs fix on watch event of DELETE/EXPIRE 2016-02-25 13:51:38 -08:00
6a97c967cc Merge pull request #4617 from gyuho/txn_example
clientv3: fix txn example code
2016-02-25 12:22:34 -07:00
eb95bb2db9 Merge pull request #4611 from xiang90/p_s
doc/security.md: add notes for proxy security
2016-02-25 10:59:31 -08:00
81f77ee4f3 clientv3: fix txn example code 2016-02-25 10:52:47 -08:00
2510cacd47 docs: Relink and fix broken links 2016-02-25 12:42:11 +01:00
8f3981c651 Merge pull request #4612 from gyuho/watch_not_panic
*: watch true cancel, created for wrong rev
2016-02-24 22:07:03 -07:00
a78604dacb *: watch true cancel, created for wrong rev
This sets Created and Cancel true in pb.WatchResponse
when it has received wrong start revision instead of
panic. So that clientv3 can set 'Canceled' in WatchResponse
as well.

Fix https://github.com/coreos/etcd/issues/4610.
2016-02-24 20:56:17 -08:00
a120afb142 doc/security.md: add notes for proxy security 2016-02-24 20:22:54 -08:00
f003ce167a Merge pull request #4604 from heyitsanthony/etcdctl-lock
etcdctlv3: lock command
2016-02-24 20:02:51 -08:00
6053480b75 Merge pull request #4580 from AdoHe/etcdctl_use_endpoints
etcdctl: use endpoints instead of endpoint flag
2016-02-24 19:01:42 -08:00
d8a8116f9a etcdctl: use endpoints instead of endpoint flag 2016-02-24 21:49:50 -05:00
ed44bb00f8 etcdctlv3: lock command 2016-02-24 17:23:40 -08:00
d4b2044eb1 clientv3/concurrency: Mutex 2016-02-24 17:23:40 -08:00
20b4336cdb clientv3/concurrency: Session
A client may bind itself to a session lease to signal its
continued in participation with the cluster.
2016-02-24 16:40:16 -08:00
9f569842f0 clientv3: move syncer to mirror package
to be in line with sync meaning process synchronization, not data
synchronization
2016-02-24 14:21:41 -08:00
24ac3d0222 Merge pull request #4607 from endocode/kayrus/remove_dashes
docs: replaced single dashes in command line configuration
2016-02-24 09:46:21 -08:00
bbdff697db Merge pull request #4605 from heyitsanthony/fixup-godocs
*: fix godoc bugs in interfaces and slice fields
2016-02-24 08:53:33 -08:00
1e295d02cf docs: replaced single dashes in command line configuration
sed -i 's/\([^-a-z]\+\)\(-[a-z]\{2,\}\)/\1-\2/g' configuration.md
2016-02-24 17:28:36 +01:00
afa0368dcc *: fix godoc bugs in interfaces and slice fields
detected with goword
2016-02-24 00:45:40 -08:00
783e6f6b0d Merge pull request #4602 from gyuho/watch_option
*: combine Watch, WatchPrefix with variadic function
2016-02-23 20:26:54 -08:00
8f7948641c contrib/recipes: replace WatchPrefix with Watch 2016-02-23 20:02:24 -08:00
7a7751b994 etcdctlv3: combine Watch, WatchPrefix 2016-02-23 20:02:24 -08:00
a24d276891 clientv3: combine Watch, WatchPrefix with variadic
For https://github.com/coreos/etcd/issues/4598.
2016-02-23 20:02:21 -08:00
53f94c22b3 Merge pull request #4600 from gyuho/opoption_doc
clientv3: add GoDoc to OpOption
2016-02-23 15:36:38 -08:00
e22506cbe4 Merge pull request #4601 from hongchaodeng/master
clientv3: kv.Delete typo
2016-02-23 15:25:11 -08:00
5bc08b7475 clientv3: add GoDoc to OpOption 2016-02-23 15:18:33 -08:00
a19b30b7ab clientv3: kv.Delete typo 2016-02-23 15:07:10 -08:00
2403cdc4c0 Merge pull request #4592 from gyuho/doc
clientv3: add more code examples
2016-02-23 14:06:09 -08:00
72a1e5618b clientv3: add more code examples 2016-02-23 14:05:36 -08:00
86e04d5ff3 Merge pull request #4597 from heyitsanthony/etcdctlv3-format
etcdctlv3: add formatting for json and protobuf
2016-02-23 13:08:19 -08:00
386c64be7f etcdctlv3: protobuf formatting 2016-02-23 12:49:37 -08:00
355896b00a etcdctlv3: json formatting 2016-02-23 12:49:37 -08:00
8302f839b6 etcdctlv3: add printer interface and simple printer 2016-02-23 09:56:57 -08:00
410d32a9b1 Merge pull request #4594 from xiang90/wi
etcdctlv3: make watch interactive mode better
2016-02-22 23:41:26 -08:00
a2902c08a9 etcdctlv3: make watch interactive mode better 2016-02-22 23:06:31 -08:00
c48bafae85 Merge pull request #4591 from heyitsanthony/etcdctlv3-nuke-teletypes
etcdctlv3: use "\n" as output line separator
2016-02-22 22:19:26 -08:00
4295d0db8b etcdctlv3: use "\n" as output line separator 2016-02-22 15:45:56 -08:00
d8b124cf3a Merge pull request #4571 from heyitsanthony/txn-interactive
etcdctlv3: improve txn interactive mode
2016-02-22 15:33:44 -08:00
d139788407 Merge pull request #4575 from aknuds1/fix-clustering-docs
Update clustering documentation for etcd2
2016-02-22 14:58:16 -08:00
b280291f9e etcdctlv3: use --interactive for interactive mode in watch 2016-02-22 14:38:23 -08:00
87dcb2adea etcdctlv3: unify txn interactive mode input with get/put/delete 2016-02-22 14:38:23 -08:00
e5b2247f4f doc: Use double-dash switches 2016-02-22 16:35:18 -06:00
d67192c9f4 Merge pull request #4590 from gyuho/clientv3_doc
clientv3: README, GoDoc examples
2016-02-22 14:24:59 -08:00
f0686189e5 clientv3: README, GoDoc examples 2016-02-22 14:21:36 -08:00
54d15256e7 etcdctlv3: use clientv3 api for txn command 2016-02-22 09:51:59 -08:00
137dc1f795 Merge pull request #4585 from heyitsanthony/fix-testtriggersnap
etcdserver: fix race in TestTriggerSnap
2016-02-21 22:19:20 -08:00
a524d5bdb7 etcdserver: fix race in TestTriggerSnap
Fixes #4584
2016-02-21 22:03:35 -08:00
55c3cf3ce6 Merge pull request #4582 from heyitsanthony/godoc-ci
*: automate checking for broken exported godocs
2016-02-21 21:27:06 -08:00
c5b51946eb *: exported godoc fixups 2016-02-21 20:36:44 -08:00
f6d8059ac1 test: scan for exported godoc violations 2016-02-21 20:36:44 -08:00
67472d1ee0 Merge pull request #4583 from gyuho/delete
*: return the number of deleted keys
2016-02-21 18:11:13 -08:00
fc86e1ded1 *: return the number of deleted keys
For https://github.com/coreos/etcd/issues/4576.
2016-02-21 17:59:21 -08:00
ab9f925c04 Merge pull request #4581 from heyitsanthony/recipes-clientv3
contrib/recipes: use clientv3 APIs
2016-02-21 15:26:41 -08:00
3bb3351ca0 contrib/recipes: use clientv3 kv API 2016-02-21 14:43:41 -08:00
16420cfe63 contrib/recipes: use clientv3 lease API 2016-02-21 14:43:41 -08:00
936e991f9f contrib/recipes: use clientv3 watcher API 2016-02-21 00:10:58 -08:00
f8c998906c Merge pull request #4574 from heyitsanthony/clientv3-lease-ctx
clientv3: support context cancellation on lease keep alives
2016-02-20 23:53:16 -08:00
50ad181477 clientv3: support context cancellation on lease keep alives 2016-02-20 23:21:15 -08:00
5dfcdae521 Merge pull request #4579 from heyitsanthony/txn-fix-if-clobber
clientv3: copy correct pointers into txn comparisons
2016-02-20 22:18:56 -08:00
7b82576b60 clientv3: copy correct pointers into txn comparisons
Was copying the range variable's pointer; all elements of cmp were the same.
2016-02-20 22:01:33 -08:00
969c87b39f Merge pull request #4578 from gyuho/travis
travis: add go1.6
2016-02-20 21:17:02 -08:00
683c78f9a3 travis: add go1.6 2016-02-20 21:02:43 -08:00
25b4b8560b Merge pull request #4572 from xiang90/fix_tester_leak
tools/functional-tester: fix leaky issue by closing conn
2016-02-19 22:04:17 -08:00
e316678a4d tools/functional-tester: fix leaky issue by closing conn 2016-02-19 22:04:01 -08:00
c3824b10da Merge pull request #4570 from xiang90/ctl_mirror
etcdctlv3: add doc for mirror maker
2016-02-19 15:04:45 -08:00
13deb32447 etcdctlv3: add doc for mirror maker 2016-02-19 14:18:14 -08:00
582e089569 Merge pull request #4569 from gyuho/stress
etcd-tester: continue for deadline exceeded
2016-02-19 13:49:31 -08:00
8bcd823ed7 etcd-tester: continue for deadline exceeded 2016-02-19 13:48:58 -08:00
825003a723 Merge pull request #4565 from xiang90/ctl_watch
etcdctlv3: better watch command
2016-02-19 13:15:33 -08:00
755a022fcf etcdctlv3: better watch command 2016-02-19 11:42:37 -08:00
99eb6f35c1 Merge pull request #4533 from skarekrow/patch-1
Update freebsd.md
2016-02-18 21:04:46 -08:00
6764881256 doc: Update freebsd.md
Packages have changed names from coreosetcd to coreos-etcd
2016-02-18 23:03:19 -06:00
f607f64876 Merge pull request #4534 from mitake/obsolete-comment
etcdserver, auth: remove obsolete mutex
2016-02-18 17:56:29 -08:00
11bb07c248 Merge pull request #4564 from heyitsanthony/fix-watchreconnrequest
clientv3: fix current watcher reconnection
2016-02-18 15:14:22 -08:00
f66162932c clientv3: fix current watcher reconnection
If a current watcher didn't receive any events, a reconnect cycle would
advance its revision to the store's current revision. Instead, reconnect
using the watcher's creation header revision if the watcher hasn't received
any events.

Fixes #4502
2016-02-18 15:01:57 -08:00
a22bb7a4e8 Merge pull request #4562 from xiang90/get
etcdctlv3: better get command
2016-02-18 14:35:14 -08:00
9ea8ed3a13 etcdctlv3: better get command 2016-02-18 14:19:51 -08:00
d35a0d6533 Merge pull request #4563 from hongchaodeng/master
watch event docs
2016-02-18 13:59:25 -08:00
93795745b0 storage: add watch event docs for create case 2016-02-18 13:58:36 -08:00
c9d233d69a Merge pull request #4561 from heyitsanthony/gotip-shadow
integration: fix go vet -shadow error
2016-02-18 13:02:46 -08:00
55171707c3 Merge pull request #4560 from heyitsanthony/v3-delete-withfromkey
etcdserver: add >= support for v3 delete range
2016-02-18 13:02:03 -08:00
94fe87f010 Merge pull request #4557 from xiang90/del
etcdctlv3: make del command better
2016-02-18 12:47:58 -08:00
4fc89678b2 etcdserver: add >= support for v3 delete range 2016-02-18 12:34:04 -08:00
270fa00e54 integration: fix go vet -shadow error
breaking go tip
2016-02-18 12:26:35 -08:00
917d75f65e etcdctlv3: make del command better 2016-02-18 12:23:34 -08:00
71288597da Merge pull request #4556 from heyitsanthony/watcher-batch
storage: limit total unique revisions in unsynced watcher event list
2016-02-18 11:53:46 -08:00
7c17665a1a storage: limit total unique revisions in unsynced watcher event list 2016-02-18 11:36:22 -08:00
e6d6b648c4 Merge pull request #4553 from xiang90/raft_http
rafthttp: remove unncessary go routine
2016-02-18 10:47:45 -08:00
7d73d41758 Merge pull request #4555 from heyitsanthony/v3-with-prefix
clientv3: WithPrefix operation option
2016-02-18 10:37:24 -08:00
566fd02f62 Merge pull request #4558 from gyuho/path_fix
etcd-agent: get base when renaming
2016-02-18 09:15:27 -08:00
b2bad7bd79 etcd-agent: get base when renaming
Partially related to https://github.com/coreos/etcd/issues/4552.
2016-02-18 09:03:34 -08:00
2a3cacb60c rafthttp: remove unncessary go routine 2016-02-18 07:57:58 -08:00
4a041693de Merge pull request #4550 from xiang90/etcdctl_put
etcdctlv3: make PUT command clean and documented
2016-02-18 07:50:59 -08:00
535064924c etcdctlv3: make PUT command clean and documented 2016-02-18 07:46:05 -08:00
59291770d6 clientv3: WithPrefix operation option 2016-02-18 01:27:06 -08:00
cf71b64286 Merge pull request #4549 from gyuho/path
functional-tester: remove log prefixes
2016-02-17 19:05:07 -08:00
061e996998 functional-tester: remove log prefixes
capnslog already prefixes with its package name.
2016-02-17 19:01:16 -08:00
6bfd45a83e Merge pull request #4548 from gyuho/plog
functional-tester: plog for milli-second timestamp
2016-02-17 18:43:10 -08:00
7aa62ec595 functional-tester: plog for milli-second timestamp
Standard log package by default only prints out the second-scale
so the 3rd party log feeder mixes the order of the events, which makes
the debugging hard. This replaces it with capnslog and make them consistent
with all other etcd log formats.
2016-02-17 18:39:05 -08:00
40d3e0daff Merge pull request #4547 from gyuho/timeout
etcd-tester: 10-second timeout for stressers
2016-02-17 15:44:30 -08:00
239a6d89c5 etcd-tester: 10-second timeout for stressers
For https://github.com/coreos/etcd/issues/4477.
2016-02-17 15:44:05 -08:00
ef2d3feca6 Merge pull request #4528 from heyitsanthony/fix-watchcurrev
fix several watcher races
2016-02-17 14:26:33 -08:00
6b3fa6aa72 Merge pull request #4546 from xiang90/batch
rafthttp: smart batching
2016-02-17 14:05:11 -08:00
155412bbfa integration: overlapped create and put v3 watcher test 2016-02-17 14:03:52 -08:00
af225e7433 v3rpc: don't race on current watcher header revision 2016-02-17 14:03:52 -08:00
2cbf7cf6d1 storage: do not send outdated events to unsynced watchers 2016-02-17 14:03:51 -08:00
e4f22cd6d8 rafthtt: smart batching
Improved the overall performance more than 20% under heavyload
with little latency impact

heavy load
```
Requests/sec: ~23200  vs  Requests/sec: ~31500

Latency distribution:
  10% in 0.0883 secs.
  25% in 0.1022 secs.
  50% in 0.1207 secs.
  75% in 0.1460 secs.
  90% in 0.1647 secs.
  95% in 0.1783 secs.
  99% in 0.2223 secs.

vs

Latency distribution:
  10% in 0.1119 secs.
  25% in 0.1272 secs.
  50% in 0.1469 secs.
  75% in 0.1626 secs.
  90% in 0.1765 secs.
  95% in 0.1863 secs.
  99% in 0.2276 secs.
```

Similar on light load too.
2016-02-17 13:17:12 -08:00
a4105b5cce Merge pull request #4542 from xiang90/t
rafthttp: refactoring
2016-02-17 09:17:44 -08:00
59e7be4a2a v3api: send watch events only after sending watchid creation
If events show up before the watch id, the client won't be able
to match the event with the requested watcher.
2016-02-17 01:06:55 -08:00
019a145304 integration: put keys after watcher ack in TestV3WatchFromCurrentRevision
Watcher would miss events since the keys would be created after
sending the watcher request but before etcd registered the watcher.
2016-02-17 01:06:52 -08:00
74382f56fb rafthttp: handle short case in if statement 2016-02-16 19:26:51 -08:00
d393102e24 rafthttp: refactor 2016-02-16 19:21:53 -08:00
11d3e9ac69 rafthttp: better comment for streamWriter 2016-02-16 19:21:06 -08:00
56318f5433 rafthttp: add necessary locking 2016-02-16 19:18:05 -08:00
976236b083 Merge pull request #4539 from xiang90/snap
*: record the number of bytes of snapshot sent/received
2016-02-16 16:39:22 -08:00
994333e720 *: record the number of bytes of snapshot sent/received 2016-02-16 16:08:26 -08:00
31d2bb9739 Merge pull request #4537 from gyuho/mk
e2e: compare output in Go string literal
2016-02-16 16:01:20 -08:00
7cae2ae2a0 e2e: compare output in Go string literal
I manually print out the command outputs when the issue
was reproduced, and checked they are matching when compared as
Go string literals (UTF-8), but not when compared with regex.

Fixes https://github.com/coreos/etcd/issues/4480.
2016-02-16 13:57:49 -08:00
82c3600cfa Merge pull request #4536 from gyuho/tidy_cleanup
etcd-agent: tidy cleanup before SIGKILL
2016-02-16 13:25:40 -08:00
56e3ab0943 etcd-agent: tidy cleanup before SIGKILL
https://github.com/golang/go/blob/master/src/os/exec_posix.go#L18 shows that
cmd.Process.Kill calls syscall.SIGKILL to the command. But
http://tldp.org/LDP/Bash-Beginners-Guide/html/sect_12_01.html explains
'If you send a SIGKILL to a process, you remove any chance for the process to
do a tidy cleanup and shutdown, which might have unfortunate consequences.'
This sends SIGTERM, SIGINT syscalls to the PID so that the process could
have more time to clean up the resources.

Related to https://github.com/coreos/etcd/issues/4517.
2016-02-16 13:08:22 -08:00
613a1e7fdd Merge pull request #4535 from xiang90/mirrormaker
etcdctlv3: add initial mirrormaker
2016-02-16 11:34:11 -08:00
f71e733b8e etcdctlv3: add mirrormaker 2016-02-16 08:02:17 -08:00
6ba9373962 clientv3/sync: fix getting empty key error 2016-02-15 22:01:11 -08:00
30e4d7d6aa Merge pull request #4532 from xiang90/ctlv3
etcdctlv3: refactoring
2016-02-15 20:12:04 -08:00
0cf9cde1b4 etcdserver, auth: remove obsolete mutex
The removed mutex is obsolete because the enabled field is removed
in the commit b2d2c79a2f.
2016-02-16 12:03:31 +09:00
d78cff2c4a etcdctlv3: separate out cmd parsing logic for creating client 2016-02-15 14:44:25 -08:00
e75734aa68 Merge pull request #4529 from gyuho/govet
storage: shadowed err var fix at go-tip
2016-02-15 14:41:16 -08:00
03a7f2e10f etcdctlv3: secure is not HTTPS 2016-02-15 10:30:22 -08:00
fe5500228a Merge pull request #4523 from xiang90/syncer
*: move sync logic to clientv3/sync
2016-02-14 23:24:39 -08:00
24a6abaf59 *: move sync logic to clientv3/sync 2016-02-14 22:52:34 -08:00
449c116a44 storage: shadowed err var fix at go-tip 2016-02-14 19:49:03 -08:00
0662db11ed Merge pull request #4525 from heyitsanthony/fix-tls-proxy
e2e: tls proxy tests
2016-02-14 16:17:32 -08:00
1719bc0b0c e2e: tls proxy tests 2016-02-14 00:55:07 -08:00
925629dad8 Merge pull request #4521 from heyitsanthony/fix-doublebarrierfailover
contrib/recipes: fix revision race in double barrier
2016-02-13 15:26:25 -08:00
20ac633059 Merge pull request #4516 from gyuho/path
etcd-agent: ignore error when no file to rename
2016-02-13 14:24:17 -08:00
0f7f375043 contrib/recipes: fix revision race in double barrier
current kv revision might be ahead of ready put event; watch using key's mod
revision instead.

Fixes #4425
2016-02-13 11:57:12 -08:00
3d73e72bf6 Merge pull request #4049 from xiang90/raft_comment
raft: rework comment for advance interface
2016-02-13 10:22:18 -08:00
cff1208005 Merge pull request #4520 from xiang90/bench
tools/benchmark: support serializable range benchmark
2016-02-13 09:25:24 -08:00
4d0a2b4215 Merge pull request #4462 from mitake/benchmark-watch
tools/benchmark: revive watch benchmark
2016-02-13 09:24:00 -08:00
3a9a1c7d9b tools/benchmark: support serializable range benchmark 2016-02-13 09:23:35 -08:00
99e7449f44 tools/benchmark: revive watch benchmark
Current watch benchmark seems to be broken. This commit revives it.
2016-02-14 01:15:04 +09:00
205033d25f etcd-agent: ignore error when no file to rename
Fixes https://github.com/coreos/etcd/issues/4512.
When cluster fails before creation of log or data directory
the file does not exist and cannot be renamed. This skips such
error because there's no need to store empty logs in failure_archive.
2016-02-12 16:37:04 -08:00
c15b2a5077 Merge pull request #4514 from xiang90/tester
tools/functional-tester: add metrics
2016-02-12 14:45:10 -08:00
a30201c1d2 Merge pull request #4513 from gyuho/f1
etcd-tester: use Hash method to get both revision and hash
2016-02-12 14:44:52 -08:00
1bec0e6a0b tools/functional-tester: add metrics 2016-02-12 14:42:41 -08:00
5b2847b338 etcd-tester: use Hash method to get both revision and hash 2016-02-12 14:40:51 -08:00
93f2a4487a Merge pull request #4507 from gyuho/better_hash
etcdserver: populate ResponseHeader in Hash method
2016-02-12 14:36:26 -08:00
7a6f5695bb Merge pull request #4495 from heyitsanthony/snapshot-command
etcdctlv3: snapshot command
2016-02-12 14:34:47 -08:00
2710e4eed1 etcdserver: populate ResponseHeader in Hash method 2016-02-12 14:26:18 -08:00
91e2086d18 etcdctlv3: snapshot command
Partially addresses #4097
2016-02-12 14:19:26 -08:00
2415303991 clientv3: typedef <-chan WatchResponse to WatchChan 2016-02-12 14:06:59 -08:00
8e411b1b3b clientv3: send compacted revision before closing watch chan 2016-02-12 14:06:59 -08:00
ee1a03167d storage, v3: pass compaction revision through watchresponse 2016-02-12 14:06:59 -08:00
6851fffdfb clientv3: support >= Range requests
Turns out grpc will convert an empty byte string to nil, so use "\0" to
indicate Range on >= key in v3 grpc protocol.
2016-02-12 14:06:59 -08:00
390a4518c0 raft: rework comment for advance interface 2016-02-12 13:43:51 -08:00
1d02559ae5 Merge pull request #4501 from xiang90/lt
integration: test switch lease via put
2016-02-12 13:20:05 -08:00
11be967d12 Merge pull request #4510 from xiang90/client_support
clientv3: support serializable
2016-02-12 13:17:31 -08:00
5908e5b601 clientv3: support serializable 2016-02-12 12:24:46 -08:00
00f89941a9 Merge pull request #4508 from xiang90/l
*: support local range request
2016-02-12 12:14:53 -08:00
30c11c1bca *: support local range request 2016-02-12 12:04:06 -08:00
895b50029f Merge pull request #4506 from xiang90/w
refactoring watch integration test.
2016-02-12 10:11:20 -08:00
9cd45312d5 Merge pull request #4505 from heyitsanthony/v3-range-ge
storage: support ranges for >= key
2016-02-12 10:03:45 -08:00
be1534812a integration: give watch stream a timeout to fail the test fast 2016-02-12 09:58:15 -08:00
8ed9ebf3e1 integration: WaitResponse -> waitResponse 2016-02-12 09:50:29 -08:00
5f1d30b76a integration: move watch tests to v3_watch_test.go 2016-02-12 09:47:33 -08:00
4854d7f69d storage: support ranges for >= key
If end == "", range(key, end) will give all keys >= key.
2016-02-12 09:45:43 -08:00
392d3a1f26 Merge pull request #4497 from hongchaodeng/docs
kv.proto: docs of create_revision, mod_revision
2016-02-11 22:10:08 -08:00
fa45e13073 integration: test switch lease via put 2016-02-11 22:04:54 -08:00
a89ba1e583 Merge pull request #4499 from heyitsanthony/apply-scheduler
etcdserver: use fifo scheduler for applier
2016-02-11 21:44:34 -08:00
8c9dba8275 Merge pull request #4500 from gyuho/etcdctl_doc
etcdctl: make doc on 'no-sync' clearer
2016-02-11 21:40:31 -08:00
658f2e53ba etcdctl: make doc on 'no-sync' clearer
For issues like https://github.com/coreos/etcd/issues/4496.
2016-02-11 19:52:03 -08:00
616f395920 etcdserver: use fifo scheduler for applier 2016-02-11 19:21:30 -08:00
c27a4a1d3d kv.proto: docs of create_revision, mod_revision 2016-02-11 18:25:46 -08:00
89ca5ccccd Merge pull request #4486 from gyuho/f1
etcd-tester: failures aware of leader/non-leader
2016-02-11 17:08:09 -08:00
40c598cfa6 Merge pull request #4494 from xiang90/sched
*: fix schedule.Wait race
2016-02-11 15:56:01 -08:00
f34b8d334b Merge pull request #4408 from xiang90/revoke
detach keys from lease
2016-02-11 15:55:02 -08:00
c1851dfca1 etcd-tester: add leader failure cases 2016-02-11 15:46:44 -08:00
bfa5e310a9 *: detach keys from leases
1. deatch when a key is removed
2. deatch when the key's lease changes
3. potentially deatch when restroing a tombstone key
2016-02-11 15:31:25 -08:00
870e4c2681 *: fix schedule.Wait race 2016-02-11 15:21:33 -08:00
56cc2d00df Merge pull request #4493 from xiang90/fix_ctl
etcdctlv3: put should check error
2016-02-11 15:03:34 -08:00
b5d5bf625d etcdctlv3: put should check error 2016-02-11 15:01:33 -08:00
a78826e025 Merge pull request #4489 from heyitsanthony/fix-watcher-reqresend
clientv3: fix bad variable capture in watch request retry
2016-02-11 14:20:17 -08:00
e2146e2080 Merge pull request #4490 from gyuho/godoc
clientv3: fix godoc for member apis
2016-02-11 14:14:44 -08:00
b19d57e1c3 clientv3: fix godoc for member apis 2016-02-11 13:55:38 -08:00
4b9bf0d673 Merge pull request #4488 from gyuho/page_cache
etcd-agent: cleans page cache when cleaning up
2016-02-11 13:49:54 -08:00
78df258ea8 etcd-agent: cleans page cache when cleaning up
Reference:
- https://www.kernel.org/doc/Documentation/sysctl/vm.txt
- https://github.com/torvalds/linux/blob/master/fs/drop_caches.c
2016-02-11 13:48:54 -08:00
a8bfd1218b Merge pull request #4487 from gyuho/leader_api
etcdserver: include IsLeader in etcdserverpb.Member
2016-02-11 13:39:02 -08:00
3b7bd38a2d clientv3: fix bad variable capture in watch request retry
variables would be niled out when the goroutine runs, causing a crash
2016-02-11 13:35:07 -08:00
5fbf64c144 clienv3: add MemberAdd method based on v3 change 2016-02-11 13:27:34 -08:00
a56287b9b4 etcdserver: include IsLeader in etcdserverpb.Member 2016-02-11 13:04:03 -08:00
fcf94f3a59 Merge pull request #4483 from heyitsanthony/clientv3-op-opts
clientv3: optionize put and delete
2016-02-10 15:37:35 -08:00
3c9e8540a7 clientv3: optionize put and delete 2016-02-10 15:03:11 -08:00
21e5cf1555 Merge pull request #4447 from xiang90/sched
*: add a scheduler and use it to schedule compaction
2016-02-10 14:49:33 -08:00
d314345e6d *: add a scheduler and use it to schedule compaction 2016-02-10 14:27:08 -08:00
4b68977851 Merge pull request #4481 from xiang90/testing-tool
tools/etcd-dump-logs: support parsing v3 log
2016-02-10 14:00:32 -08:00
3611a9ad2e tools/etcd-dump-logs: support parsing v3 log 2016-02-10 13:52:02 -08:00
2a842eb049 Merge pull request #4476 from heyitsanthony/fix-raftexample-restart
contrib/raftexample: fix restart path
2016-02-10 13:00:57 -08:00
02b24c58fd contrib/raftexample: fix tests
os.Exit() on raft stop breaks out of the test fixture; instead,
monitor the error channel and exit on close
2016-02-10 11:49:13 -08:00
0cb304ec61 contrib/raftexample: fix restart path
The ConfChange fix crashes WAL replay because it assumed the node
always exists. Additionally, restart on a single node cluster
would deadlock because it had the wrong raft HardState.

To fix, replay path now goes through the regular raft loop.

Fixes #4474
2016-02-10 11:49:13 -08:00
d1b5fdea25 Merge pull request #4460 from mitake/build
build, test: don't pass -a flag for go build during ordinal building
2016-02-10 18:34:43 +01:00
5e47ba3fad Merge pull request #4473 from heyitsanthony/clientv3-ctx-kv
clientv3: ctx-ize KV
2016-02-09 17:54:14 -08:00
51c4894f62 clientv3: ctx-ize KV 2016-02-09 17:42:34 -08:00
3b8237d011 Merge pull request #4472 from gyuho/f0
etcd-tester: count success for v3
2016-02-09 17:16:06 -08:00
7b2ce70783 etcd-tester: count success for v3
needed for snapshot count comparison
2016-02-09 17:12:03 -08:00
bb504fdf23 Merge pull request #4470 from gyuho/test
etcd-tester: close leaky gRPC connections
2016-02-09 16:30:09 -08:00
7d2b7e0d23 etcd-tester: close leaky gRPC connections
when closed errors will be one of:

```
grpc.ErrorDesc(err) == context.Canceled.Error() ||
grpc.ErrorDesc(err) == context.DeadlineExceeded.Error() ||
grpc.ErrorDesc(err) == "transport is closing" ||
grpc.ErrorDesc(err) == "grpc: the client connection is closing"
```
2016-02-09 16:26:33 -08:00
a60da8f3ce Merge pull request #4471 from heyitsanthony/fix-integration-certs
integration: add key usage to server.crt
2016-02-09 15:52:14 -08:00
ed29bc3221 integration: add key usage to server.crt
curl handshake was failing; related: #209
2016-02-09 15:34:53 -08:00
3b0f8d53e5 Merge pull request #4468 from gyuho/f0
etcd-tester: continue after cleanup
2016-02-09 13:43:32 -08:00
16aa263d8c etcd-tester: continue after cleanup 2016-02-09 13:32:39 -08:00
c8994aab62 Merge pull request #4466 from gyuho/f0
etcd-tester: close gRPC connection when canceling
2016-02-09 11:07:00 -08:00
a46e20f92a etcd-tester: close gRPC connection when canceling
Currently gRPC connection just gets recreated
for every Stress call. When Stress ends or gets
canceled, gRPC connection must also be closed.

For https://github.com/coreos/etcd/issues/4464.
2016-02-09 11:04:16 -08:00
e9c0d5424a Merge pull request #4467 from gyuho/f00
etcd-tester: fix wrong error checking
2016-02-09 10:20:22 -08:00
e252c0c0ca etcd-tester: fix wrong error checking
Hash method returns either (nil, err) or (Hash, nil).
The current error checking is wrong. It only needs to check
the error is either nil or non-nil.

This causes panic in https://github.com/coreos/etcd/issues/4463
by allowing the case when resp is nil, but err is not nil.
2016-02-09 10:00:05 -08:00
fa71bec550 Merge pull request #4458 from xiang90/cl
etcd-tester: cleanup
2016-02-08 21:16:28 -08:00
e4d2ff3bd9 build, test: don't pass -a flag for go build during ordinal building
./build takes long time. On my Core i5 box, it requires almost 25
seconds. Without -a flag, it takes almost 15 seconds. Therefore this
commit reduces the flag in default. ./test activates -a via a new env
var GO_BUILD_FLAGS.
2016-02-09 14:11:44 +09:00
25834211a9 etcd-tester: cleanup 2016-02-08 20:54:25 -08:00
6d447f7a47 Merge pull request #4457 from gyuho/f0
etcd-agent: mkdir with read/write to all users
2016-02-08 20:51:30 -08:00
77f753ac32 etcd-agent: mkdir with read/write to all users 2016-02-08 20:50:11 -08:00
f368639f00 Merge pull request #4456 from mitake/v3-put-0-args
etcdctlv3: handle a case of 0 arguments in put command
2016-02-08 19:35:54 -08:00
674d54ad9b etcdctlv3: handle a case of 0 arguments in put command 2016-02-09 12:22:25 +09:00
15c8876e4c Merge pull request #4455 from heyitsanthony/etcdctlv3-compaction-err
etcdctlv3: report compaction error, if any
2016-02-08 19:15:40 -08:00
4698e7e23b Merge pull request #4450 from mitake/v3-put-stdin
etcdctlv3: support reading value from stdin
2016-02-08 19:13:56 -08:00
075f5f68ad etcdctlv3: support reading value from stdin
This commit lets etcdctlv3 support reading value from stdin like
`etcdctl set`. It is convenient for storing complex, long value.
2016-02-09 10:54:35 +09:00
aa9d3c8b74 etcdctlv3: report compaction error, if any 2016-02-08 15:09:03 -08:00
4d27abc296 Merge pull request #4454 from heyitsanthony/v3-cmp
clientv3: make compare compliant with proposed txn usage
2016-02-08 15:08:35 -08:00
f5fc0b8afa Merge pull request #4449 from gyuho/f0
etcd-tester: avoid directory name conflict
2016-02-08 14:37:47 -08:00
4f41d361a8 clientv3: make compare compliant with proposed txn usage 2016-02-08 13:48:29 -08:00
16543778f1 etcd-tester: avoid directory name conflict 2016-02-08 13:38:26 -08:00
70006da092 Merge pull request #4453 from heyitsanthony/v3-withoption
clientv3: withOption for Get
2016-02-08 13:31:24 -08:00
0fde354eba Merge pull request #4164 from cchamplin/ttl-refresh
store/httpapi: support refresh ttl without firing watch
2016-02-08 13:20:34 -08:00
b3c2891c1d Merge pull request #4452 from xiang90/too_large
*: limit request size for v3
2016-02-08 13:14:59 -08:00
8dcd24bd64 clientv3: withOption for Gets 2016-02-08 13:11:55 -08:00
35567221a7 *: limit request size for v3 2016-02-08 12:54:03 -08:00
2a261e1b03 Merge pull request #4438 from adamwg/raftexample-add-nodes
Reconfiguration for raftexample (#4018)
2016-02-08 12:46:07 -08:00
ffd61c0faf contrib/raftexample: Update README to reflect dynamic configuration changes (fixes #4018) 2016-02-08 13:03:25 -07:00
b72a0788ad Merge pull request #4439 from xiang90/fix_tr
transport: make tr stop safe
2016-02-08 11:25:03 -08:00
ea688c1f06 transport: make tr stop safe 2016-02-08 11:13:52 -08:00
4a527be302 contrib/raftexample: Allow nodes to be removed from a running cluster
A node with ID n can be removed by DELETEing /n on the HTTP server.
2016-02-08 10:51:06 -07:00
7c0b6d9be9 contrib/raftexample: Allow nodes to be added to a running cluster
A node with ID n can be added by POSTing the new node's URL to /n on the
HTTP server.
2016-02-08 10:51:06 -07:00
7d862960cc contrib/raftexample: Add a channel for proposing config changes
Add a channel over which we can propose cluster config changes to
raft. In an upcoming commit we'll add an HTTP endpoint that sends config
changes over this channel.
2016-02-08 10:51:06 -07:00
eb7fef559d contrib/raftexample: Handle conf change entries
So far we don't propose conf changes, but we'll be ready to handle them
when we do.
2016-02-08 10:51:03 -07:00
82778ed478 Add refresh parameter to allow TTL refreshes without firing watch/wait responses 2016-02-08 10:37:37 -07:00
2be7f7c2fb Merge pull request #4448 from gyuho/f0
etcd-tester: add compactKV every case, clean up logs
2016-02-06 15:14:41 -08:00
08dbabdb5f etcd-tester: add compactKV
It compacts storage for every case.
For https://github.com/coreos/etcd/issues/4380.
2016-02-06 15:02:58 -08:00
e7a5899582 Merge pull request #4410 from mitake/security-options
etcdctlv3: handle empty options related to security
2016-02-05 20:56:27 -08:00
6d12bf4951 Merge pull request #4445 from gyuho/func
etcd-tester: log cancel in stresser
2016-02-05 18:47:42 -08:00
044f7775e3 etcd-tester: log cancel in stresser
And fix some minor print formats.
2016-02-05 18:31:27 -08:00
38ab8aade4 Merge pull request #4437 from heyitsanthony/fix-testv2ctlwatch
e2e: fix race in etcdctl watch tests
2016-02-05 17:05:10 -08:00
c8fc3413b7 Merge pull request #4431 from heyitsanthony/rafthttp-localurl
rafthttp: plumb local peer URLs through transport
2016-02-05 16:56:32 -08:00
b7eb539b7a clientv3/integration: add delay after restart in TestTxnWriteFail
CI was timing out with publish failures
2016-02-05 16:44:41 -08:00
fe7cfe4d3d rafthttp: plumb local peer URLs through transport 2016-02-05 16:44:41 -08:00
a1ea70e36d Merge pull request #4442 from xiang90/rpc_err
*: add etcdserver namespace for rpc error
2016-02-05 15:22:35 -08:00
bc3fc4ea33 *: add etcdserver namespace for rpc error 2016-02-05 15:13:24 -08:00
6fc1fdc0b1 Merge pull request #4441 from philips/add-videos-to-v3-doc
Documentation: add more context on the v3 API
2016-02-05 14:45:16 -08:00
ef86b0972c Documentation: add more context on the v3 API
Add links to videos we have recorded and some context besides just the
bullet points.
2016-02-05 14:43:14 -08:00
054a4f3d7a Merge pull request #4433 from gyuho/f0
functional-tester/etcd-tester: silent grpclog, get revs, sleep longer
2016-02-05 14:09:07 -08:00
09fc764552 functional-tester/etcd-tester: silent grpclog, check revs 2016-02-05 14:04:58 -08:00
a31f9a8af1 contrib/raftexample: Publish only committed entries
We shouldn't publish entries to the kvstore until they've been
committed.
2016-02-05 13:59:49 -07:00
8f3e17b781 Merge pull request #4436 from heyitsanthony/txnfail-reconn
clientv3: retry remote connection on txn write failure
2016-02-05 12:52:16 -08:00
6bbb916b47 e2e: fix race in etcdctl watch tests 2016-02-05 12:27:05 -08:00
a6008f41e2 clientv3: retry remote connection on txn write failure 2016-02-05 11:51:44 -08:00
c6455ba2ce Merge pull request #4434 from xiang90/bolt
use initial map flag in bolt
2016-02-05 11:40:39 -08:00
26c645f049 storage/backend: set initial db size to avoid potential deadlock 2016-02-05 11:29:16 -08:00
8156a58ba3 godep: update bolt 2016-02-05 11:29:16 -08:00
7960bd8690 Merge pull request #4430 from heyitsanthony/clientv3-test-kv-retry
clientv3/integration: test Get retry
2016-02-05 11:17:54 -08:00
0385734111 clientv3/integration: KV retry tests
make sure Get will succeed after reconnect cycle with and without Put failure
2016-02-05 10:56:42 -08:00
d0977d59ce Merge pull request #4429 from heyitsanthony/fix-testtxnwritefail
clientv3/integration: add timeouts to TestTxnWriteFail
2016-02-05 10:37:44 -08:00
e3e4d09653 clientv3/integration: add timeouts to TestTxnWriteFail
so it doesn't take ten minutes to fail
2016-02-05 10:24:56 -08:00
c3fd2f95f0 Merge pull request #4432 from gyuho/f0
functional-tester/etcd-agent: configurable log path
2016-02-05 10:22:20 -08:00
e2b5b1cd1a functional-tester/etcd-agent: configurable log path 2016-02-05 09:37:24 -08:00
2cd1cc4111 Merge pull request #4428 from gyuho/f0
functional-tester/etcd-agent: configurable agent port
2016-02-04 21:40:33 -08:00
8e1325d9e1 functional-tester/etcd-agent: configurable agent port
To make local testing easier.
2016-02-04 21:10:30 -08:00
7abd21ee44 etcdctlv3: handle empty options related to security
Current etcdctlv3 doesn't handle options related to security when they
are not passed. Connections cannot be established and etcd produces
logs like this:

```
14:09:07 etcd1 | 2016/02/04 14:09:07 transport: http2Server.HandleStreams received bogus greeting from client: "\x16\x03\x01\x00\x9a\x01\x00\x00\x96\x03\x03\xf6\t\xda\x06QV\xb4\xdd\xc1gF\x1cC"
```

This commit fixes the problem. In addition, a case that empty strings
are passed to the options (e.g. --key="") are treated as error.
2016-02-05 13:31:37 +09:00
f003e421f5 Merge pull request #4426 from gyuho/f0
etcd-agent: fix data-dir path check
2016-02-04 18:34:16 -08:00
220d0c3c14 etcd-agent: fix data-dir path check
Need one more dash to match 'data-dir' flag.
2016-02-04 16:53:01 -08:00
2897fb0c5c Merge pull request #4412 from gyuho/f0
functional-tester: fix grpc endpoint, consistent check
2016-02-04 15:45:24 -08:00
85a4a5b596 functional-tester: fix grpc endpoint, consistent check
And some clean ups.
2016-02-04 15:10:43 -08:00
a60173f79c Merge pull request #4421 from jonboulle/master
docs: remove old client-matrix link
2016-02-04 14:15:26 -08:00
d21ef68a0c Merge pull request #4413 from gyuho/TestKVCompact
clientv3/integration: add TestKVCompact
2016-02-04 14:14:37 -08:00
60085a05f1 Merge pull request #4423 from xiang90/id
pkg/idutil: reduce conflict rate from 1% to 0.005%
2016-02-04 14:13:11 -08:00
5b4b1c7039 clientv3/integration: add TestKVCompact 2016-02-04 13:45:51 -08:00
e44e753e66 pkg/idutil: reduce conflict rate from 1% to 0.005%
Perviously, we only use 8bits from member identification
in id generation. The conflict rate is A(256,3)/256^3, which
is around 1%. Now we use 16bites to reduce the rate to 0.005%.

We can attach the full member id into id generation if needed...
2016-02-04 13:25:18 -08:00
f19ff0c39d docs: remove old client-matrix link
This document was removed in 23406dc2ee
but still linked to here.
2016-02-04 20:32:15 +01:00
0cdf1c45cf Merge pull request #4418 from xiang90/ci
pkg/wait: make id checking stricter
2016-02-04 10:50:11 -08:00
d43bd48977 pkg/wait: make id checking stricter
Do not allow register with same id.
2016-02-04 09:59:49 -08:00
a0b00a5b89 Merge pull request #4409 from heyitsanthony/v3-txn-tests
clientv3/integration: txn tests
2016-02-04 09:30:52 -08:00
9523c2d29f clientv3/integration: txn tests 2016-02-04 08:40:24 -08:00
26f440be7c Merge pull request #4406 from gyuho/f0
*: functional-tester with v3
2016-02-03 15:54:10 -08:00
c07fc3e08e Merge pull request #4407 from heyitsanthony/txn-no-retry
clientv3: don't retry txns that may modify the store
2016-02-03 15:37:49 -08:00
16dcade07d Merge pull request #4402 from heyitsanthony/minority-failure
rafthttp: add leader to transport if peer does not exist
2016-02-03 15:31:01 -08:00
7a3426a231 tools/functional-tester/etcd-tester: support v3 kv storage 2016-02-03 15:24:54 -08:00
b6a08a97e2 clientv3: don't retry txns that may modify the store 2016-02-03 14:55:16 -08:00
ed682c9f08 tools/functional-tester: minor cleanup 2016-02-03 14:30:34 -08:00
db0b505de5 rafthttp: add requester to transport if peer does not exist
cluster integration now supports adding members with stopped nodes, too

Fixes #3699
2016-02-03 14:16:46 -08:00
6da3c08b4d Merge pull request #4401 from QuentinPerez/format
wrong number of args for format in Errorf/Printf
2016-02-03 13:21:33 -08:00
86aafcd15a clientv3/integration: fix args format in Errorf/Printf 2016-02-03 22:17:58 +01:00
f82a1b0583 Merge pull request #4404 from gyuho/hash
etcdserver: add kv Hash method (for testing purpose)
2016-02-03 13:14:13 -08:00
2d197ac9e8 *: add kv Hash method (for testing purpose) 2016-02-03 12:52:39 -08:00
c98f6fa1b9 Merge pull request #4399 from mitake/genproto
scripts: get goimports in genproto.sh
2016-02-03 21:27:28 +01:00
9bfe617728 Merge pull request #4400 from mitake/v3-member-tirivial
etcdserver: update comments in a generated file
2016-02-03 09:01:40 -08:00
f7d35a1ef8 etcdserver: update comments in a generated file
On the latest master branch, etcdserver/etcdserverpb/etcdserver.pb.go
is changed when scripts/genproto.sh is executed. The content only has
changes for comment. Therefore it is not important but the change is
annoying when we update the proto file.
2016-02-04 00:45:16 +09:00
be9c9ae7d5 scripts: get goimports in genproto.sh
genproto.sh uses goimports. Getting the command in the script is
useful.
2016-02-04 00:37:50 +09:00
456975f631 Merge pull request #4396 from xiang90/fix_watch
storage: update watch.cur and fix tests
2016-02-03 01:14:29 -08:00
b09214df32 storage: update watch.cur and fix tests 2016-02-03 00:54:07 -08:00
6c7ff98b0d Merge pull request #4395 from xiang90/fix_watch
fix watch issues in storage
2016-02-03 00:10:22 -08:00
3ed404633a v3rpc: add compacted field from wresp 2016-02-02 23:24:15 -08:00
52416fafb0 storage: send compaction 2016-02-02 23:17:27 -08:00
5780497e18 storage: remove unncessary handle func 2016-02-02 23:02:15 -08:00
ae5161382b storage: release tx lock until finish using the readonly bytes
The backend will return read only bytes that are only vaild while
the tx is open. We should hold the lock until we get a full copy
by unmarshal.
2016-02-02 22:49:05 -08:00
d7f6ad0334 Merge pull request #4392 from xiang90/watch
storage: make unsync a watcherSetByKey
2016-02-02 21:36:30 -08:00
7277b1fe15 Merge pull request #4394 from heyitsanthony/coalesce-stacks
pkg/testutil: more aggressive goroutine stack trace coalescing
2016-02-02 21:30:53 -08:00
72b31d6fdc pkg/testutil: more aggressive goroutine stack trace coalescing
Strips out the pointer arguments in the header of the stack trace so
that more stack traces match each other.
2016-02-02 21:20:24 -08:00
cb30d6e6f8 Merge pull request #4393 from xiang90/fix_test
clientv3/integration: fix member remove
2016-02-02 21:10:40 -08:00
c7876d4111 clientv3/integration: fix member remove
Do not connect to the member to remove.
2016-02-02 20:49:00 -08:00
31c0c5181a storage: make unsync a watcherSetByKey 2016-02-02 20:09:35 -08:00
622bc0d8b7 Merge pull request #4390 from xiang90/watch
storage: remove unnecessary abstraction
2016-02-02 20:06:22 -08:00
8dc6248aa7 storage: add set delete 2016-02-02 19:28:42 -08:00
810c3e74a8 storage: remove unnecessary abstraction 2016-02-02 19:15:46 -08:00
79c07024fb Merge pull request #4389 from xiang90/watch
storage: add watchSet and watchSetByKey type
2016-02-02 19:13:04 -08:00
e5b35b82c5 storage: add watchSet and watchSetByKey type 2016-02-02 18:56:36 -08:00
2919be91b9 Merge pull request #4387 from heyitsanthony/integration-cluster-speedup
integration: decrease timeout for isMemberBootstrapped
2016-02-02 14:45:18 -08:00
9ae8d85049 integration: decrease timeout for isMemberBootstrapped
Spending seconds(!) when it would fail anyway.

integration/TestV3 (before): 100.670
integration/TestV3 (after): 29.571
2016-02-02 14:34:58 -08:00
83411b92b4 Merge pull request #4386 from xiang90/promote
integration: add test promote and move lease tests to lease_test.go
2016-02-02 14:06:48 -08:00
6f72b31316 integration: add test promote and move lease tests to lease_test.go 2016-02-02 13:45:11 -08:00
71e000de65 Merge pull request #4376 from heyitsanthony/txn-no-duplicate-putkey
etcdserver: reject v3 txns with duplicate put keys
2016-02-02 13:17:47 -08:00
c5c5063efe etcdserver: reject v3 txns with duplicate put keys
An API check to support PR #4363; bad requests didn't return an error.
2016-02-02 12:32:33 -08:00
20673e384a Merge pull request #4382 from xiang90/lease_keep_test
clientv3/integration: test lease keepalive
2016-02-02 12:01:22 -08:00
5f9f56ca17 Merge pull request #4383 from xiang90/client_no_end
clientv3: add no endpoint error
2016-02-02 12:00:44 -08:00
7a91108b91 clientv3: add no endpoint error 2016-02-02 11:01:58 -08:00
fd0e68d16b clientv3/integration: test lease keepalive 2016-02-02 10:59:22 -08:00
5f20aaa457 Merge pull request #4360 from heyitsanthony/v3-client-watcher
V3 client watcher
2016-02-01 23:39:30 -08:00
87ed04ea6f Merge pull request #4372 from gyuho/kv_delete
clientv3/integration: add TestKVDelete*
2016-02-01 23:30:11 -08:00
580c563ed6 clientv3: watcher implementation 2016-02-01 23:21:55 -08:00
826df1787a Merge pull request #4373 from heyitsanthony/clientv3-unix-endpoints
clientv3: support unix endpoints
2016-02-01 22:49:49 -08:00
3f29e730eb Merge pull request #4374 from xiang90/member_api
clientv3: implement cluster api
2016-02-01 22:49:43 -08:00
9afee37471 Merge pull request #4375 from xiang90/kvapi_no_retry
clientv3: do not retry on modifications
2016-02-01 22:49:18 -08:00
b74a42b286 clientv3: support unix endpoints 2016-02-01 22:24:42 -08:00
eb8ab3ace4 clientv3: synchronous lease Close 2016-02-01 22:24:42 -08:00
a9bd30b4af clientv3: do not retry on modifications 2016-02-01 21:54:40 -08:00
a25423ca99 clientv3: implement cluster api 2016-02-01 21:46:23 -08:00
e49ae8b03f clientv3/integration: add TestKVDelete* 2016-02-01 17:07:52 -08:00
24f5640d83 Merge pull request #4371 from gyuho/govet_lease
clientv3: fix shadowed variables in lease
2016-02-01 15:17:38 -08:00
d4438fbb97 Merge pull request #4370 from gyuho/kv_lease
clientv3/integration: TestKVPut with lease id
2016-02-01 15:15:01 -08:00
f7692cf5d2 clientv3: fix shadowed variables in lease 2016-02-01 15:08:24 -08:00
7d278ef6bb clientv3/integration: TestKVPut with lease id 2016-02-01 14:46:20 -08:00
0707261397 Merge pull request #4361 from gyuho/e2e
e2e: check regexp.MatchReader return, curl SSL issue
2016-02-01 13:23:32 -08:00
438fecebcd Merge pull request #4367 from xiang90/lease_test
Lease test
2016-02-01 13:20:25 -08:00
a8e72b6285 proxy: prints out when endpoints are found 2016-02-01 13:08:21 -08:00
bef7887c0d clientv3/integration: add basic lease test 2016-02-01 12:59:44 -08:00
2e051c1c61 e2e: check regexp.MatchReader return, curl SSL issue
1. proc.ExpectRegex returns the result of regexp.MatchReader,
which does not return error even if there is no match of regex.
This fixes it by checking the boolean value and if the boolean
value is false, it returns error.

2. Adds more tests and finishes coreos#4259.

3. When we do the regex match correctly, curl request through SSL
returns error. For the purpose of debugging, I changed it to log
without failing the tests. etcdctl with SSL works fine.

4. Add // TODO: 'watch' sometimes times out in Semaphore CI environment but
works fine in every other environments.

5. increase test time
2016-02-01 12:20:53 -08:00
8431801814 lease: fix lease init race 2016-02-01 12:07:34 -08:00
64766e9d6a Merge pull request #4366 from heyitsanthony/fix-rejectinsecure
integration: accept transient failure in TestGRPCRejectInsecureClient
2016-02-01 11:58:41 -08:00
7d8ae8af78 integration: accept transient failure state in TestGRPCRejectInsecureClient 2016-02-01 11:32:22 -08:00
220fba32a3 Merge pull request #4353 from xiang90/lease
clientv3: initial lease
2016-02-01 10:40:44 -08:00
f2c24dec05 clientv3: initial lease 2016-02-01 09:55:32 -08:00
595cfd9a11 Merge pull request #4364 from heyitsanthony/goword-typos
*: fix many typos
2016-01-31 22:24:43 -08:00
20461ab11a *: fix many typos 2016-01-31 21:42:39 -08:00
4ba1ec6a4d Merge pull request #4363 from xiang90/watch
storage: simplify watch store
2016-01-31 16:49:44 -08:00
611751aee2 storage: simplify watch store
We decided that we will not support modifing the same
key in one txn multiple times. That can simlify the current
code/design a lot.
2016-01-31 16:33:07 -08:00
1ab99206bc Merge pull request #4362 from gyuho/leaky_raft
raft: fix leaky goroutines in raft test
2016-01-31 12:55:54 -08:00
c827c7432c raft: fix leaky goroutines in raft test 2016-01-31 12:41:33 -08:00
cb7ebe81a8 Merge pull request #4359 from joshix/kayrus-deprcpeers
etcdctl: Changed default "endpoint" servers order
2016-01-30 21:20:10 -08:00
1b7b20f4f8 etcdctl: Changed default "endpoint" servers order
And added "DEPRECATED" note to a "peers" option (relates to d685135832)

Updated to amend commit.
2016-01-30 21:07:57 -08:00
42acd957a3 Merge pull request #4358 from gyuho/typo
*: fix minor typos, comments
2016-01-30 18:29:03 -08:00
9f7907cd30 Merge pull request #4357 from gyuho/govet
store: fix to 'go vet' errors
2016-01-30 18:24:19 -08:00
71c2a9bb3c *: fix minor typos, comments 2016-01-30 18:15:56 -08:00
8644ddea20 Merge pull request #4356 from gyuho/range
*: kv range to return current revision
2016-01-30 17:56:48 -08:00
445ff6970b store: fix to 'go vet' errors
This check has been introduced to go tip [1]. Just fixes by calling
the Index method.

[1]: 0f89efa255
2016-01-30 17:53:58 -08:00
f6215574f2 *: kv range to return current revision
This changes the behavior of KV's range and tx range to return
current revision rather than range revision. This makes populating
range response header easier.
2016-01-30 17:37:34 -08:00
6577df17d6 Merge pull request #4354 from coreos/revert-4348-clientv3_integration_test
Revert "*: TestKVRange to clientv3/integration, fix rev"
2016-01-29 18:21:13 -08:00
57dedd8c89 Revert "*: TestKVRange to clientv3/integration, fix rev" 2016-01-29 18:20:56 -08:00
f6031b9d11 Merge pull request #4349 from heyitsanthony/v3-client-conntls
V3 client TLS
2016-01-29 17:05:38 -08:00
ca9bd575b1 integration: v3 grpc tls tests 2016-01-29 16:38:52 -08:00
60c037f1c3 integration: add client tls support 2016-01-29 16:38:11 -08:00
4634874d99 etcdmain, integration, v3rpc: consolidate grpc server setup 2016-01-29 16:38:11 -08:00
563850bcc1 etcdmain: support v3 tls 2016-01-29 16:38:11 -08:00
4380617e1a tools/benchmark: support tls 2016-01-29 16:38:11 -08:00
781bf625af etcdctlv3: support tls
Fixes #4299
2016-01-29 16:37:59 -08:00
f37c896159 clientv3: support tls 2016-01-29 16:37:18 -08:00
f549dabe7f Merge pull request #4351 from gyuho/e2e
e2e: add more etcdctl tests
2016-01-29 16:33:40 -08:00
cf80035d72 Merge pull request #4348 from gyuho/clientv3_integration_test
*: TestKVRange to clientv3/integration, fix rev
2016-01-29 16:25:15 -08:00
bde5f7f20f e2e: add more etcdctl tests 2016-01-29 16:20:12 -08:00
69abdf8144 *: TestKVRange to clientv3/integration, fix rev
For https://github.com/coreos/etcd/issues/4338.
And resp.Header.Revision should be from the one in storage
when we just do range, because there is no storage data
change.
2016-01-29 16:12:21 -08:00
e5527914aa Merge pull request #4346 from gyuho/watch_timeout
e2e: increase watch timeout
2016-01-29 09:01:33 -08:00
73b71dc9a7 Merge pull request #4344 from shawnps/patch-3
raft: fix var name in comment
2016-01-28 23:27:20 -08:00
713b56c860 e2e: increase watch timeout 2016-01-28 23:24:04 -08:00
edd823bba6 raft: fix var name in comment 2016-01-29 16:18:47 +09:00
754967644d Merge pull request #4339 from shawnps/patch-2
clientv3/integration: add tt.key string formatting verb to Fatalf call
2016-01-28 23:09:06 -08:00
eebb31b0e7 Merge pull request #4340 from shawnps/patch-3
integration: add want IDs to Errorf calls
2016-01-28 23:04:34 -08:00
12556af26b Merge pull request #4343 from shawnps/patch-6
testutil: fix typo in comment
2016-01-28 23:03:29 -08:00
6e35f61b65 Merge pull request #4342 from shawnps/patch-5
logutil: fix typo in comment
2016-01-28 23:02:38 -08:00
0cbd5bfe7d testutil: fix typo in comment 2016-01-29 16:02:07 +09:00
6c451a44ca Merge pull request #4341 from shawnps/patch-4
rafthttp: fix typo in test comment
2016-01-28 23:01:58 -08:00
4e649970df logutil: fix typo in comment 2016-01-29 16:01:19 +09:00
96d82b40fb rafthttp: fix typo in test comment 2016-01-29 15:59:36 +09:00
7abbd31fcd integration: add want IDs to Errorf calls 2016-01-29 15:57:02 +09:00
1ac0ca5317 clientv3/integration: add tt.key string formatting verb to Fatalf call 2016-01-29 15:56:04 +09:00
e7f50d8444 Merge pull request #4336 from gyuho/clientv3_test
integration: V3 grpc with clientv3 (only Put)
2016-01-28 21:33:54 -08:00
1767788074 *: expose integration functions for clientv3 2016-01-28 21:21:34 -08:00
9a441c2c1a Merge pull request #4334 from heyitsanthony/fix-4333
etcdserver/test: use recorderstream in TestApplyRepeat
2016-01-28 18:20:00 -08:00
082a6c304e etcdserver/test: use recorderstream in TestApplyRepeat
was racing when waiting for the node commit

fixes #4333
2016-01-28 17:19:06 -08:00
127d717c0a Merge pull request #4335 from xiang90/ts
clientv3: threadsafe
2016-01-28 17:04:43 -08:00
85bfbfa5ad clientv3: threadsafe 2016-01-28 16:41:09 -08:00
26028ce44b Merge pull request #4331 from xiang90/c_t
clientv3: hook up KV and Txn
2016-01-28 16:30:41 -08:00
7b8cd274c9 Merge pull request #4330 from gyuho/proxy_e2e
e2e: etcdctl test for proxy no-sync
2016-01-28 16:13:44 -08:00
6491bae27f e2e: etcdctl test for proxy no-sync
For https://github.com/coreos/etcd/issues/3894.
2016-01-28 16:01:29 -08:00
eb03d48034 clientv3: hook up KV and Txn 2016-01-28 15:15:21 -08:00
aef77f9829 Merge pull request #4329 from xiang90/client_txn
clientv3: initial txn
2016-01-28 14:49:26 -08:00
92653dcbfb clientv3: initial txn 2016-01-28 14:27:42 -08:00
9842a3b84a Merge pull request #4328 from heyitsanthony/v3-client-conntimeout
clientv3: support connection timeout
2016-01-28 13:40:44 -08:00
64413927cc clientv3: support connection timeout 2016-01-28 13:25:45 -08:00
7cc02bc143 clientv3: fix vet warnings 2016-01-28 13:25:10 -08:00
993dc2b7db Merge pull request #4327 from gyuho/minor_govet
integration: fix shadowed variables (based govet)
2016-01-28 11:55:08 -08:00
599ecebdec integration: fix shadowed variables based on govet 2016-01-28 11:42:58 -08:00
5bd930c9d2 Merge pull request #4325 from xiang90/client_lease
clientv3: lease initial api
2016-01-28 10:49:50 -08:00
937eeafc3a Merge pull request #4323 from xiang90/client_watch
clientv3: initial watch API
2016-01-28 10:49:33 -08:00
6d41ccbe59 Merge pull request #4316 from xiang90/client_ops
clientv3: fill in kv ops
2016-01-28 10:49:18 -08:00
026c2e7a7c clientv3: lease initial api 2016-01-28 09:20:29 -08:00
c5b9433496 Merge pull request #4324 from jonboulle/master
client: fix typo in docstring (than -> then)
2016-01-28 18:09:42 +01:00
5490db7cca client: fix typo in docstring (than -> then) 2016-01-28 18:07:39 +01:00
205ffa5cb6 clientv3: initial watch API 2016-01-28 08:54:32 -08:00
2d2f14385d clientv3: fill in kv ops 2016-01-28 08:17:53 -08:00
7adb8442ba Merge pull request #4320 from sublee/fix-typo
documentation: fix typo "a etcd" -> "an etcd"
2016-01-28 13:36:15 +01:00
c4a0159601 documentation: fix typo "a etcd" -> "an etcd"
"a" is not a correct article for "etcd".
2016-01-28 19:19:32 +09:00
145d5b85a7 Merge pull request #4314 from xiang90/clientv3_impl
clientv3: use retryConnection
2016-01-27 22:52:31 -08:00
a3b7876a3c clientv3: use retryConnection 2016-01-27 22:31:15 -08:00
3df91f85c4 Merge pull request #4312 from heyitsanthony/v3-client-connretry
clientv3: connection retry and customizable endpoint selection
2016-01-27 20:57:38 -08:00
c338a47751 Merge pull request #4313 from xiang90/clientv3
etcdclientv3: setup initial structure
2016-01-27 20:50:40 -08:00
dba92346f3 etcdclientv3: setup initial structure 2016-01-27 20:36:36 -08:00
2db2f381fb clientv3: connection retry and customizable endpoint selection 2016-01-27 19:27:31 -08:00
ddaf023b9c Merge pull request #4295 from heyitsanthony/v3-recipes-leases
recipes: add election and double barrier recipes
2016-01-27 19:12:18 -08:00
6f0cc54541 contrib/recipes: add election and double barrier recipes
these recipes rely on leases so they weren't included in the last batch
2016-01-27 15:44:51 -08:00
163812246f Merge pull request #4306 from heyitsanthony/v3-client
replace raw v3 grpc connections with clientv3.Client
2016-01-27 14:52:40 -08:00
56fce9f386 contrib/recipes, integration: use clientv3
updating both together since there's a circular dependency
2016-01-27 14:37:51 -08:00
36d9942de1 Merge pull request #3536 from xiang90/client
clientv3: add initial kv api for client
2016-01-27 14:35:20 -08:00
e4dab0f40d clientv3: add initial kv api for client 2016-01-27 14:05:55 -08:00
a7b6bbff3f tools/benchmark: use clientv3 2016-01-27 12:13:17 -08:00
9a5a3ebc79 etcdctlv3: consolidate dial code; use clientv3 2016-01-27 12:13:17 -08:00
5ccf7f5151 clientv3: small client wrapper
mostly to standardize etcd grpc dials
2016-01-27 12:13:12 -08:00
0020c63dec Merge pull request #4301 from gyuho/no_pipe
etcdctl: use os.Stdout, os.Stderr directly for cmd
2016-01-27 11:10:34 -08:00
37290820de Merge pull request #4293 from bdarnell/bcast-after-commit
raft: Always call bcastAppend after maybeCommit
2016-01-27 09:58:22 -08:00
0f3b9c21b6 Merge pull request #4302 from srijs/patch-2
raft/doc: add notice about thread safety of messages
2016-01-27 09:41:50 -08:00
28585ddafa Merge pull request #4300 from gyuho/grpc_dial
*: pass WithInsecure to grpc.Dial for now
2016-01-27 09:13:36 -08:00
be21d90108 raft/doc: add notice about thread safety of messages
Fixes #4285
2016-01-27 20:18:19 +11:00
c8eebd0070 etcdctl: use os.Stdout, os.Stderr directly for cmd 2016-01-27 00:54:40 -08:00
fa21946267 *: pass WithInsecure to grpc.Dial for now
Related to https://github.com/coreos/etcd/issues/4299.
2016-01-27 00:24:03 -08:00
14255854d8 Merge pull request #4298 from heyitsanthony/fix-testapplysnapshot-race
etcdserver/test: synchronously wait on TestApplySnapshotAndCommittedE…
2016-01-26 21:18:09 -08:00
64596f0c49 etcdserver/test: synchronously wait on TestApplySnapshotAndCommittedEntries
Replaces the RecorderBuffered with a RecorderStream so Wait will block
waiting for updates to the etcdserver store.

Fixes #4296
2016-01-26 21:03:03 -08:00
6054748181 Merge pull request #4297 from ngaut/ngaut/raft-typo
raft: typo
2016-01-26 20:48:53 -08:00
751ab40f44 raft: typo 2016-01-27 12:35:14 +08:00
6d8b82f6ce Merge pull request #4294 from xiang90/member_api
*: finish member api implementation
2016-01-26 18:48:16 -08:00
36cc8446c7 *: finish member api implementation 2016-01-26 18:09:14 -08:00
02628298f4 Merge pull request #4292 from gyuho/gRPC_update
*: gRPC update
2016-01-26 17:55:48 -08:00
a35d5889f6 *: update gRPC, proto interface 2016-01-26 17:41:39 -08:00
a0a142f3e7 contrib/recipes: update gRPC, proto interface 2016-01-26 17:41:35 -08:00
652c01bffe tools/benchmark: update gRPC, proto interface 2016-01-26 17:41:32 -08:00
51e62aa007 integration: update gRPC, proto interface 2016-01-26 17:41:27 -08:00
1145414a08 etcdctlv3: update gRPC, proto interface 2016-01-26 17:41:23 -08:00
ad15bdcb07 etcdserver: update gRPC, proto interface 2016-01-26 17:41:19 -08:00
1c4c45cc7a scripts: update genproto for import issue 2016-01-26 17:41:12 -08:00
4bb0481115 Godeps: update gRPC w/ related packages 2016-01-26 17:41:08 -08:00
4c024b305f Merge pull request #4290 from heyitsanthony/fix-apply-noents
etcdserver: don't try to apply empty message list
2016-01-26 14:11:54 -08:00
04cece8f18 Merge pull request #4291 from bdarnell/remove-commit
raft: Remove redundant `raft.Commit` field.
2016-01-26 14:09:41 -08:00
0771d713e6 raft: Always call bcastAppend after maybeCommit 2016-01-26 16:55:47 -05:00
22925a1d2f raft: Remove redundant raft.Commit field.
Keeping this field in sync with `raft.raftLog.committed` was
error-prone, so instead we synthesize the `HardState` on demand.

Fixes #4278.
2016-01-26 15:18:55 -05:00
bd02d668c8 etcdserver: don't try to apply empty message list
If all messages have been applied, don't apply an empty messages list;
otherwise appliedi will update to 0 and etcd will panic.

Fixes #4278
2016-01-26 11:56:37 -08:00
179a8f9768 Merge pull request #4289 from xiang90/member_api
etcdserver: initial member api proto
2016-01-26 10:16:44 -08:00
864fc197c1 etcdserver: initial member api proto 2016-01-26 09:56:50 -08:00
59c6735c3c Merge pull request #4282 from xiang90/range_invalid
etcdserver: check invalid range in txn
2016-01-25 22:23:55 -08:00
dd1bbaa293 Merge pull request #4281 from mitake/remove-cached-auth-flag
etcdserver, auth: not cache a flag of auth status
2016-01-25 21:55:44 -08:00
a56387bc3e Merge pull request #4284 from xiang90/max_txn
v3rpc: check max ops in txn
2016-01-25 21:16:01 -08:00
c8bf77c722 v3rpc: check max ops in txn 2016-01-25 21:04:19 -08:00
128b5e7387 etcdserver: check invalid range in txn 2016-01-25 20:21:17 -08:00
2a9cccd659 Merge pull request #4283 from heyitsanthony/fix-stopdelay-leak
etcdserver: complete stopWithDelay on server shutdown
2016-01-25 20:11:55 -08:00
12f6b8e72d etcdserver: complete stopWithDelay on server shutdown
Was causing goroutine leaks on my machine.
2016-01-25 19:45:29 -08:00
b2d2c79a2f etcdserver, auth: not cache a flag of auth status
This commit removes a flag that indicates auth is enabled or disabled
because it doesn't have an invalidation mechanism.

Fixes https://github.com/coreos/etcd/issues/3601 and https://github.com/coreos/etcd/issues/3964
2016-01-26 11:46:25 +09:00
8199147cf8 Merge pull request #4246 from bdarnell/commit-after-remove-node
raft: Call maybeCommit after removing a node
2016-01-25 11:47:56 +08:00
b1a45fe1c8 Merge pull request #4274 from xiang90/leasehttp
leasehttp: move lease/http.go to its own pkg
2016-01-25 10:25:25 +08:00
8c9b8e91f2 Merge pull request #4275 from xiang90/fix_lease_restore
lease: fix restore
2016-01-25 10:21:09 +08:00
77cf05364d Merge pull request #4261 from gyuho/racey_e2e
*: detect leaky goroutines, fix leaks
2016-01-24 18:17:49 -08:00
e925a359a2 lease: fix restore 2016-01-25 10:06:14 +08:00
ef6320e638 etcdserver: make cluster checking interval shorter 2016-01-25 08:16:05 +08:00
1aa312fcce *: lease forwarding should resue transport 2016-01-25 06:56:07 +08:00
72ffa74476 pkg/transport: update timeout transport to reuse conn when timeout is not set 2016-01-25 06:55:54 +08:00
5e2dbadbc0 leasehttp: move lease/http.go to its own pkg 2016-01-25 06:09:54 +08:00
8905632837 Merge pull request #4272 from srijs/patch-2
raft: use configured logger in raft/node.go
2016-01-24 23:24:30 +08:00
4dd1718bde Merge pull request #4273 from srijs/patch-3
readme: link to raft.github.io instead of raftconsensus.github.io
2016-01-24 15:05:54 +01:00
02a34abf85 readme: link to raft.github.io instead of the old raftconsensus.github.io 2016-01-25 00:28:18 +11:00
896719c877 raft: use configured logger in raft/node.go
Those three log statements in node.go have not been using the logger that was passed via `raft.Config`, but instead the default raft logger. This changes it to use the proper logger.
2016-01-25 00:15:44 +11:00
96d2ee20e3 *: detect leaky goroutines, fix leaks
gexpect.Interact leaks. This adds ReadLine method to wait for the leaky
goroutine to accept an EOF.

Fixes https://github.com/coreos/etcd/issues/4258.

Reference: https://github.com/coreos/etcd/pull/4261#issuecomment-174198945.
2016-01-23 13:52:41 -08:00
5099bf6f7a Merge pull request #4269 from heyitsanthony/v3-reject-put-bogus-lease
etcdserver: return error when putting a key with a bad lease id
2016-01-22 21:25:17 -08:00
9572197aee etcdserver: return error when putting a key with a bad lease id 2016-01-22 20:47:31 -08:00
5a9f81b198 Merge pull request #4267 from gyuho/govet
pkg/testutil: fix unreachable govet complain
2016-01-22 15:33:55 -08:00
0d646a25ee pkg/testutil: fix unreachable go tool vet complain 2016-01-22 15:16:35 -08:00
c21d87354e Merge pull request #4266 from gyuho/minor_govet
integration: minor govet shadow fix
2016-01-22 14:43:13 -08:00
5e4113374b integration: minor govet shadow fix 2016-01-22 14:40:21 -08:00
53d6aede82 Merge pull request #3889 from gyuho/raft_doc.go_20151118
raft: doc, debugging instruction on MessageType
2016-01-22 14:22:49 -08:00
6c240b4037 Merge pull request #4262 from heyitsanthony/v3-lease-watch-expire
storage: publish delete events on lease revocation
2016-01-22 14:09:19 -08:00
ae05c87c2f Merge pull request #4238 from heyitsanthony/v3-recipes
contrib: v3 recipes
2016-01-22 14:08:21 -08:00
b07900ae03 contrib: v3 recipes
Concurrency recipes using the V3 API (sans leases).
2016-01-22 13:46:22 -08:00
5a967eb2a0 storage: publish delete events on lease revocation 2016-01-22 13:40:55 -08:00
53def2dc5e Merge pull request #4260 from heyitsanthony/v3-lease-forward-keepalive
lease: forward lease keepalive to leader
2016-01-22 12:53:17 -08:00
2e157530a0 etcdhttp, lease, v3api: forward keepalives to leader
keepalives don't go through raft so let follower peers announce
keepalives to the leader through the peer http handler
2016-01-22 12:40:40 -08:00
be7d573366 lease: store server-decided TTL in lease
If actual TTL is not stored in lease, the client will receive the correct
TTL and therefore won't be able to keepalive correctly.
2016-01-22 11:44:12 -08:00
2b54c5a977 Merge pull request #4253 from heyitsanthony/v3-lease-grant-consistency
lease: grant consistent lease IDs
2016-01-22 11:10:12 -08:00
9113a27bde lease: grant consistent lease IDs
When raft broadcasts a Grant to all nodes, all nodes must
agree on the same lease ID. Otherwise, attaching a key to
a lease will fail since the lease ID is node-dependent.
2016-01-22 09:43:39 -08:00
6413c96024 Merge pull request #4254 from gyuho/check_wait
client: do not timeout when wait is true
2016-01-21 18:59:03 -08:00
bbb7fb5a46 client: do not timeout when wait is true
Current V2 watch waits by encoding URL with wait=true.
When a client sets 'no-sync', it requests directly to
proxy and the proxy redirects it by cloning the request
object, which leads to cancel the original request when
it times out and the cloned request gets closed prematurely.

This fixes coreos#3894 by querying
the original client request in order to not use context timeout
when 'wait=true'.
2016-01-21 18:45:15 -08:00
1db0148ffd Merge pull request #4252 from gyuho/client_doc
*: move EndpointSelection doc to godoc
2016-01-21 10:02:32 -08:00
2a3bd01f58 *: move EndpointSelection doc to godoc
This merges two redundant documentation into one.
2016-01-21 09:48:05 -08:00
6eb0b35ed3 Merge pull request #4231 from mitake/go-client-doc
Documentation: add a doc for using the go client library
2016-01-21 00:46:14 -08:00
e196a0e8d2 Documentation: add a doc for using the go client library
This commit adds a document that provides tips of how to use the go
client library. Currently it describes how to use the
client.SelectionMode parameter that is added in
https://github.com/coreos/etcd/pull/4030.
2016-01-21 17:31:36 +09:00
1ec71c1cdc Merge pull request #4250 from mitake/ls-quorum
etcdctl: add an option to ls for consistent result
2016-01-20 21:15:35 -08:00
cae0577619 etcdctl: add an option to ls for consistent result
Like the commit 11f49a0960, this commit adds a new option "--quorum"
to etcdctl ls command. It is required for obtaining a consistent
result.
2016-01-21 14:03:10 +09:00
5184260907 Merge pull request #4249 from gyuho/minor_typo
*: minor typos, kill TODOs
2016-01-20 16:25:18 -08:00
835d824965 *: minor typos, kill TODOs 2016-01-20 16:21:39 -08:00
cd323e0ec8 Merge pull request #4242 from gyuho/unsynced_multi
integration: TestV3WatchMultipleEventsPutUnsynced
2016-01-20 15:39:49 -08:00
39116e2e20 integration: TestV3WatchMultipleEventsPutUnsynced 2016-01-20 15:31:13 -08:00
d26b1460c5 Merge pull request #4248 from gyuho/rest_of_unsynced_test
integration: add more tests for unsynced watch
2016-01-20 12:32:08 -08:00
96f646c586 integration: add more tests for unsynced watch
For https://github.com/coreos/etcd/issues/4216.
2016-01-20 12:13:58 -08:00
6a43aa28fe Merge pull request #4247 from gyuho/unsynced_cancel
integration: cancel operation for unsynced watcher
2016-01-20 11:47:31 -08:00
8c40232198 integration: cancel operation for unsynced watcher
Related https://github.com/coreos/etcd/issues/4216.
2016-01-20 11:28:31 -08:00
46bb2582fe raft: Call maybeCommit after removing a node.
removeNode reduces the required quorum size, so some pending entries may
be able to commit after it is applied.

Discovered in cockroachdb/cockroach#3642
2016-01-20 11:05:48 -04:00
c185bdaf95 raft: Improve formatting of DescribeMessage 2016-01-20 11:03:07 -04:00
092879291f Merge pull request #4228 from mitake/procfile-pprof
Procfile: enable pprof in Procfiles
2016-01-19 22:33:37 -08:00
72195afbc9 Merge pull request #4237 from gyuho/multi_stream
integration: add TestV3WatchMultipleStreams
2016-01-19 14:37:33 -08:00
12362d292d integration: add TestV3WatchMultipleStreams
Related https://github.com/coreos/etcd/issues/4216.
2016-01-19 14:27:14 -08:00
d2e35f68f9 Merge pull request #4235 from gyuho/watch_multi_synced
integration: watch test for multi-events with txn
2016-01-19 14:00:58 -08:00
a571f83343 Merge pull request #4232 from heyitsanthony/test-fmt-first
test: test fmt before running unit tests
2016-01-19 13:47:09 -08:00
166055b443 integration: watch test for multi-events with txn
Related to https://github.com/coreos/etcd/issues/4216.
2016-01-19 13:47:09 -08:00
de765ac7dd Merge pull request #4236 from heyitsanthony/fix-watch-test-race
integration: fix race in WatchFromCurrentRevision
2016-01-19 12:50:23 -08:00
7e0a5b8ed7 integration: fix race in WatchFromCurrentRevision
Since watching from current revision, keys should be put after the
watcher is registered or the test may time out. Shows up in CI.
2016-01-19 12:30:49 -08:00
63197782ea test: trigger formatting tests before unit tests
Don't waste time running full unit tests only to fail CI over fmt or vet.
2016-01-19 11:39:39 -08:00
7d9a88a687 test: refactor sorts of tests into separate functions 2016-01-19 11:39:31 -08:00
882d63226d Procfile: enable pprof in Procfiles
Procfiels are used for development purpose. So enabling pprof in
default would be confortable for developers.
2016-01-18 15:18:57 +09:00
9ca3c8e581 Merge pull request #4224 from heyitsanthony/v3-rangereq-more-flag
support V3 rangereq more flag and revision request flag
2016-01-17 22:16:33 -08:00
05531b4600 integration: test v3 RangeRequest 2016-01-17 22:03:08 -08:00
ccfd68a251 etcdserver: support Revision option in v3 RangeRequest 2016-01-17 21:45:22 -08:00
8df3f0c545 etcdserver: support 'More' flag for v3 RangeRequest 2016-01-17 21:45:22 -08:00
1090320bb2 Merge pull request #4227 from gyuho/wait_response
integration: use WaitResponse for watch tests
2016-01-17 16:32:32 -08:00
f9b505ae56 integration: use WaitResponse for watch tests 2016-01-17 14:11:30 -08:00
98dfdebf13 Merge pull request #4225 from gyuho/watch_test_multi
integration: add TestV3WatchMultiple
2016-01-16 20:32:26 -08:00
0f3573a57e integration: add TestV3WatchMultiple
For https://github.com/coreos/etcd/issues/4216.
2016-01-16 20:23:45 -08:00
22dd738228 Merge pull request #4223 from gyuho/watch_cancel_test
integration: watch cancel test
2016-01-15 17:23:25 -08:00
2535509811 integration: add TestV3WatchCancel
Related https://github.com/coreos/etcd/issues/4216.
2016-01-15 17:16:12 -08:00
01ba9960d9 Merge pull request #4196 from Timer/etcdctl-docs
Add docs for configuration flags and env vars
2016-01-15 16:59:44 -08:00
476178fce0 etcdctl: add docs for configuration flags and env vars 2016-01-15 19:46:29 -05:00
bc6613902f Merge pull request #4222 from heyitsanthony/concurrent-watch-testing
integration: submit keys concurrently with watcher streaming
2016-01-15 11:10:40 -08:00
be4466e331 Merge pull request #4215 from xiang90/fix_proxy
etcdmain: fix proxy srv lookup
2016-01-15 11:03:21 -08:00
9aea99cd6e integration: submit keys concurrently with watcher streaming
Tests for races between producer and consumer on watcher
2016-01-15 10:57:33 -08:00
9b5313a97c Merge pull request #4218 from gyuho/test_header_revision
integration: test header revision in v3 grpc
2016-01-14 21:42:11 -08:00
2f2b408686 integration: test header revision in v3 grpc
Related https://github.com/coreos/etcd/issues/4216.
2016-01-14 21:26:13 -08:00
2f9f4220c6 Merge pull request #4217 from heyitsanthony/start-store-on-rev1
storage: start initial revision at 1
2016-01-14 17:16:55 -08:00
ecba9b61cf storage: start initial revision at 1
When the start revision was 0, there was no way to safely watch
starting from the first store revision.
2016-01-14 17:05:56 -08:00
cf873bcf81 Merge pull request #4214 from gyuho/watch_integration_test
integration: add WatchFromCurrentRevision test
2016-01-14 16:56:56 -08:00
d036ac85cb integration: add WatchFromCurrentRevision test 2016-01-14 16:44:59 -08:00
f3daa9f677 etcdmain: proxy should only lookup srv if there is no existing cluster file 2016-01-14 11:23:36 -08:00
02ab7be106 Merge pull request #4211 from gyuho/stack_watch
*: FatalStack to stacktrace tests after timeout
2016-01-14 10:27:44 -08:00
497bbd3010 *: FatalStack to stacktrace tests after timeout
Related to https://github.com/coreos/etcd/issues/4065.
2016-01-14 10:20:14 -08:00
2eac21ae0a Merge pull request #4210 from xiang90/fix_panic
storage: fix panic in test
2016-01-14 08:05:46 -08:00
ec0877b239 Merge pull request #4212 from gyuho/typo_packages
*: fix minor typos
2016-01-14 13:29:52 +01:00
b6077f9d57 *: fix minor typos 2016-01-14 01:28:29 -08:00
f2b0689f74 storage: fix panic in test 2016-01-13 23:18:27 -08:00
84d7318820 Merge pull request #4208 from xiang90/fix_test
backend: make test more reliable
2016-01-13 22:52:15 -08:00
e1de19bf75 backend: make test more reliable 2016-01-13 22:00:25 -08:00
0f782762f4 Merge pull request #4179 from wangjia184/master
doc/libraries-and-tools.md: add .net client library
2016-01-13 21:19:44 -08:00
wj
5a3ba5202d doc/libraries-and-tools.md: add .net client library 2016-01-14 12:24:36 +08:00
6c82d768b2 Merge pull request #4201 from mitake/benchmark-pprof
tools/benchmark: add flags for pprof to storage put
2016-01-13 20:17:30 -08:00
1c802e9266 tools/benchmark: add flags for pprof to storage put
This commit adds flags for profiling with runtime/pprof to storage
put:
- --cpuprofile: specify a path of CPU profiling result, if it is not
    empty, profiling is activated
- --memprofile: specify a path of heap profiling result, if it is not
    empty, profiling is activated

Of course, the flags should be added to RootCmd ideally. However,
adding common flags that shared by children command requires the
ongoing PR: https://github.com/spf13/cobra/pull/220 . Therefore this
commit adds the flags to storage put only.
2016-01-14 13:10:35 +09:00
a8a786598d Merge pull request #4206 from xiang90/fix_test
storage: extend timeout for slow CI
2016-01-13 16:15:07 -08:00
0cba12d991 storage: extend timeout for slow CI
1. extend timeout

2. print out stacktrace. When it fails again, we can get more confidence that the
failure is caused by slow IO.
2016-01-13 16:04:19 -08:00
5f3f09f82d Merge pull request #4200 from mitake/deadcode
tools/benchmark: remove deadcode
2016-01-13 15:24:05 -08:00
9e11da50ad Merge pull request #4190 from heyitsanthony/v3-integration-test
integration: testing support for v3 grpc api
2016-01-13 14:46:11 -08:00
53186da0a9 integration: a few v3 grpc api tests 2016-01-13 14:24:27 -08:00
6949f052c4 integration: add support for grpc server and client 2016-01-13 14:20:26 -08:00
722fb43797 Merge pull request #4202 from gyuho/prefix_watch
storage: check prefix in unsynced
2016-01-13 11:32:43 -08:00
d2d70513a1 Merge pull request #4185 from xiang90/client-srv
*: support client srv target
2016-01-13 11:26:35 -08:00
4f427bca43 storage: check prefix in unsynced
Current syncWatchers method skips the events that have
prefixes that are being watched when the prefix is not
existent as a key. This fixes https://github.com/coreos/etcd/issues/4191
by adding prefix checking to not skip those events.
2016-01-13 11:21:47 -08:00
bfa21001a1 *: support client srv target 2016-01-13 11:12:15 -08:00
c59bc9c4db Merge pull request #4204 from gyuho/typo
client: keys.go minor typo
2016-01-13 11:11:50 -08:00
b47f721a98 integration: configure cluster with configCluster struct
makes discovery, tls, and v3 explicitly part of the cluster information
2016-01-13 11:09:13 -08:00
3ac8ff3a84 client: keys.go minor typo 2016-01-13 11:02:37 -08:00
f2a4993c11 Merge pull request #4187 from gyuho/store_stats
Documentation: stats/metrics reset on etcd restart
2016-01-13 09:53:17 -08:00
b6d5993121 Merge pull request #4193 from gyuho/etcdctl_help_doc
etcdctl, etcdctlv3: add help message for non-valid value arg
2016-01-13 09:52:59 -08:00
5fb26f2e8c Documentation: stats/metrics reset on etcd restart
Addressing https://github.com/coreos/etcd/issues/4035.
2016-01-13 09:51:13 -08:00
054db2e3cf etcdctl, etcdctlv3: add help message for non-valid value arg
Unix commands interprets argument value of first character '-'
as a flag. And this won't be fixed in our upstream CLI libraries.
This adds help message to show users workarounds.

Addressing https://github.com/coreos/etcd/issues/2020.
2016-01-13 09:25:24 -08:00
ca9ad6897a Merge pull request #4192 from heyitsanthony/gexpect-tests
e2e: etcd end-to-end integration tests
2016-01-13 09:10:29 -08:00
b35ab33045 Merge pull request #4155 from mitake/pprof
etcdmain: add options for pprof
2016-01-13 03:13:50 -08:00
16b63310b2 tools/benchmark: remove deadcode
The Execute() function is a deadcode. Let's remove it.
2016-01-13 16:57:53 +09:00
588f655b4e etcdmain: add an option for pprof
This commit adds a new option for activating profiling based on pprof
in etcd process.
 - -enable-pprof: boolean type option which activates profiling

For example, if a client URL is http://localhost:12379, users and
developers access heap profiler with this URL:
http://localhost:12379/debug/pprof/heap
2016-01-13 16:12:26 +09:00
b83c52888c Merge pull request #4199 from heyitsanthony/fix-recorder-datarace
testutil: fix data race in RecorderBuffered
2016-01-12 21:56:58 -08:00
6de07cf9ea e2e: etcd end-to-end tests
Uses gexpect to test the etcd binary directly. Tests #4135, #4171
2016-01-12 21:27:59 -08:00
54c905f87f testutil: fix data race in RecorderBuffered
Was accessing a shared data structure instead of the private copy.

Fixes #4198
2016-01-12 21:08:51 -08:00
da9378b7e2 Merge pull request #4194 from gyuho/etcdctl_check_key
client: add IsKeyNotFound function
2016-01-12 20:12:36 -08:00
f67f6d7c7c client: add IsKeyNotFound function
This can be used to check if an error is client.ErrorCodeKeyNotFound.
Related to https://github.com/coreos/etcd/issues/4080.
2016-01-12 20:04:08 -08:00
1df478ea5d Merge pull request #4195 from davygeek/master
follow golint notice replace '+=1' to '++'
2016-01-12 18:00:53 -08:00
194607812c raft: follow golint notice to replace +=1 with ++ 2016-01-13 09:39:00 +08:00
9b27198698 godeps: add gexpect and deps 2016-01-12 16:53:15 -08:00
42b5bc021a Merge pull request #4186 from xiang90/fix_store
store: handle watch dir removal correctly
2016-01-12 14:47:02 -08:00
efa9cd7e0c Merge pull request #4184 from heyitsanthony/v3-rangereq-sort
etcdserver: support sorted range requests in v3 api
2016-01-12 10:26:52 -08:00
3aa5cee1f3 Merge pull request #4188 from xiang90/lease_stop
*: stop lessor when etcdserver is stopped
2016-01-12 10:23:48 -08:00
82eeffbd58 etcdserver: support sorted range requests in v3 api
Fixes #4166
2016-01-12 10:08:30 -08:00
6b1d9fb7ce *: stop lessor when etcdserver is stopped 2016-01-12 10:06:11 -08:00
547250321b store: handle watch dir removal correctly 2016-01-11 22:42:56 -08:00
374b14e471 Merge pull request #4178 from xiang90/lease_keepalive
*: now lease keepAlive works on leader
2016-01-10 13:13:31 -08:00
59bf83c7f4 *: now lease keepAlive works on leader 2016-01-09 22:12:02 -08:00
20cdd7db8e Merge pull request #4173 from xiang90/lease_attach
support lease Attach
2016-01-09 11:10:45 -08:00
f01c8188f8 *: rename lease.DeleteableRange to lease.RangeDeleter 2016-01-09 11:01:58 -08:00
f5753f2f51 *: support lease Attach
Now we can attach keys to leases. And revoking the lease removes all
the attached keys of that lease.
2016-01-09 11:01:58 -08:00
133f46246e Merge pull request #4176 from xiang90/refactor
lease: remove minExpiry and add helper funcs
2016-01-08 14:40:26 -08:00
12912501bd lease: remove minExpiry and add helper funcs 2016-01-08 14:29:33 -08:00
0554a18060 Merge pull request #4175 from xiang90/lease_expire
*: revoke expired leases
2016-01-08 13:47:06 -08:00
2566699a48 *: revoke expired leases 2016-01-08 13:37:58 -08:00
f45a8fe623 Merge pull request #4174 from heyitsanthony/fix-limit-keepalive
etcdmain: support keep alive listeners on limit listener connections
2016-01-08 10:41:57 -08:00
7de0e9130c Merge pull request #4167 from xiang90/lease_promote
*: expose Lessor Promote and Demote interface
2016-01-08 10:38:08 -08:00
1814867733 Godeps: remove golang.org/x/net/netutil
Now using our own LimitListener to support KeepAlives.
2016-01-08 10:32:25 -08:00
811fbc5672 etcdmain: support keep alive listeners on limit listener connections
Fixes #4171
2016-01-08 10:11:31 -08:00
cb3ca4f8fb Merge pull request #4169 from gyuho/typo
*: fix minor typos
2016-01-08 11:20:59 +01:00
f76166a041 *: fix minor typos 2016-01-08 00:21:19 -08:00
a7287b6374 Merge pull request #4161 from gyuho/typo_in_benchmark
tools/benchmark/cmd: print error out to stderr
2016-01-07 18:50:09 -08:00
f5fa9b5384 *: expose Lessor Promote and Demote interface 2016-01-07 18:18:20 -08:00
99bee2fd29 Merge pull request #4162 from xiang90/lease
*: add support for lease create and revoke
2016-01-07 16:58:59 -08:00
d9ca929a33 *: add support for lease create and revoke
Basic support for lease operations like create and revoke.
We still need to:
1. attach keys to leases in KV implmentation if lease field is set
2. leader periodically removes expired leases
3. leader serves keepAlive requests and follower forwards keepAlive
requests to leader.
2016-01-07 16:39:39 -08:00
383e11d528 Merge pull request #4165 from heyitsanthony/benchmark-stddev
tools/benchmark: report standard deviation
2016-01-07 16:00:26 -08:00
187064187c tools/benchmark: report standard deviation 2016-01-07 15:30:23 -08:00
2ff95b68e7 Merge pull request #4163 from mordyovits/patch-1
storage: minor typo in metrics help field
2016-01-07 13:35:59 -08:00
0d01035693 Minor typo in metrics help field 2016-01-07 16:18:50 -05:00
cc39cde319 tools/benchmark/cmd: print error out to stderr 2016-01-07 12:55:26 -08:00
4d921ab0e4 Merge pull request #4159 from gyuho/kv_header
etcdserver/api/v3rpc: fill in KV ResponseHeader
2016-01-07 12:27:49 -08:00
5842177172 etcdserver/api/v3rpc: fill in KV ResponseHeader 2016-01-07 12:18:31 -08:00
6f39608624 Merge pull request #4160 from xiang90/fix_lease
lease: unlock before another batch operation
2016-01-07 11:20:31 -08:00
8774d53459 Merge pull request #4158 from heyitsanthony/nolease-to-leasepkg
lease: move storage.NoLease to lease package
2016-01-07 10:43:51 -08:00
f22ea70c14 lease: unlock before another batch operation 2016-01-07 10:41:16 -08:00
9e0378998b Merge pull request #4153 from xiang90/fix_listener
etcdmain: tls listener MUST be at the outer layer of all listeners
2016-01-07 10:36:24 -08:00
f9af744be3 lease: move storage.NoLease to lease package 2016-01-07 10:33:44 -08:00
1f97f2dc36 etcdmain: tls listener MUST be at the outer layer of all listeners
go HTTP library uses type assertion to determine if a connection
is a TLS connection. If we wrapper TLS Listener with any customized
Listener that can create customized Conn, HTTPs will be broken.

This commit fixes the issue.
2016-01-07 10:26:49 -08:00
db60cdc42c Merge pull request #4154 from xiang90/snapshot_from_backend
*: get snapshot from backend
2016-01-06 23:10:23 -08:00
43a777b7a2 *: get snapshot from backend
We should get snapshot from backend, not just KV.
We plan to store the lease data into backend too.
2016-01-06 22:40:23 -08:00
b42a0e4283 Merge pull request #4151 from xiang90/s
storage: support recovering from backend
2016-01-06 22:17:18 -08:00
1714290f4e storage: support recovering from backend
We want the KV to support recovering from backend to avoid
additional pointer swap. Or we have to do coordination between
etcdserver and API layer, since API layer might have access to
kv pointer and use a closed kv.
2016-01-06 21:16:55 -08:00
b546f4c2c2 Merge pull request #4152 from xiang90/fix_force
backend: create bucket should increase pending
2016-01-06 20:47:32 -08:00
5dd8af444a backend: create bucket should increase pending 2016-01-06 20:25:50 -08:00
f91d96c9a4 Merge pull request #4150 from gyuho/fix_arg_update_dir
etcdctl: get only the first argument for updatedir
2016-01-06 16:44:05 -08:00
dd02ec6554 etcdctl: ignore value in updatedir command
Fixes coreos#4145.
client.KeysAPI ignores value if SetOptions.Dir is true.
2016-01-06 16:37:04 -08:00
c70d533771 Merge pull request #4138 from gyuho/watchresponse_header
*: fill in WatchResponse.Header
2016-01-06 15:19:36 -08:00
366e7a879f *: fill in WatchResponse.Header
Related to coreos#3848.
2016-01-06 15:12:53 -08:00
ebbb91a713 Merge pull request #4147 from heyitsanthony/listener-unix-sockets
pkg/transport: support listeners on unix sockets
2016-01-06 12:41:38 -08:00
f2df87f3e4 pkg/transport: support listeners on unix sockets
Given unix://<socketname>, NewListener will listen on unix socket <socketname>.
This is useful when binding to tcp ports is undesirable (e.g., testing).
2016-01-06 12:09:05 -08:00
eab052d5c4 Merge pull request #4141 from ngaut/ngaut/refactor
raft: Rename q() to quorum() which is more readable
2016-01-06 07:32:39 -08:00
027dd6169b Merge pull request #4143 from siddontang/master
raft,rafthttp: fix typo
2016-01-06 00:24:28 -08:00
54a45ba2f5 *: fix typo 2016-01-06 16:17:02 +08:00
8ee232d4ec raft: Rename q() to quorum() which is more readable 2016-01-06 15:23:35 +08:00
45b9cb170d Merge pull request #4142 from gyuho/delete_test_file
storage: remove test file after test
2016-01-05 22:15:40 -08:00
ad29ba3073 storage: remove test file after test 2016-01-05 22:05:28 -08:00
82f2cd6cef Merge pull request #4140 from xiang90/storage
*: make backend outside kv
2016-01-05 20:34:17 -08:00
5dd3f91903 *: make backend outside kv
KV and lease will share the same backend. Thus we need to make
backend outside KV.
2016-01-05 19:55:29 -08:00
70d120e08e Merge pull request #4137 from xiang90/lease
lease: recovery leases from backend
2016-01-05 11:32:03 -08:00
9156e54f1f lease: recovery leases from backend 2016-01-05 11:21:11 -08:00
1e61243fd7 Merge pull request #4134 from xiang90/lease
lease: modify API and persist lease to disk
2016-01-05 10:30:52 -08:00
23ddb9ff30 Merge pull request #4126 from heyitsanthony/testutil-recorder-stream
remove WaitSchedule() from etcdserver tests
2016-01-05 10:19:14 -08:00
09b420f08c *: move leaseID typedef to lease pkg 2016-01-05 10:18:17 -08:00
25f82b25f7 lease: modify API and persist lease to disk 2016-01-05 10:09:42 -08:00
838328b057 etcdserver: fix racey WaitSchedule() tests to wait for recorder actions
Fixes #4119
2016-01-05 09:39:18 -08:00
384cc76299 pkg/testutil: make Recorder an interface
Provides two implementations of Recorder-- one that is non-blocking
like the original version and one that provides a blocking channel
to avoid busy waiting or racing in tests when no other synchronization
is available.
2016-01-05 09:39:18 -08:00
e1bf726bc1 *: split out etcdserver's test mockup objects to live in interfaces' packages 2016-01-05 09:39:13 -08:00
be57b6b10e Merge pull request #4133 from gyuho/event_map_lookup
storage: fix watchable_store notify to hash-lookup once
2016-01-04 22:46:37 -08:00
7339abc79e storage: fix watchable_store notify to hash-lookup once
We should just assign events and ok at first, and check the
boolean value, instead of looking up the map twice.
2016-01-04 22:34:48 -08:00
d9d1342869 Merge pull request #4132 from heyitsanthony/watchid-typedef
storage: change type of WatchIDs from int64 to WatchID
2016-01-04 20:02:26 -08:00
21a6ade53d storage: change type of WatchIDs from int64 to WatchID 2016-01-04 19:52:22 -08:00
e5e355242d Merge pull request #4130 from gyuho/remove_cancelfunc
*: remove CancelFunc return for Watch. Use Cancel for watch.
2016-01-04 16:22:44 -08:00
556d4a6932 *: remove CancelFunc return for Watch. Use Cancel for watch. 2016-01-04 16:17:55 -08:00
b3ad736d2a Merge pull request #4131 from xiang90/kv_lease
*: support put with lease
2016-01-04 16:05:35 -08:00
4336278b44 *: support put with lease 2016-01-04 15:54:06 -08:00
2e2b0ea9a1 Merge pull request #4128 from akolb1/akolb1-4127
storage/backend: disable MAP_POPULATE on Solaris
2016-01-04 14:53:40 -08:00
152dcdd04d storage/backend: disable MAP_POPULATE on Solaris 2016-01-04 14:42:57 -08:00
e2bd35ba1a Merge pull request #4129 from gyuho/licence_2016
header: change date to 2016
2016-01-04 14:13:46 -08:00
58112f3abe header: change date to 2016
Update LICENCE header year to 2016.
2016-01-04 14:05:22 -08:00
cfe23b886d Merge pull request #4125 from ngaut/ngaut/refactor
raft: Tiny refactor
2016-01-04 11:49:08 -08:00
b38dfda1c9 raft: Tiny refactor
Rename i to id since i looks like index which is confusing.
2016-01-04 21:20:54 +08:00
e0a06bb5d6 Merge pull request #4123 from ngaut/ngaut/typo
raft: typo
2016-01-03 20:27:41 -08:00
acee23112a raft: typo 2016-01-04 11:51:51 +08:00
bf9e2a550c Merge pull request #4122 from gyuho/watchid_events
*: WatchResponse for multiple Events with WatchID
2016-01-03 16:40:28 -08:00
6540f47dfa *: WatchResponse for multiple Events with WatchID
storage/storagepb: remove watchID from Event

storage: add WatchResponse to watcher.go to wrap events, watchID

storage: watchableStore to use WatchResponse

storage: kv_test with WatchResponse

etcdserver/api/v3rpc: watch to receive storage.WatchResponse type
2016-01-03 16:32:48 -08:00
c832d7f6e2 Merge pull request #4120 from xiang90/ctrl_w
*: support watcher cancellation inside watchStream
2016-01-03 09:14:12 -08:00
eda0eefc25 *: support watcher cancellation inside watchStream 2016-01-03 00:20:21 -08:00
94ac9ae2da Merge pull request #4118 from xiang90/ctrl_w
v3api: add support for sending watcher control response
2016-01-02 23:27:34 -08:00
ec12686233 v3api: add support for sending watcher control response 2016-01-02 22:31:22 -08:00
4fa0cd5765 Merge pull request #4117 from xiang90/rm_watching
storage: rename watching -> watcher
2016-01-02 21:00:02 -08:00
807db7e2aa storage: rename watching -> watcher 2016-01-02 20:20:22 -08:00
34187a4fbe Merge pull request #4114 from xiang90/r_watch_stream
*: rename watcher to watchStream
2016-01-02 18:29:18 -08:00
ee0b3f42ed *: rename watcher to watchStream
Watcher vs Watching in storage pkg is confusing. Watcher should be named
as watchStream since it contains a channel as stream to send out events.
Then we can rename watching to watcher, which actually watches on a key
and send watched events through watchStream.

This commits renames watcher to watchStram.
2016-01-02 16:03:57 -08:00
41771d9522 Merge pull request #4112 from xiang90/proto
*: update watch related proto
2016-01-01 13:10:19 -08:00
ac330bb7c9 *: update watch related proto
1. Add watch/cancel request
2. Add necessary fields in response to return watch error
3. Add watch_id into watch response
2016-01-01 10:22:21 -08:00
7dd599b69d Merge pull request #4091 from gyuho/watch_events_slice
storage: watch events in slice
2015-12-31 23:54:08 -08:00
37b643b11d etcdctlv3/command: watch command to receive events slice 2015-12-31 23:44:02 -08:00
b59c993681 storage: kv_test.go events slice 2015-12-31 23:44:02 -08:00
14a0268ebc storage: watchable_store_test.go events slice 2015-12-31 23:44:02 -08:00
0b01acf131 storage: watchable_store.go events slice 2015-12-31 23:44:02 -08:00
f568a1ccfc storage: watcher_test.go events slice 2015-12-31 23:43:59 -08:00
a74147384d Merge pull request #4070 from mitake/storage-bench
tools: a new tool for benchmarking storage backends
2015-12-31 22:26:46 -08:00
454865bd67 tools: a new tool for benchmarking storage backends
Current etcd repository has a test for benchmarking a storage backend
in storage/kvstore_bench_test.go. However, it is hard to test various
parameters (e.g. batch interval, a number of keys, etc) with the test.

This commit adds a new benchmarking subcommand "storage" to
tools/benchmark. It will encourage analysis of storage backends with
various parameter and complex workloads.

Exmaple usage:
$ ./benchmark storage put
total: 9.894173792s
average: 9.894173ms
minimum latency: 6.596991ms
maximum latency: 29.455695ms
2016-01-01 15:18:54 +09:00
5aded6cd77 storage: watcher.go events slice 2015-12-31 19:26:20 -08:00
8f03c600b5 etcdserver/api/v3rpc: watch.go with events slice 2015-12-31 19:25:15 -08:00
8da6e76588 etcdserver/etcdserverpb: rpc WatchResponse with events slice 2015-12-31 19:24:46 -08:00
0b2d31f3bc storage: decouple default parameters and storage creation
newStore() uses constants for some important parameters
e.g. batchInerval. It is not suitable for other storage users
(e.g. tools/benchmark). This commit decouples the default parameters
from storage creation. etcd should use a newly added function
newDefaultStore() for creating a store with default parameters.
2015-12-31 22:28:59 +09:00
73230f9603 Merge pull request #4108 from xiang90/proto
*: fix proto and regenerate all go files
2015-12-30 20:29:04 -08:00
1dc0e664f0 *: fix proto and regenerate all go files 2015-12-30 20:11:19 -08:00
4444d92032 Merge pull request #4105 from heyitsanthony/benchmark-put-seq-keys
tools/benchmark: support puts on sequential keys over a bounded keyspace
2015-12-30 17:04:37 -08:00
1689bb3f02 tools/benchmark: support puts on sequential keys over a bounded keyspace
This patch makes it possible to use benchmark to generate n keys and
make random updates to only those n keys.
2015-12-30 16:47:24 -08:00
c8a0cc80dc Merge pull request #4104 from xiang90/ctl
etcdctl: fix syncWithPeerAPI by breaking the loop when there is no error
2015-12-30 11:17:30 -08:00
34abead33e etcdctl: fix syncWithPeerAPI by breaking the loop when there is no error 2015-12-30 11:06:42 -08:00
795b824f4d Merge pull request #4100 from mitake/ignore-benchmark
tools/benchmark: ignore the binary "benchmark"
2015-12-29 23:19:28 -08:00
b97f78d356 tools/benchmark: ignore the binary "benchmark" 2015-12-30 16:02:52 +09:00
f19a07289a Merge pull request #4098 from gyuho/merge_log
*: use merge logger for repeating etcdserver error logs
2015-12-29 21:23:30 -07:00
8346a7c052 Merge pull request #4094 from heyitsanthony/send-merged-done-nowait
etcdserver: respect done channel when sleeping for snapshot backoff
2015-12-29 20:18:08 -08:00
8f943f2f45 etcdserver/etcdhttp: use MergeLogger to log etcdserver errors
Related https://github.com/coreos/etcd/issues/3812.
2015-12-29 20:00:52 -08:00
64032541c3 pkg/logutil: round off start time, add merge_logger_test.go 2015-12-29 20:00:46 -08:00
942b5570bd Merge pull request #4096 from heyitsanthony/serialize-applier-snapmerge
etcdserver: serialize snapshot merger with applier
2015-12-29 19:37:11 -08:00
4cd86ae1ef etcdserver: serialize snapshot merger with applier
Avoids inconsistent snapshotting by only attempting to
create a snapshot after an apply completes.

Fixes #4061
2015-12-29 18:38:39 -08:00
04ac8969a1 Merge pull request #3986 from mqliang/defer
refactor store/store.go
2015-12-29 14:42:11 -07:00
4b355bd81e Merge pull request #4093 from xiang90/rm_proto
doc: remove proto in rfc, link to proto file in codebase
2015-12-29 13:25:46 -08:00
c7c3bda8b9 etcdserver: respect done channel when sleeping for snapshot backoff 2015-12-29 13:23:41 -08:00
9d9680121c etcdserver/etcdserverpb: make rpc.proto updated 2015-12-29 12:58:14 -08:00
39ed73c290 doc: remove proto in rfc, link to proto file in codebase 2015-12-29 12:53:27 -08:00
d9d7137ea3 Merge pull request #4092 from xiang90/api
*: update api proto
2015-12-29 12:46:12 -08:00
1a0201a31a *: update api proto 2015-12-29 12:31:53 -08:00
d62edfb464 Merge pull request #4090 from xiang90/writable
etcdserver: always check if the data dir is writable before starting
2015-12-29 11:40:41 -08:00
150e646b05 etcdserver: always check if the data dir is writable before starting etcd 2015-12-29 11:29:01 -08:00
f1761798e9 Merge pull request #4089 from xiang90/fix
etcdserver: fix creating member dir
2015-12-28 23:37:50 -08:00
0f2675d9b6 Merge pull request #4060 from heyitsanthony/fix-etcdctl-err
etcdctl: return exitcode 2 if can't connect to any peer urls
2015-12-28 23:28:29 -08:00
aef55342d1 etcdsever: avoid creating member dir before finishing validate bootstrap
This commit fixes the issue of creating member dir before validating
the configuration. When member dir exists, it indicates the local etcd
process is a valid etcd member. So we should only create member dir
after we finish configuration validation, joining validation or
discovery validation.
2015-12-28 23:20:27 -08:00
eae52a3138 Revert "etcdserver: always remove member directory when bootstrap fails"
This reverts commit a7e443d641.
2015-12-28 23:16:08 -08:00
cd42c9139e Merge pull request #4087 from gyuho/delete_discovery_check
etcdserver: always remove member directory when bootstrap fails
2015-12-29 00:04:37 -07:00
4477ef636e etcdctl: return exitcode 2 if can't connect to any peers 2015-12-28 23:04:34 -08:00
a7e443d641 etcdserver: always remove member directory when bootstrap fails
This removes member directory when bootstrap fails including joining existing
cluster and forming a new cluster. This fixes https://github.com/coreos/etcd/issues/3827.
2015-12-28 22:56:03 -08:00
c3655cbfd9 Merge pull request #4084 from gyuho/revisioin_document
storage: add/fix revision/generation description
2015-12-28 20:45:36 -07:00
5284c1c3bb storage: add/fix revision/generation description
This adds some struct descriptions to revision/generation structs and corrects
some spelling errors.
2015-12-28 19:32:03 -08:00
f3b3557951 Merge pull request #4085 from gyuho/doc_iana
Documentation: clarification on official etcd ports
2015-12-28 18:13:25 -08:00
7c587e0855 Documentation: clarification on official etcd ports
This is for https://github.com/coreos/etcd/issues/3480.
This adds a notice that etcd officially use 2379 and 2380 for
client and server ports.
2015-12-28 16:29:19 -08:00
6c5dc28d0f Merge pull request #4043 from gyuho/storage_range_all_unsynced
storage: range all unsynced at once
2015-12-28 14:45:40 -07:00
0d7fb820c7 Merge pull request #4082 from gyuho/storage_test_cleanup
storage: clean up test variable names, minor typos in comments
2015-12-28 13:02:17 -07:00
d64eb94580 storage: clean up test variable names, minor typos in comments
This just changes variable name to be more consistent: `N` rather than `Size`.
And fix some minor grammatical errors.
2015-12-28 11:49:57 -08:00
ecc3e15a46 storage: delete RangeHistory
This has been replaced by operations inside `syncWatchings`.
2015-12-28 11:37:32 -08:00
78b0b8a4a0 storage: range all unsynced at once
This is for https://github.com/coreos/etcd/issues/3848.
It replaces RangeHistory method for more efficient event
sending.
2015-12-28 11:37:26 -08:00
570687a509 Merge pull request #4081 from heyitsanthony/benchmark-multi-ep
tools/benchmark: support connecting to several endpoints
2015-12-28 10:31:35 -08:00
8e728afa62 tools/benchmark: support connecting to several endpoints
--endpoints is comma separated but gRPC blocks forever on comma
separated lists. Instead, round-robin select endpoints when
creating new connections.
2015-12-28 10:22:33 -08:00
d07a9cd893 Merge pull request #4059 from xiang90/snap_log
rafthttp: better snapshot sending logging
2015-12-28 10:05:35 -08:00
466b33445f Merge pull request #4079 from gyuho/store_shadow
store: fix govet shadow on expiration variable
2015-12-28 11:03:57 -07:00
b072f0b048 store: fix expiration var shadowing, change test function names
Found at https://travis-ci.org/coreos/etcd/jobs/99087279#L298.
And changes test function names to make them clear.
2015-12-28 08:50:34 -08:00
aec356e416 Merge pull request #4064 from xiang90/reduce_fysnc
backend: do not commit unless there is a pending change
2015-12-28 07:31:50 -08:00
1238187b72 Merge pull request #4078 from jonboulle/master
docs: clarify CAS/CAD do not work on directories
2015-12-27 22:35:51 +01:00
216c6674ed docs: clarify CAS/CAD do not work on directories
As noted in #4075. There are numerous style issues with this document
but I don't want to go down that rabbit hole so this is an attempt at a
minimally invasive clarification.
2015-12-27 22:35:08 +01:00
729b530c48 Merge pull request #4071 from gyuho/store_event_node
store: clean up event.go, node.go and add tests
2015-12-25 21:42:50 -07:00
64e182c69e store: clean up event.go, node.go and add tests
Updates IsCreated logic on event.go. Cleans up node.go
and adds tests to it.
2015-12-25 13:25:12 -08:00
70bcde89bc Merge pull request #4073 from gyuho/remove_seed
storage: remove unnecessary math/rand seed
2015-12-25 19:14:35 +01:00
df0c2e6842 storage: remove unnecessary math/rand seed
As @jonboulle pointed out at
https://github.com/coreos/etcd/pull/4070/files#r48441847:

> math/rand is unrelated to crypto/rand; the latter reads from /dev/urandom and
> is relying on the kernel's PRNG. Just remove the seed entirely.
2015-12-25 09:55:11 -08:00
dac56faf61 Merge pull request #4030 from mitake/endpoint-selection
client: add a mechanism for various endpoint selection mode
2015-12-24 12:55:16 +01:00
ff319add53 Merge pull request #4066 from gyuho/tip_shadow
pkg/fileutil: fix error var shadow
2015-12-24 10:24:27 +01:00
8d368c4dba pkg/fileutil: fix error var shadow
Go tip complains about error variable shadowing at
https://travis-ci.org/coreos/etcd/jobs/98636879#L291-L292.
2015-12-23 23:56:26 -08:00
22b3b3e07a Merge pull request #4038 from AkihiroSuda/etcd-4007
pkg/fileutil: skip TestIsDirWriteable when running as root
2015-12-23 22:23:57 -08:00
058f1449d6 pkg/fileutil: skip TestIsDirWriteable when running as root 2015-12-24 14:52:40 +09:00
8bc59b66d1 backend: do not commit unless there is a pending change
Reduce the nubmer of fsync etcd issues when the cluster is
idle.
2015-12-23 18:58:37 -08:00
a46ffc60e5 client: add a mechanism for various endpoint selection mode
Current etcd client library chooses a default destination node from
every member of a cluster in a random manner. However, requests of
write and read (for consistent results) need to be forwarded to the
leader node as the nature of Raft algorithm. If the chosen node is a
follower, additional network traffic will be caused by the forwarding
from follower to leader.

Mainly for reducing the forward traffic, this commit adds a new
mechanism for various endpoint selection mode to the client library
which can be configured with client.Config.SelectionMode.

Currently, two modes are provided:
 - EndpointSelectionRandom: default, same to existing behavior (pick
   a node in a random manner)
 - EndpointSelectionPrioritizeLeader: prioritize leader, for the above
   purpose

I evaluated the effectiveness of the EndpointSelectionPrioritizeLeader
with 4 t1.micro instances of AWS (3 nodes for etcd cluster and 1 node
for etcd client). Client executes this simple benchmark
(https://github.com/mitake/etcd-things/tree/master/prioritize-leader-bench),
just writes 10000 keys. When SelectionMode == EndpointSelectionRandom
(default), the benchmark needed 1 min and 32.102 sec to finish. When
SelectionMode == EndpointSelectionPrioritizeLeader, the benchmark
needed 1 min 4.760 sec.
2015-12-24 11:02:40 +09:00
72e115ee6e Merge pull request #4062 from xiang90/fix_snap
*: fix snapshot sending cycle
2015-12-23 17:10:10 -08:00
3c17e45bcb Merge pull request #4063 from heyitsanthony/fix-shouldstop
etcdserver: stop if removed along with multiple conf changes
2015-12-23 16:46:18 -08:00
d7ad721ede etcdserver: stop if removed along with multiple conf changes
shouldstop would get clobbered when several conf changes are in an apply
2015-12-23 16:29:21 -08:00
4be152bb4f rework 2015-12-23 16:21:16 -08:00
9a51d40940 fix comment 2015-12-23 14:10:39 -08:00
ab31ba0d29 *: fix snapshot sending cycle 2015-12-23 13:58:57 -08:00
74dba2d4cf rafthttp: better snapshot sending logging
snapshot sending is an important event. We should always log it explicitly.
2015-12-23 12:36:07 -08:00
7e5b7cfc65 Merge pull request #4056 from heyitsanthony/benchmark-less-mem
tools/benchmark: stream results into reports
2015-12-23 11:51:13 -08:00
0c640d781c Merge pull request #4047 from heyitsanthony/test-activate-raftexample
test: activate tests on contrib/raftexample
2015-12-23 11:28:50 -08:00
382103af60 tools/benchmark: stream results into reports
Reports depended on writing all results to a large buffered channel and
reading from that synchronously. Similarly, requests were buffered the
same way which can take significant memory on big request strings. Instead,
have reports stream in results as they're produced then print when the
results channel closes.
2015-12-23 11:24:35 -08:00
58ac6aeb5a test: activate tests on contrib/raftexample
adds contrib/raftexample to integration tests and fixes two test races
2015-12-23 11:13:37 -08:00
3f81f020c1 Merge pull request #4050 from xiang90/fsync
snap: call fsync before close db file
2015-12-23 09:42:13 -08:00
45de4e918e Merge pull request #3935 from dgonyeo/master
scripts: rewrote build-aci to use acbuild
2015-12-23 18:37:38 +01:00
834c2cf7cf Merge pull request #4054 from jonboulle/master
raft: small typo fixes in raft package doc
2015-12-23 18:19:47 +01:00
b4c4146d6b Merge pull request #4051 from xiang90/log
rafthttp: log before receiving snapshot
2015-12-23 08:57:28 -08:00
94da4b9ee5 rafthttp: log before receiving snapshot
Database snapshot can be as large as 5GB. It is reasonable
to log before receiving it. Or the user might not know what
is happening and why etcd starts to use IO intensively.
2015-12-23 08:45:36 -08:00
5c65c393a5 raft: small typo fixes in raft package doc 2015-12-23 16:37:06 +01:00
53be8405f3 client: a new API for obtaining a leader node information 2015-12-23 22:54:04 +09:00
191c5ef9cb snap: call fsync before close db file 2015-12-22 22:43:05 -08:00
289de69632 Merge pull request #4048 from xiang90/util
etcdserver: move unti out of server.go
2015-12-22 15:41:39 -08:00
d6d12b4d86 etcdserver: move unti out of server.go
etcdserver file is messy enough. Let's make it be less messy.
2015-12-22 15:17:14 -08:00
59998dbc50 Merge pull request #3882 from colhom/etcd2-backup
contrib/systemd: etcd2-backup package and docs
2015-12-22 15:08:54 -08:00
c147da94a2 Merge pull request #4041 from heyitsanthony/v3-snapshot-low-latency
low latency V3 snapshot recovery
2015-12-22 15:03:35 -08:00
85cd4d9647 contrib/systemd: etcd2-backup package and docs
multi-node backup and restore procedures for etcd2 clusters, presented as systemd jobs.
2015-12-22 14:52:10 -08:00
aca0c466ed etcdserver: asynchronously notify applier when raft writes finish
The raft loop would block on the applier's done channel after
persisting the raft messages; the latency could cause dropped network
messages. Instead, asynchronously notify the applier with a buffered
channel when the raft writes complete.
2015-12-22 14:15:14 -08:00
9d05a0d959 etcdserver: apply v3 database updates outside server event loop
raft's applyc writes block on the server loop's database IO since
the next applyc read must wait on the db operation to finish.
Instead, stream applyc to a run queue outside the server loop.
2015-12-22 14:15:09 -08:00
23d645babd Merge pull request #4046 from heyitsanthony/etcdserver-server-select-refactor
etcdserver: refactor server.go select loop
2015-12-22 12:29:10 -08:00
7e00325fe9 etcdserver: refactor server.go select loop
splits out the apply case into smaller functions
2015-12-22 12:13:39 -08:00
79fa03081c Merge pull request #4045 from philips/add-raftexample-to-raft-docs
raft: add raftexample to the docs
2015-12-22 12:06:23 -08:00
c72e4ae112 raft: add raftexample to the docs
To help people wanting use this package get started point to the
raftexample package.
2015-12-22 12:04:39 -08:00
b79dae287d Merge pull request #4042 from jonboulle/master
pkg: fix tiny docstring typo in ioutil
2015-12-22 17:00:17 +01:00
50efd01e34 pkg: fix tiny docstring typo in ioutil 2015-12-22 16:37:22 +01:00
eaaf98348c Merge pull request #4040 from gyuho/godep_20151221
Godeps: add missing dependencies
2015-12-21 22:30:04 -08:00
293103cc24 Godeps: add missing dependencies
I reran Godep after this patch https://github.com/tools/godep/pull/352.
2015-12-21 22:20:07 -08:00
b1138a42a2 Merge pull request #4036 from gyuho/storage_test_unsynced
storage: add more tests for synced, unsynced for watchable store
2015-12-21 20:30:43 -08:00
28d0e473a7 storage: add more tests for synced, unsynced for watchable store
This adds more tests on functions that updates synced and unsynced in watchable
store. Preparatory change for https://github.com/coreos/etcd/issues/3848.
2015-12-21 20:20:08 -08:00
f3f5726b8a Merge pull request #4037 from xiang90/proxy
etcdmain: fix incomplete proxy config file
2015-12-21 16:37:55 -08:00
4bcd7587e2 etcdmain: fix incomplete proxy config file
etcd might generate incomplete proxy config file after a power failure.
It is because we use ioutil.WriteFile. And iotuile.WriteFile does
not call Sync before closing the file.
2015-12-21 16:15:00 -08:00
d639e4f7f6 Merge pull request #4033 from heyitsanthony/raftexample-tests
Raftexample tests
2015-12-21 13:14:28 -08:00
chz
63bc804253 contrib/raftexample: shutdown rafthttp on closed proposal channel
Otherwise listening ports leak across unit tests and ports won't bind.
2015-12-21 13:03:42 -08:00
chz
b73a11ff45 contrib/raftexample: follow pipeline guidelines closer
close raft commit channel before issuing raft error since it's done
sending
2015-12-21 13:03:42 -08:00
chz
b7cf2385e6 contrib/raftexample: add test, fix dead lock on proposal channel
deadlock if no leader; node selects on propc=nil and writes to Ready,
client blocks on propC in same select as Ready reader, and so progress
of raft state machine deadlocks.
2015-12-21 13:03:35 -08:00
2681137fe0 Merge pull request #4020 from xiang90/ctl_04
etcdctl: support etcd0.4
2015-12-21 12:55:03 -08:00
541f2e5200 etcdctl: support basic operations with etcd 0.4.
For CoreOS users, they will get a updated version of etcdctl without updating
the etcd server version. And the users cannot really control this behavior.
We do not want to suddenly break them without enough communication.

So we still want the most basic opeartions like get, set, watch of etcdctl2 work
with etcd 0.4. This patches solve the incompability issue.
2015-12-21 11:59:13 -08:00
5587c4aa9a client: support reset Endpoints.
ResetEndpoints is useful when the there is a scheduled cluster
changes or when manually manage the cluster without auto-sync
enabled.
2015-12-21 11:59:13 -08:00
7e26fe9c16 Merge pull request #4032 from gyuho/one_mutex_for_storage
storage: use only one mutex for store struct
2015-12-21 11:58:44 -08:00
84d777305d storage: use only one mutex for store struct
Mutex is a variable, which means there needs to be only one mutex
value per scope. We don't need a separate mutex inside store struct,
**if we assume that `TxnPut` and `TxnRange` are called ONLY ONCE
per transaction (between `TxnBegin` and `TxnEnd`)**, as documented.
2015-12-21 11:42:40 -08:00
2a351e62f1 Merge pull request #4024 from heyitsanthony/add-command-argusage
etcdctl: member add command argusage to help
2015-12-21 10:59:07 -08:00
c597d591b5 scripts: rewrote build-aci to use acbuild 2015-12-21 10:15:08 -08:00
2974c4ec27 etcdctl: fill out ArgsUsage fields for help
USAGE in help now names positional arguments (e.g., "member remove <memberID>"
instead of "member remove [arguments...]")

Fixes #4021
2015-12-21 09:05:37 -08:00
c4732eb6e1 Merge pull request #4028 from gyuho/storage_test_variable
storage: remove unnecessary test variable assignment
2015-12-20 15:33:52 -08:00
2377ef870a storage: remove unnecessary test variable assignment
There is no need to assign a separate variable since 'base' is already defined
as a local variable within the loop.
2015-12-20 14:23:27 -08:00
98c1745278 Merge pull request #4026 from jonboulle/master
tests: update + enable check for leaked goroutines
2015-12-20 20:31:15 +01:00
b126ff77fb tests: only check for go1.5+ once 2015-12-20 19:51:53 +01:00
d50fbe384a tests: ignore leaked readLoop on go <1.5 2015-12-20 19:51:06 +01:00
e1fe7350a2 tests: update + enable check for leaked goroutines
Go 1.4 landed a new testing.M type [1][1] which allows for start-up and
shutdown hooks when running tests. The standard library now uses this
for checking for leaked goroutines in net/http [2][2].

This patch essentially re-ports the updated code from the net/http test
(we were using an older version of it) - in detail:
- updates the test to use `TestMain` instead of relying on
  `TestGoroutinesRunning` to be implicitly run after all other tests
- adds a few new goroutines to the list of exceptions (the test itself,
  as well as the golang/glog package and pkg/log.MergeLogger, both of
  which spin off goroutines to handle log flushing/merging respectively)
- removes a couple of TODOs in the test for extra goroutines that's run
  after individual tests (one of these re-enables the http package's
  `.readLoop` and the other was an out-of-date TODO)
- re-enables the test

[1]: https://golang.org/pkg/testing/#M
[2]: https://github.com/golang/go/blob/release-branch.go1.4/src/net/http/main_test.go#L18
2015-12-20 19:51:06 +01:00
dd76ee5801 Merge pull request #4027 from jcderr/patch-1
Radius Intelligence Use Case
2015-12-20 10:48:04 -08:00
a5f9403f8f Update production-users.md
Adding use case for Radius Intelligence.
2015-12-20 10:12:48 -08:00
2e736b51cc Merge pull request #4025 from jonboulle/sjpotter-api-nits
docs/api: add version & health, split out members
2015-12-19 17:42:32 +01:00
c18549261f docs: split out members API into own document
Also changes a few references to settle consistently on "members API"
instead of "member API" or "member APIs".
2015-12-19 15:57:15 +01:00
899dc4a6a6 docs: move /version to other_apis and add /health
moved /version http endpoint to other_api document and added health with
an example
2015-12-19 15:57:01 +01:00
cfd030fabf Merge pull request #4014 from ppalucki/functional
tools: add functional tester with Docker
2015-12-19 15:17:13 +01:00
9ee9e552e6 Godep: update codegangsta/cli dependency
Update codegangsta/cli dependency to the newer version for ArgsUsage
2015-12-18 17:48:04 -08:00
4f5f999847 tools/functional-test: add docker support
Commit adds docker bits to run functional tester within containers.

requires:
- docker 1.9 (networking)
- docker-compose
2015-12-18 15:56:18 +01:00
bff857dc28 Merge pull request #4011 from heyitsanthony/raftexample
contrib: example key-value store using raft
2015-12-17 15:05:59 -08:00
1f858e10c8 contrib: example key-value store using raft 2015-12-17 14:41:37 -08:00
afee51eeda Merge pull request #4013 from mickep76/update-libraries-and-tools
libraries-and-tools: name change for etcd-export to etcdtool
2015-12-17 14:34:36 +01:00
e4e2a7933f libraries-and-tools: name change for etcd-export to etcdtool 2015-12-17 13:52:14 +01:00
9b0b15c9be Merge pull request #4006 from mitake/kvtest-deadlock
storage, test: unlock transaction in the retry loop
2015-12-16 18:06:14 -08:00
af2569c2d8 storage, test: unlock transaction in the retry loop 2015-12-17 10:35:03 +09:00
ea50389060 Merge pull request #4010 from gyuho/window_compile
storage/backend: fixes Windows compile error
2015-12-16 11:12:39 -08:00
928c0d3601 storage/backend: fixes Windows compile error
The type is "Options," not "Option" -- at least, in the vendored version of boltdb in the repository.
2015-12-16 11:10:46 -08:00
a907ca5e62 Merge pull request #4004 from mitake/go-vet-fix
raft: remove go vet compliants
2015-12-15 20:37:08 -08:00
9b2da76796 raft: remove go vet compliants 2015-12-16 13:29:23 +09:00
ee02b50c05 Merge pull request #4000 from xiang90/production_users
README: link to production users
2015-12-15 15:56:44 -08:00
7b90099526 README: link to production users 2015-12-15 15:54:59 -08:00
923d5cec5e production-users.md: add cycoresys.com 2015-12-15 15:50:45 -08:00
0c6d5c7c03 Merge pull request #3997 from stevenschlansker/opentable-prod
Add OpenTable to production-users.md
2015-12-15 15:47:40 -08:00
73c8e3f0ab Add OpenTable to production-users.md 2015-12-15 12:38:12 -08:00
1c559bdb33 store: refactor
use defer statement to update `Stats` and report R/W Sucess/Failure, so
that the logic of Store's CURD operation and `Stats` update logic can be
separated.
2015-12-15 16:50:02 +08:00
6f59a15e55 Merge pull request #3992 from xiang90/fix_rafthttp_test
rafhttp: make TestStreamWriterAttachOutgoingConn more robust
2015-12-14 21:00:33 -08:00
80541d74d0 rafhttp: make TestStreamWriterAttachOutgoingConn more robust 2015-12-14 17:06:38 -08:00
96731f4525 Merge pull request #3991 from xiang90/fix_lock
pkg/fileutil: make TestLockAndUnlock less flaky on CI
2015-12-14 11:58:14 -08:00
be3e87b238 pkg/fileutil: make TestLockAndUnlock less flaky on CI
CI is slow sometime. To make the test less flaky, we increases the
timeout. This does not affect the correctness, but the test might
take longer to finish or to fail.
2015-12-14 11:38:07 -08:00
607ec83f81 Merge pull request #3988 from philips/add-production-users-doc
Documentation: add production users doc
2015-12-14 07:41:29 -08:00
e39ad84440 Documentation: add production users doc
Add the first production user: discovery.etcd.io. We should encourage
the commnuity to start adding their stories to this doc.
2015-12-14 00:00:23 -08:00
0eb46eb145 Merge pull request #3987 from gyuho/storage_test
storage: newFakeStore to return only 1
2015-12-13 17:58:24 -08:00
891cdd56b4 storage: newFakeStore to return only 1
This changes newFakeStore function to return only 1 because two other arguments
are already in the first return storage struct.
2015-12-13 17:46:47 -08:00
0225644d04 Merge pull request #3981 from gyuho/govet
*: fix shadowed variables
2015-12-13 00:00:12 -08:00
d55ab7798b storage: change var names from 'index'
Having the same variable name as 'index' type
can be confusing and shadowing other variables.
This gives different variable names to 'index' variables.
2015-12-12 16:37:24 -08:00
8696a1509c raft/rafttest: fix shadowed variable 2015-12-12 09:38:26 -08:00
4e06510280 pkg/fileutil: fix shadowed variables 2015-12-12 09:38:26 -08:00
c48b0a5e18 etcdmain: fix shadowed variables
Fix for https://github.com/coreos/etcd/issues/3954.
2015-12-12 09:38:26 -08:00
a4de207d53 wal: fix shadowed variables
Fixes for https://github.com/coreos/etcd/issues/3954.
2015-12-12 09:38:26 -08:00
a45b902d12 store: fixes shadowed variables with go tool vet.
Fixes for https://github.com/coreos/etcd/issues/3954.
2015-12-12 09:38:26 -08:00
4289871cef tools/function-tester: fix shadowed variables
Fixes for https://github.com/coreos/etcd/issues/3954.
2015-12-12 09:38:20 -08:00
52d21331e2 storage: fixes shadowed variables
Fixes for https://github.com/coreos/etcd/issues/3954.
2015-12-12 09:20:34 -08:00
0ff822bf22 etcdserver/auth: fix shadowed variables from go tool
Fixes for https://github.com/coreos/etcd/issues/3954.
2015-12-12 09:20:26 -08:00
40b11038f2 etcdserver: fixes shadowed variables for go tool vet
Fix for https://github.com/coreos/etcd/issues/3954.
2015-12-12 09:13:12 -08:00
bccc602c8f etcdctl/command: fix shawdowed error variable
This fixes https://github.com/coreos/etcd/issues/3954.
2015-12-12 09:13:12 -08:00
88aec09ddf client: fixes for govet -shadow
This fixes for shadowed error variables found by go tip go tool vet.
Fixes for https://github.com/coreos/etcd/issues/3954.
2015-12-12 09:13:07 -08:00
68a3962f64 Merge pull request #3980 from jonboulle/master
raft: update RecentActive name in comments
2015-12-11 15:11:40 -08:00
af9f352fe3 raft: update RecentActive name in comments
Noticed when retrospectively reviewing #3976 that a couple of places
were missed when the variable was renamed.
2015-12-11 15:06:11 -08:00
8eaa8f9e0b Merge pull request #3978 from xiang90/rh
rafthttp: snapshot sender cleanup
2015-12-11 13:53:02 -08:00
7d78e0c85e rafthttp: remove the unncessary TODO
The issue is not caused by this code, but by reading snapshot
from disk. etcd assumes the snapshot of v3 kv should live in
memory. If not, etcd does not work well anyway.
2015-12-11 09:37:41 -08:00
95c29838e3 rafthttp: move ReadCloser to ioutil 2015-12-11 09:37:41 -08:00
460873689e Merge pull request #3977 from xiang90/fix_todo
etcdsever: swap kv pointer atomically
2015-12-11 09:30:50 -08:00
2fc3320e59 rafthttp: kill the receiving body timeout TODO in snapshot sender 2015-12-10 22:42:17 -08:00
c4cbaf5c2a etcdsever: swap kv pointer atomically 2015-12-10 17:29:36 -08:00
791c5344b1 Merge pull request #3976 from xiang90/snap_fix
Only send snapshot when member is online
2015-12-10 17:01:50 -08:00
cc6d98bf89 etcdserver: only send snapshot when the member is active 2015-12-10 16:15:26 -08:00
9df46f9d6f raft: expose RecentActive in Progress 2015-12-10 12:17:18 -08:00
9b26753dbf Merge pull request #3970 from xiang90/snapshot
*: rewrite snapshot merging and sending for v3
2015-12-10 09:55:42 -08:00
fbcbfc49b8 Merge pull request #3975 from Masterlvng/fix-readme
README: corret Endpoints param
2015-12-10 07:53:20 -08:00
833c934bd4 Merge pull request #3972 from xiang90/fix_race
store: fix data race when modify event in watchHub.
2015-12-10 07:47:49 -08:00
938333a9fe client/README: correct Endpoints param 2015-12-09 20:49:52 -08:00
b3d9196021 Merge pull request #3973 from xiang90/purge_test
pkg/fileutil: make purge test more reliable
2015-12-09 11:01:47 -08:00
e93c07ba91 pkg/fileutil: make purge test more reliable 2015-12-09 10:34:38 -08:00
d7a027e476 store: fix data race when modify event in watchHub.
The event got from watchHub should be considered as readonly.
To modify it, we first need to get a clone of it or there might
be a data race.
2015-12-09 10:11:51 -08:00
23bd60ccce *: rewrite snapshot sending 2015-12-08 18:21:21 -08:00
9220a47c30 Merge pull request #3968 from sjpotter/fix-build
remove bash'ism
2015-12-07 21:17:01 -08:00
2c2fe7ee8e build: remove bash'ism
[[ is not supported by posix bourne shell so use a substring match that is
2015-12-07 21:05:55 -08:00
977d3615f7 Merge pull request #3952 from BramGruneir/snapshot
Ensure that Progress is not nil when a MsgSnapStatus comes in
2015-12-07 13:17:46 -08:00
1901a4c718 raft: Ensure that Progress is not nil when a MsgSnapStatus comes in.
This was causing some issues in cockroach cockroachdb/cockroach#2950
2015-12-07 16:01:18 -05:00
5a4a5998a3 Merge pull request #3960 from gyuho/genproto
storagepb: minor updates from genproto
2015-12-04 16:18:52 -08:00
bed6b90e61 storagepb: minor updates from genproto
I ran genproto with the most recent protocol buffer
and it adds one line of missing comment.
2015-12-04 16:16:54 -08:00
b806e510a9 Merge pull request #3958 from gyuho/build_gotip
build: parse go version at go tip
2015-12-04 13:18:07 -08:00
0473cb93ac build: parse go version at go tip
This parses go version when build is running
in moast recent go master branch.
2015-12-04 13:08:19 -08:00
127b529582 Merge pull request #3959 from xiang90/fix_storage_test
storage: make RestoreContinueUnfinishedCompaction more reliable
2015-12-04 13:03:29 -08:00
a0eca5fd37 storage: make RestoreContinueUnfinishedCompaction more reliable 2015-12-04 11:48:46 -08:00
e0e80b37f2 Merge pull request #3957 from gyuho/travis_matrix
travis, test: allow failures with gotip
2015-12-04 11:33:35 -08:00
bbdd3c5f0e travis, test: allow failures with gotip
This allows tests to fail with Go tip, because go tip branch might not be
stable. This replaces https://github.com/coreos/etcd/pull/3953.
2015-12-04 11:16:59 -08:00
769f874542 Merge pull request #3956 from mitake/test-gofmt
test: print diff with gofmt
2015-12-03 17:22:14 -08:00
7094c78dcd test: print diff with gofmt
Current test doesn't pass -d option to gofmt. Let's pass the option
for easy fix.

Example output:
Checking gofmt...
gofmt checking failed:
client/client.go
diff client/client.go gofmt/client/client.go
--- /tmp/gofmt741496847 2015-12-04 10:11:11.340651702 +0900
+++ /tmp/gofmt265273890 2015-12-04 10:11:11.340651702 +0900
@@ -41,7 +41,7 @@

 var DefaultTransport CancelableTransport = &http.Transport{
        Proxy: http.ProxyFromEnvironment,
-       Dial:(&net.Dialer{
+       Dial: (&net.Dialer{
                Timeout:   30 * time.Second,
                KeepAlive: 30 * time.Second,
        }).Dial,
@@ -265,7 +265,7 @@
                        return ErrNoEndpoints
                }

-               for i, _ := range c.endpoints {
+               for i := range c.endpoints {
                        if c.endpoints[i].String() == lu.String() {
                                c.pinned = i
                                break
2015-12-04 10:12:05 +09:00
8d4073d078 Merge pull request #3945 from xiang90/new_watch_bench
tools/benchmark: add watch subcommand.
2015-12-03 16:56:02 -08:00
c6430b3292 tools/benchmark: add watch subcommand.
Watch command run benchmark tests for watch releated operations:
1. watch keys 2. sending events to watchers.

To test 2, this test also puts keys into etcd cluster to trigger
event sending.

Besides the benchmark results showed at client side, the tester
can also monitor the server-side mem/cpu usage and also check
the metrics of slow watchers. If there are a lot of slow watchers,
it means etcd server is over-loaded by watchers.
2015-12-03 15:56:17 -08:00
7f534fd503 Merge pull request #3951 from gyuho/travis
travis: add Go tip for testing
2015-12-03 14:39:51 -08:00
8e2c430f52 travis: add Go tip for testing
It would be good to test etcd with the latest Go branch,
so that we can find bugs either in Go or etcd in advance.

Reference:
https://docs.travis-ci.com/user/languages/go
2015-12-03 14:36:35 -08:00
d817f885db raft: doc, debugging instruction on MessageType
This adds documentation on MessageType. Having clear explanation about
MessageType helps understand raft logic and debug etcd when there is a
message dropping. This is partially for coreos#3806.
2015-12-03 00:45:11 -08:00
4e5f20de3b Merge pull request #3948 from xiang90/refactor
etcdserver: refactor a for loop in recvSnap test
2015-12-02 15:45:19 -08:00
0708a5e50d etcdserver: refactor a for loop in recvSnap test 2015-12-02 15:41:03 -08:00
2696d6960b Merge pull request #3947 from xiang90/fix_purge
pkg/fileutil: make purgeTest more robust
2015-12-02 15:18:11 -08:00
77069ca16b pkg/fileutil: make purgeTest more robust
1. Add a few comments to the test.

2. Provide a more robust way to check the purging
result. Tolerate slow io operations.
2015-12-02 15:12:42 -08:00
67ffeee521 Merge pull request #3946 from xiang90/fix_snap_test
etcdserver: get rid of unreliable WaitSchedule
2015-12-02 15:01:20 -08:00
4dd502dbb3 Merge pull request #3940 from gyuho/add_map_test
storage: add tests for unsafeAddWatching
2015-12-02 14:19:58 -08:00
ff2e8b55ae storage: add tests for unsafeAddWatching
This adds map operation tests for unsafeAddWatching, which
could have been failed without https://github.com/coreos/etcd/pull/3939.
It tests if unsafeAddWatching is correctly updating synced map.
2015-12-02 14:18:07 -08:00
3ec3ffbef0 etcdserver: get rid of unreliable WaitSchedule
In this case, we know we are waiting for an action happened on
storage. We can do a busy wait instead of calling waitSchedule.

The test previously failed on CI with no observed actions.
2015-12-02 13:18:11 -08:00
dd733ca51d Merge pull request #3942 from ngaut/master
v3rpc: Tiny clean up
2015-12-01 21:12:03 -08:00
e142e073e8 v3rpc: Tiny clean up
unreachable code
2015-12-02 12:30:21 +08:00
14210cf8a7 Merge pull request #3939 from xiang90/fix_watch
storage: add missing return for unsafeAddWatching
2015-12-01 15:01:30 -08:00
9f908ce67f storage: add missing return for unsafeAddWatching 2015-12-01 14:55:20 -08:00
99e80ef60f Merge pull request #3937 from gyuho/proxy_typo
Documentation: minor typo in proxy.md
2015-12-01 09:55:36 -08:00
93240bd0f0 Merge pull request #3933 from ngaut/fix-shadow-variables
tools: Fix shadow variables and incorrect printing format.
2015-12-01 08:55:22 -08:00
7118363b20 Documentation: minor typo in proxy.md
This fixes a minor typo in proxy.md.
2015-12-01 08:38:37 -08:00
858857d701 Merge pull request #3936 from joshix/proxydoc-jx
Documentation/proxy: Explain more about Procfile sample.
2015-11-30 16:29:51 -08:00
f30ec0bfea Documentation/proxy: Explain more about Procfile sample.
Say what goreman is so it doesn't confuse.
2015-11-30 16:19:20 -08:00
d8798a0724 Merge pull request #3871 from gyuho/proxy_doc_improve_20151115
Documentation: clarify flag usage for proxy
2015-11-30 16:11:57 -08:00
27831f022d Merge pull request #3934 from xiang90/new_benchmark_tool
tools: rewrite benchmark tool
2015-11-30 14:32:59 -08:00
367b7325c8 Documentation: clarify flag usage for proxy
This is for coreos#3740. Some users find
`advertise-client-urls` and `initial-cluster` flags confusing to set up a
proxy. This clarifies why we need both flags. And add summary to proxy.md
2015-11-30 14:07:06 -08:00
faff00d19e tools: rewrite benchmark tool 2015-11-30 11:52:43 -08:00
232439191c tools: Fix shadow variables and incorrect printing format. 2015-11-30 20:13:29 +08:00
d705df047b Merge pull request #3922 from gyuho/etcdctlv3_with_cobra
etcdctlv3: use spf13/cobra for cli interface
2015-11-27 14:35:40 -08:00
e8eb3d7744 Merge pull request #3927 from nordligulv/patch-1
client: fix goroutine leak in unreleased context
2015-11-27 10:38:40 -08:00
cb9a3e04b1 client: fix goroutine leak in unreleased context
If headerTimeout is not zero then two context are created but only one is released.
cancelCtx in this case is never released which leads to goroutine leak inside it.
2015-11-27 19:44:38 +03:00
b7647e0b55 etcdctlv3: use spf13/cobra for cli interface
This replaces codegansta/cli with spf13/cobra base on
this guideline: https://github.com/coreos/docs/blob/master/golang/README.md#cli.
2015-11-26 08:01:09 -08:00
a423a55b14 Merge pull request #3857 from es-chow/remove-multinode-goroutine
raft: add an thread-unsafe Node: RawNode
2015-11-26 07:52:08 -08:00
5bc56786dc raft: add RawNode which is a thread-unsafe node without goroutine and remove MultiNode 2015-11-26 17:14:14 +08:00
a6bb74e9ff Godeps: add spf13/cobra for etcdctlv3 2015-11-25 14:05:33 -08:00
27a243c2aa Merge pull request #3923 from gyuho/duplicate_exit
etcdctl: remove duplicate exit line
2015-11-25 09:45:02 -08:00
e8215cc577 etcdctl: remove duplicate exit line
`handleError` already exits with the exit code in arguments.
`os.Exit(1)` is never executed in this case.
2015-11-25 09:41:27 -08:00
0777cda4ea Merge pull request #3921 from barakmich/fix_issue_3920
etcdserver: Fix panic for v3 transaction compares on non-existent keys
2015-11-24 18:35:23 -08:00
452e5bffc0 etcdserver: Fix panic for v3 transaction compares on non-existent keys
Fixes #3920
2015-11-24 16:49:45 -05:00
9d7be9ec6f Merge pull request #3917 from xiang90/raft_stepdown
raft: support quorum check when raft is leader
2015-11-24 10:15:01 -08:00
a8cc1570d0 raft: support quorum check when raft is leader
If quorum check fails, the leader will step down to follower.
2015-11-24 09:36:37 -08:00
2762e1f3f7 Merge pull request #3918 from gyuho/V3Procfile
V3DemoProcfile: use double dash
2015-11-23 23:32:33 -08:00
7b90faafad V3DemoProcfile: use double dash
This just makes the V3DemoProcfile flags consistent with the other Procfile.
(Makes the single dash to double dash)

Related to https://github.com/coreos/etcd/pull/3911.
2015-11-23 23:27:30 -08:00
31ac6ad448 Merge pull request #3915 from philips/add-bdarnell
MAINTAINERS: add Ben Darnell
2015-11-23 21:03:22 -08:00
1a3092b954 MAINTAINERS: add Ben Darnell
Add Ben Darnell as a maintainer because of his ongoing and significant contributions to the raft package in this repo.
2015-11-23 17:22:43 -08:00
b31b946a08 Merge pull request #3913 from xiang90/update_doc
doc: talk about update client urls in reconf doc
2015-11-23 15:32:32 -08:00
c60933ebd9 doc: talk about update client urls in reconf doc 2015-11-23 15:32:11 -08:00
6df57cdcfa Merge pull request #3911 from gyuho/proc_double_dash
Procfile: change it to double-dash in Procfile
2015-11-23 13:05:27 -08:00
b83b0af41f Procfile: change it to double-dash in Procfile
This makes flags to etcdmain consistent with man page in help.go. The standard
Go flag package supports both '-' and '--', different than most of CLI
packages, but etcdctl with codegansta/cli uses double-dash('--'), which
sometimes gets confusing, especially to new users.
2015-11-23 12:58:21 -08:00
d435d443bb Merge pull request #3895 from yichengq/storage-watchid
storage: add watch ID to identify watchings
2015-11-22 22:15:59 -08:00
deb1da5f57 storage: add watch ID to identify watchings
One watcher includes multiple watchings, and their events are
sent out through one channel. For the received event, user would like to
know which watching it belongs to.

Introduce a watch ID. When watching on some key, user will get a watch
ID. The watch ID is attached to all events that is observed by this
watch.
2015-11-21 11:19:17 -08:00
2de9a5bbd0 Merge pull request #3899 from gyuho/3859_synced_map
storage: use map for watchableStore synced
2015-11-21 08:06:00 -08:00
48aebd9b09 storage: use map for watchableStore synced
This is for coreos#3859 switching slice to map for synced watchings.
For a large amount of synced watchings, map implementation performs better.
When putting 1 million watchers on the same key and canceling them one by
one: original implementation takes 9m7.268221091s, while the one with map
takes only 430.531637ms.
2015-11-21 00:42:09 -08:00
31574cb9da Merge pull request #3905 from xiang90/etcdctl_health
etcdctl: cluster health exit with non-zero when cluster is unhealthy
2015-11-20 14:38:18 -08:00
37b34b334b Merge pull request #3887 from ypu/flags
etcdmain: Add max-snapshots and max-wals to help
2015-11-20 14:20:57 -08:00
a211ba3361 Merge pull request #3890 from gyuho/pipeline_full_message
rafthttp: more detailed logs when filled-up buffer
2015-11-20 14:09:34 -08:00
e39206e084 etcdctl: cluster health exit with non-zero when cluster is unhealthy 2015-11-20 13:51:31 -08:00
82b83cad43 Merge pull request #3904 from coreos/jonboulle-patch-2
MAINTAINERS: remove blake, fix xiang's handle
2015-11-20 13:30:46 -08:00
6fae5f5a0a MAINTAINERS: remove blake, fix xiang's handle 2015-11-20 13:26:45 -08:00
11d00c7919 Merge pull request #3897 from xiang90/fix_watch
v3rpc: do not send closing event
2015-11-20 09:40:46 -08:00
b104df4b17 Merge pull request #3898 from mitake/get-quorum
etcdctl: a new option for quorum get
2015-11-20 08:08:49 -08:00
b868f4b1b1 v3rpc: report eventReceived correctly 2015-11-19 22:44:46 -08:00
11f49a0960 etcdctl: a new option for quorum get
Current etcdctl seems to lack an option for specifying quorum flag for
GET. This commit adds the option.
2015-11-20 14:09:50 +09:00
3cf90a4dff v3rpc: do not send closing event
When a watch stream closes, both of the watcher.Chan and closec
will be closed.

If watcher.Chan is closed, we should not send out the empty event.
Sending the empty is wrong and waste a lot of CPU resources.
Instead we should just return.
2015-11-19 17:56:15 -08:00
c400d05d0a Merge pull request #3892 from xiang90/fix_snapshot_handling
etcdserver: handle incoming v3 snapshot correctly
2015-11-19 12:18:18 -08:00
cf2d20c5c9 Merge pull request #3851 from yichengq/storage-kv-data
storage: save the KeyValue instead of Event in backend
2015-11-19 10:24:50 -08:00
1214f77519 storage: set revBytes capacity to avoid malloc when appending mark
This is a performance optimization.
2015-11-19 09:53:31 -08:00
f62e0ced9b Merge pull request #3893 from coreos/jonboulle-patch-1
docs/benchmarks: small fix in memory benchmark
2015-11-18 21:02:52 -08:00
ddc4f8bd45 etcdmain: Add max-snapshots and max-wals to help
Based on the configuration doc, seems these two flags are missing
in the help. So add them and the descriptions are from config.go in
the same directory.

Signed-off-by: Yiqiao Pu <ypu@redhat.com>
2015-11-19 11:58:00 +08:00
6f3ce70b46 docs/benchmarks: small fix in memory benchmark 2015-11-18 19:20:18 -08:00
a07e4bb6e2 etcdserver: handle incoming v3 snapshot correctly
1. we should update all kv reference (including the
on in snapStore).

2. we should first restore a new KV and then close
the old one asynchronously.
2015-11-18 16:07:41 -08:00
6aa559f93d rafthttp: more detailed logs when filled-up buffer
This adds more detailed explanation about why some messages got dropped to help
users debug. This is for https://github.com/coreos/etcd/issues/3806.
2015-11-18 14:17:43 -08:00
33065739df Merge pull request #3782 from gyuho/doc.go_for_package_description
*: add missing package descriptions
2015-11-18 11:32:37 -08:00
81229dbea9 *: add missing package descriptions
This adds and updates package descriptions in etcd projects.
And also deletes some duplicate LICENSE statements.
2015-11-17 20:54:10 -08:00
2f74f76025 storage: remove the event concept from key-value layer
The point is to decouple the key-value storage layer and the
event notification layer clearly. It gives the watchableKV the
flexibility to define whatever event structure it wants without
breaking the ondisk format at key-value storage layer.

Changes:

1. change the format of key and value stored in backend

Store KeyValue struct instead of Event struct in backend value for
better abstraction as xiang suggests. And record the corresponded
action in the backend key.

2. Remove word 'event' from functions
2015-11-17 20:35:49 -08:00
c4672ced3e Merge pull request #3885 from joshix/prodready
README: Outright production-readiness
2015-11-17 18:18:16 -08:00
833f0388cd README: Outright production-readiness
State outright that etcd is used in production and ready for more of same.

Supersedes #3884.

Adopt #3884 in spirit, but directly in README as jonboulle suggested.

Delete Documentation/production-ready.md.
2015-11-17 18:05:31 -08:00
14c8a1c400 Merge pull request #3883 from gyuho/raft_typo
raft: minor typo in progress.go
2015-11-17 14:22:35 -08:00
e1c108e604 raft: minor typo in progress.go
Fixes a minor typo.
2015-11-17 14:21:35 -08:00
db4dcbf827 Merge pull request #3881 from xiang90/godep
Godeps: add LICENSE file for ugorji/go
2015-11-17 13:38:06 -08:00
0f339aaf2d godep: update godep 2015-11-17 13:18:24 -08:00
c0353697da Merge pull request #3880 from xiang90/fix_stop
etcdmain: fix unstoppable startEtcd function
2015-11-17 09:18:27 -08:00
9e4a003fb0 etcdmain: fix unstoppable startEtcd function
We should wrap the blocking function with a closure. And first
creates a go routine to execute the function. Or the inner function
blocks before creating the go routine.
2015-11-17 09:04:00 -08:00
8af76115cb Merge pull request #3873 from yichengq/func-long-timeout
tools/etcd-tester: extend timeout for stresser
2015-11-17 07:59:41 -08:00
08dc70cc0f Merge pull request #3875 from xiang90/fix_txn
etcdserver: start real txn for txn request
2015-11-16 21:16:20 -08:00
b4abe5b584 etcdserver: start real txn for txn request
We should open real txn for applying txn requests. Or the intermediate
state might be observed by reader.

This also fixes #3803. Same consistent(raft) index per multiple indenpendent
operations confuses consistentStore.
2015-11-16 21:11:27 -08:00
5d0268aa2e Merge pull request #3877 from bdarnell/campaign-while-leader
raft: no-op instead of panic for Campaigning while leader
2015-11-16 19:59:34 -08:00
fbeb58d265 raft: no-op instead of panic for Campaigning while leader
We need to be able to force an election (on one node) after creating a
new group (cockroachdb/cockroach#1384), but it is difficult to ensure
that our call to Campaign does not race with an election that may be
started by raft itself. A redundant call to Campaign should be a no-op
instead of a panic. (But the panic in becomeCandidate remains, because
we don't want to update the term or change the committed index in this
case)
2015-11-16 21:44:14 -05:00
5fa875fb5d Merge pull request #3876 from jonboulle/master
scripts: clean up genproto
2015-11-16 16:43:11 -08:00
dd0932a78d scripts: clean up genproto
Rather than copying in .proto files, use the same symlink
trick we do for doing the actual etcd build.

Also check for exact version of protoc early on.
2015-11-16 15:26:22 -08:00
32460538ba Merge pull request #3862 from xiang90/watch_bench_doc
doc: add etcd3 watch benchmark doc
2015-11-16 12:14:19 -08:00
dfc7cc7a62 tools/etcd-tester: extend timeout for stresser
Extend the timeout from 1s to defaultRequestTimeout 5s.

The 1s may bring unwanted burden to the target member. If the member is
busy at recovering, it has limited bandwidth for client requests. A
short timeout at client side will retry quickly while keeping the
on-going connections. Thus, etcd will queue lots of requests and
connections and takes long time to clear them. This finally causes the
timeout of member health check.

This problem is a general one that how etcd handles amounts of requests
at the same time in a good way. We don't plan to address it at current
stage.
2015-11-16 11:47:08 -08:00
6c0f14cefc Merge pull request #3870 from yichengq/fix-raft-log
raft: fix print format for term in one log line
2015-11-15 20:31:18 -08:00
3a65442d7d raft: fix print format for term in one log line
`term` should be printed in decimal representation instead of
hexadecimal one.
2015-11-15 20:26:16 -08:00
fdd87cd899 Merge pull request #3868 from xiang90/fix_auth_guest
auth: use canonical path for pre-defined guest role
2015-11-15 19:17:16 -08:00
a1616afc5d auth: use canonical path for pre-defined guest role 2015-11-15 17:58:09 -08:00
6bdc81d979 Merge pull request #3865 from gyuho/map_populate_for_unix
storage/backend: support MAP_POPULATE for unix
2015-11-15 16:53:32 -08:00
7638423f85 storage/backend: support MAP_POPULATE for unix
This adds build constraints in order to pass memory-map flags to bolt.Option.
If backend package passes syscall.MAP_POPULATE flag, the boltdb does read-ahead
which speeds up entire-database read, which then leads to faster storage
Restore. Benchmark result shows for 4GB, it opens and loads 6x faster in SSD,
and 12x faster in HDD. For 2GB, 1.6x faster with MAP_POPULATE in SSD.
2015-11-15 16:47:38 -08:00
37d3d20211 doc: add etcd3 watch benchmark doc
The primary goal of this doc is to confirm the memory
consumption of watch is as expected. Each connection
consumes O(10kb) of memory. Each stream consumes O(10kb)
of memory. Each watching consumes < O(1kb) of memory.

Then when you have a large number of watching with small
number of connections and streams, the ave memory consumption
per watch will be O(1kb).
2015-11-15 16:38:09 -08:00
185097ffaa Merge pull request #3860 from gyuho/typo_in_wal
wal: minor typo in wal pkg
2015-11-12 16:14:16 -08:00
55adfcb428 wal: minor typo in wal pkg
Fixes a minor typo in wal.go.
Thanks!
2015-11-12 15:23:12 -08:00
5acf5579b7 Merge pull request #3858 from gyuho/godep_bolt_20151112
Godeps: update boltdb/bolt for MAP_POPULATE
2015-11-12 13:17:26 -08:00
b4e0dbe721 Godeps: update boltdb/bolt for MAP_POPULATE
This is for https://github.com/boltdb/bolt/pull/455.
2015-11-12 12:25:37 -08:00
de02254659 Merge pull request #3853 from xiang90/lease_new
lease: delete items when the lease is revoked.
2015-11-11 22:13:37 -08:00
bf3bc0ed6b lease: delete items when the lease is revoked.
Add minimum KV interface and implmement the deleting mechanism for
revoking lease.
2015-11-11 13:21:12 -08:00
2990249c1d Merge pull request #3856 from xiang90/raft_doc_restart
raft: add doc to make restart clear
2015-11-11 11:15:49 -08:00
f7f28b9984 raft: add doc to make restart clear, especially for configuration changed case 2015-11-11 11:11:58 -08:00
0aa4ce0606 Merge pull request #3855 from xiang90/raft_doc
raft: add more words about raft protocol
2015-11-11 09:42:32 -08:00
6df52614fc raft: add more words about raft protocol 2015-11-11 09:20:25 -08:00
2d11e7464e Merge pull request #3849 from gyuho/etcdmain_typos_descriptions
etcdmain: minor typo, make descriptions consistent
2015-11-10 13:20:40 -08:00
12b4a122ce etcdmain: minor typo, make descriptions consistent
This fixes some typos and make help.go and config.go flag descriptions
consistent with each other.
2015-11-10 13:00:08 -08:00
3e74ff7ad4 Merge pull request #3834 from xiang90/lease_new
lease: initial lessor impl
2015-11-10 10:29:19 -08:00
ff36b9d9bc Merge pull request #3700 from xiang90/metrics_hi
Replace Summary with Histogram for all metrics
2015-11-10 10:06:45 -08:00
964f6050ee raft: use HistogramVec for message_sent_latency 2015-11-10 10:05:32 -08:00
434f39d813 Merge pull request #3847 from gyuho/doc_flag_issue3690
etcdmain: more description for init cluster token
2015-11-10 09:42:02 -08:00
5eb57c2aee etcdmain: more description for init cluster token
This adds more description to initial-cluster-token from
https://github.com/coreos/etcd/issues/3690 to help.go.
2015-11-10 09:40:08 -08:00
7ff8ec81ee Merge pull request #3771 from yichengq/cors-auth
pkg/cors: add authorization into Access-Control-Allow-Headers
2015-11-10 08:05:53 -08:00
ad55630aa8 Merge pull request #3844 from gyuho/docker_guide_etcd_version
Documentation: docker pull latest etcd from quay
2015-11-09 20:59:28 -08:00
b6afe94aee Documentation: docker pull latest etcd from quay
As pointed out at https://github.com/coreos/etcd/issues/3843, Docker guide uses
outdated version of etcd. This docker commands will pull from the latest
releases.
2015-11-09 19:52:41 -08:00
8ce4313b22 Merge pull request #3841 from joshix/toolexamplelink
Documentation/security: Fix links about tls keygen.
2015-11-09 16:36:48 -08:00
95ade2c0ac Documentation/security: Fix links about tls keygen.
Edit to replace a relative link (won't work with that target) with
an absolute link.
Heading 1 Title Case.
Polish graf 3.

Fixes https://github.com/coreos/docs/issues/662
2015-11-09 16:12:56 -08:00
1f965972fc Merge pull request #3839 from xiang90/rename
etcdserver: rename processInternalRaftReq to processInternalRaftRequest
2015-11-10 00:56:18 +01:00
ae2f69b41e etcdserver: rename processInternalRaftReq to processInternalRaftRequest
We have a structure called InternalRaftRequest. Making the function
shorter by calling it processInternalRaftReq seems to be random and
reduce the readability. So we just use the full name.
2015-11-09 13:37:36 -08:00
213e246f13 Merge pull request #3838 from mlahaye/typofix
etcdctl: fix typo in error message (Invaild to Invalid)
2015-11-09 13:20:21 -08:00
735440e52d etcdctl: fix typo in error message (Invaild to Invalid) 2015-11-09 15:10:35 -06:00
31601d494d Merge pull request #3819 from gyuho/shorten_interval_for_first_retrials
proxy: expedite proxy refresh given no endpoints
2015-11-09 15:45:26 +01:00
dcf0b45483 Merge pull request #3835 from gyuho/doc_typo_20151108
Doc: fix typos for v3benchmark
2015-11-08 19:40:43 -08:00
a27a6b1372 Doc: fix typos for v3benchmark
The official benchmakr code seems to be in `tools` directory (not in `hack`).
2015-11-08 19:38:49 -08:00
bf70b5127a lease: initial lessor impl 2015-11-08 11:28:50 -08:00
ca25ed3ad2 proxy: expedite proxy refresh given no endpoints
This fixes coreos#3647 by giving shorter proxy refresh interval whenever
there is no endpoints found. Deleted sleep command in Procfile and proxy
documentation accordingly.
2015-11-07 07:27:39 -08:00
7c9e92d1d5 Merge pull request #3830 from xiang90/bolt
Godeps: update boltdb
2015-11-06 13:04:38 -08:00
4669b899cc Godeps: update boltdb 2015-11-06 12:58:28 -08:00
3de5d478ef Merge pull request #3829 from jonboulle/master
godeps: bump coreos/pkg/capnslog
2015-11-06 18:49:08 +01:00
6f8356ba40 godeps: bump coreos/pkg/capnslog
Update to catch coreos/pkg#43 which should fix SYSLOG_IDENTIFIER getting
set when etcd is logging to the journal.
2015-11-06 18:37:06 +01:00
ab1d33b8bd *: bump to v2.3.0-alpha.0+git 2015-11-06 09:16:03 -08:00
a2d4f85d33 *: bump to v2.3.0-alpha.0 2015-11-06 09:08:39 -08:00
63872d812a Merge pull request #3825 from jonboulle/master
contrib: add example systemd unit file
2015-11-06 17:53:32 +01:00
652d3f1974 contrib: add example systemd unit file 2015-11-06 17:50:19 +01:00
e3ce605cb5 Merge pull request #3826 from jonboulle/scripts
scripts: enforce genproto.sh is run from repo root
2015-11-06 17:26:16 +01:00
de0cb472be scripts: enforce genproto.sh is run from repo root 2015-11-06 16:13:24 +01:00
f48e95f7b0 Merge pull request #3822 from mitake/strict-reconfig-error-log
etcdserver: correct error log for strict reconfig checking
2015-11-06 15:13:27 +01:00
2c8ffa6bcb etcdserver: correct error log for strict reconfig checking
This commit fixes an error log caused by the strict reconfig checking
option.

Before:
14:21:38 etcd2 | 2015-11-05 14:21:38.870356 E | etcdhttp: got unexpected response error (etcdserver: re-configuration failed due to not enough started members)

After:
log
13:27:33 etcd2 | 2015-11-05 13:27:33.089364 E | etcdhttp: etcdserver: re-configuration failed due to not enough started members

The error is not an unexpected thing therefore the old message is
incorrect.
2015-11-06 11:03:42 +09:00
9d880f136f Merge pull request #3818 from yichengq/req-snap-log
etcdserver: fix snapshot index in creation log line
2015-11-05 14:04:46 -08:00
0874c44cdc etcdserver: fix snapshot index in creation log line
The snapshot is created at appliedi instead of snapi.
2015-11-05 14:02:09 -08:00
dadfdf6af8 Merge pull request #3802 from yichengq/fix-storage-watch
storage: delete key instead of setting it to false
2015-11-05 11:40:46 -08:00
08f0d94019 Merge pull request #3809 from xiang90/rpc_kv
*: refactor kv rpc implementation
2015-11-04 19:05:48 -08:00
47cad59571 Merge pull request #3813 from yichengq/update-version
*: update clusterMinVersion and feature maps for incoming v2.3
2015-11-04 14:37:37 -08:00
ec3c2d23a3 *: update feature maps to adopt v2.3.0 2015-11-04 14:30:35 -08:00
b82c171f5f version: update MinClusterVersion to v2.2.0
This is the preparation for bumping to v2.3.0-alpha
2015-11-04 14:30:04 -08:00
03951495d3 Merge pull request #3811 from gyuho/storage_watchergauge_fix
storage: move watcherGauge to watchable_store
2015-11-04 13:23:10 -08:00
6e5eb03544 storage: move watcherGauge to watchable_store
watcherGauge should be increased everytime we creates Watcher, not per watch
method call.
2015-11-04 13:17:47 -08:00
319b77b051 Merge pull request #3810 from gyuho/storage_metrics_add_watcher_gauge
storage: add metrics to watcher
2015-11-04 13:09:07 -08:00
4ebf28aa2e storage: add metrics to watcher
This adds metrics to watcher, and changes some order in MustRegister function
calls in init (same order that we define the gauges).
2015-11-04 13:01:52 -08:00
33fe6f41fb Merge pull request #3808 from yichengq/fix-wait-test
pkg/wait: extend wait timeout in TestWaitTime
2015-11-04 11:39:24 -08:00
3d15526c35 Merge pull request #3796 from yichengq/fix-get-version
etcdserver: not reuse connections for peer transport
2015-11-04 11:39:14 -08:00
c37bd2385a *: refactor kv rpc implementation 2015-11-04 11:36:17 -08:00
3b8349c06e pkg/wait: extend wait timeout in TestWaitTime
Fix this error happening on travis:
```
--- FAIL: TestWaitTime-2 (0.01s)
		wait_time_test.go:46: cannot receive from ch as expected
```
2015-11-04 11:18:17 -08:00
4ccbcb91c8 rafthttp: add functions to create listener and roundTripper
This moves the code to create listener and roundTripper for raft communication
to the same place, and use explicit functions to build them. This prevents
possible development errors in the future.
2015-11-04 11:12:46 -08:00
32819f6b3f etcdserver: use roundTripper to request peerURL
It uses roundTripper instead of Transport because roundTripper is
sufficient for its requirements.
2015-11-04 10:49:42 -08:00
5272ee99b5 Merge pull request #3804 from xiang90/ctl_watch
etcdctlv3: support watch
2015-11-04 10:21:05 -08:00
616078dc1b Merge pull request #3807 from xiang90/kv
*: rename etcd service to kv service in gRPC
2015-11-04 10:11:02 -08:00
1a3f7f7fa4 *: rename etcd service to kv service in gRPC 2015-11-04 10:05:49 -08:00
65d153db73 Merge pull request #3783 from yichengq/merge-logger
rafthttp: use MergeLogger for rafthttp logging
2015-11-04 09:48:43 -08:00
6040d57106 rafthttp: use MergeLogger to merge message-drop log
rafthttp logs repeated messages when amounts of message-drop logs
happen, and it becomes log spamming.
Use MergeLogger to merge log lines in this case.
2015-11-04 07:26:58 -08:00
5329159b5e rafthttp: remove failureMap from peerStatus
The logging mechanism is verbose, so it is removed from peerStatus.

We would like to see the status change
of connection with peers, and one error that leads to deactivation.
There is no need to print out all non-repeated errors.
2015-11-04 07:26:33 -08:00
5c1b833232 etcdctlv3: support watch
A draft impl for demo.
2015-11-03 19:28:57 -08:00
c8e622f517 storage: make putm/delm a set with empty value
This cleans the code, and reduces the allocation space.
2015-11-03 19:10:45 -08:00
6dbfc21846 storage: delete key instead of setting it to false
When getting the watched events, it iterate all keys in putm and delm
to generate the events. If we don't delete the key from putm/delm,
it would range on the key that is not actually put or deleted. This is
incorrect.

Fix the panic that happens when single put/delete is watched.
2015-11-03 19:00:39 -08:00
94c6b6a93d Merge pull request #3801 from yichengq/fix-raft-timeout
raft: extend wait timeout in TestNodeAdvance
2015-11-03 18:29:47 -08:00
0de52414cd raft: extend wait timeout in TestNodeAdvance
This fixes the failure met in semaphore CI.
2015-11-03 16:57:18 -08:00
1f1d8e9282 Merge pull request #3800 from xiang90/watch_server
*: serve watch service
2015-11-03 16:32:29 -08:00
10de2e6dbe *: serve watch service
Implement watch service and hook it up
with grpc server in etcdmain.
2015-11-03 15:58:34 -08:00
70cb8b8391 Merge pull request #3799 from gyuho/nameing_in_metrics_watching
storage: apply same naming in metrics.go
2015-11-03 15:23:39 -08:00
bdc280c4a7 storage: apply same naming in metrics.go
This is PR following up with Xiang's https://github.com/coreos/etcd/pull/3795,
and to make the naming consistent with its interface change.
2015-11-03 15:19:18 -08:00
f6b097c0cc Merge pull request #3798 from xiang90/watch_new
*: add v3 watch service
2015-11-03 14:39:20 -08:00
c160085f44 *: add v3 watch service 2015-11-03 14:21:24 -08:00
154fc8e19c Merge pull request #3795 from xiang90/watch_stream
storage: add watchChan
2015-11-03 13:32:49 -08:00
a1129dd5a5 storage: support multiple watching per watcher
We want to support multiple watchings per one watcher chan. Then
we can have one single go routine to watch multiple keys/prefixs.
2015-11-03 12:36:11 -08:00
34e7611093 Merge pull request #3797 from gyuho/procfile_20151103
Procfile: delay proxy waiting for initial cluster
2015-11-03 10:24:48 -08:00
5e29449d61 Procfile: delay proxy waiting for initial cluster
This fixes #3647 by delaying proxy start by 3 seconds. Without this, proxy
starts at the same time as initial cluster and since the default proxy director
refresh interval is 30-second, if cluster is not ready at first trial, the
proxy misses to discover them and has to wait another 30-seconds, which delays
the proxying for first 30-second.
2015-11-03 10:22:48 -08:00
0eee88a3d9 etcdserver: use timeout transport as peer transport
This pairs with remote timeout listeners.

etcd uses timeout listener, and times out the accepted connections
if there is no activity. So the idle connections may time out easily.
Becaus timeout transport doesn't reuse connections, it prevents using
timeouted connection.

This fixes the problem that etcd fail to get version of peers.
2015-11-03 07:58:03 -08:00
fe165de1d1 Merge pull request #3794 from yichengq/fix-proxy-term
etcdmain: fix parsing discovery error
2015-11-02 17:33:47 -08:00
9757dcd3a2 etcdmain: fix parsing discovery error
The discovery error is wrapped into a struct now, and cannot be compared
to predefined errors. Correct the comparison behavior to fix the
problem.
2015-11-02 17:23:06 -08:00
4fd65ecd4c Merge pull request #3785 from yichengq/fix-block-test
storage: extend wait timeout for execution
2015-11-02 12:53:18 -08:00
a74dd4c47a Merge pull request #3790 from xiang90/etcd-top
add etcdtop
2015-11-02 12:02:05 -08:00
8f9d237d21 Merge pull request #3792 from wojtek-t/update_ugorji
Update dependency on ugorji/go/codec
2015-11-02 07:43:48 -08:00
65ae8784fb client: regenerate code to unmarshal key response
Regenerate code for unmarshaling key response with a new version of
ugorji/go/codec.
2015-11-02 12:06:32 +01:00
02eec7763d Godeps: update ugorji/go/codec dependency
Update ugorji/go/codec dependency to the newer version.
2015-11-02 12:03:49 +01:00
e1b2e7245b tools/etcd-top: add copyright header 2015-11-01 18:19:32 -08:00
c4f8fe96e8 travis: install libpcap 2015-11-01 18:16:20 -08:00
2bfe995fb8 Godeps: add dependency for etcd-top 2015-11-01 18:07:27 -08:00
00557e96af tools: add etcd-top 2015-11-01 18:07:27 -08:00
59b5dabc66 storage: extend wait timeout for execution
Extend timeout to pass always in traivs.
2015-10-30 17:44:31 -07:00
f787e7904b Merge pull request #3762 from jonboulle/auth
etcdserver: restructure auth.Store and auth.User
2015-10-30 16:49:28 -07:00
ee522025b3 etcdserver: restructure auth.Store and auth.User
This attempts to decouple password-related functions, which previously
existed both in the Store and User structs, by splitting them out into a
separate interface, PasswordStore.  This means that they can be more
easily swapped out during testing.

This also changes the relevant tests to use mock password functions
instead of the bcrypt-backed implementations; as a result, the tests are
much faster.

Before:
```
	github.com/coreos/etcd/etcdserver/auth		31.495s
	github.com/coreos/etcd/etcdserver/etcdhttp	91.205s
```

After:
```
	github.com/coreos/etcd/etcdserver/auth		1.207s
	github.com/coreos/etcd/etcdserver/etcdhttp	1.207s
```
2015-10-30 16:33:40 -07:00
2840260b3b Merge pull request #3781 from gyuho/doc_typo_20151029
Documentation: fix typo in proxy.md
2015-10-29 20:34:16 -07:00
294f03f85f Documentation: fix typo in proxy.md
This fixes some typos in proxy.md.

Thanks,
2015-10-29 20:30:50 -07:00
00b4880494 Update ROADMAP.md 2015-10-29 16:05:40 -07:00
42255748cd Update ROADMAP.md 2015-10-29 16:03:14 -07:00
1b3d9130c9 Merge pull request #3759 from yichengq/rafthttp-unreachable
rafthttp: mark unreachable on unexpected response
2015-10-29 15:12:23 -07:00
1944893ef8 Merge pull request #3776 from gyuho/etcdmain_doc
etcdmain: fix package description compatible with godoc.org
2015-10-29 12:36:11 -07:00
821c071f3f etcdmain: fix package description for godoc.org
This fixes package description for etcdmain that wasn't compatible with
godoc.org, by deleting the extra blank lines between comment and package name.
2015-10-29 12:28:52 -07:00
84d7825a77 rafthttp: stop masking errMemberRemoved in pipeline
It makes logic more straightforward and readable. Also, it makes the
handle method consistent with stream and snapshot sender.
2015-10-28 21:40:48 -07:00
908a011604 rafthttp: mark unreachable on unexpected response
In rafthttp, when making request to some endpoint, it may receive
response with unexpected status code and header. This indicates the endpoint
doesn't function correctly. It should mark the endpoint unreachable.
2015-10-28 21:40:11 -07:00
2fe6893d5d Merge pull request #3772 from xiang90/watcher_sep
storage: move watcher interface into watcher.go
2015-10-28 21:14:55 -07:00
f71bcfa8ce storage: move watcher interface into watcher.go 2015-10-28 21:10:58 -07:00
de99c9ed58 Merge pull request #3770 from yichengq/link-etcdctl
docs/libraries-and-tools: update the link of etcdctl
2015-10-28 14:07:33 -07:00
d6b4c7b67c pkg/cors: add authorization into Access-Control-Allow-Headers
This helps browser to send auth-related request to etcd server when
cors flag is set.
2015-10-28 13:58:04 -07:00
4f36897f8c Merge pull request #3767 from kamilhark/master
Added etcdsh command line tool to the list
2015-10-28 13:41:14 -07:00
ebde1d720e docs/libraries-and-tools: update the link of etcdctl
The old repo is deprecated, and we develop etcdctl in etcd repo now.
2015-10-28 13:37:28 -07:00
b5c176360e Merge pull request #3768 from yichengq/fix-publish-test
etcdserver: extend wait timeout in TestPublishRetry
2015-10-28 13:31:51 -07:00
695a5148cf Merge pull request #3769 from msoap/fix-docs
documentation: changed link to style doc
LGTM. Link accurate (old link was redir'd anyway).
Tiny fix, doc-only, so directly merging.
2015-10-28 13:13:37 -07:00
3dad5fffc0 documentation: changed link to style doc
Go-project has been moved from code.google.com to github.com
2015-10-28 21:49:28 +02:00
7d757bbc8a etcdserver: extend wait timeout in TestPublishRetry
It fixes the failure in semaphore CI:
```
--- FAIL: TestPublishRetry (0.00s)
		server_test.go:1108: len(action) = 1, want >= 2
```
2015-10-28 12:07:00 -07:00
ae18e6ea37 docs/libraries-and-tools: Added etcdsh command line tool to the list 2015-10-28 19:38:09 +01:00
099d8674c4 Merge pull request #3746 from yichengq/load-storage
etcdserver: fix recovering snapshot from disk
2015-10-27 14:42:41 -07:00
4b8ee2d66e storage: skip old entry in ConsistentWatchableStore
This avoids to apply the same entry twice when restoring from disk.
2015-10-26 23:26:06 -07:00
263b270708 etcdserver: commit v3 storage before releasing WAL
This ensures that v3 storage could always find the following log entries
when restart.
2015-10-26 21:06:08 -07:00
70f9407d2d Merge pull request #3758 from xiang90/race
*: fix various data races detected by race detector
2015-10-26 20:57:31 -07:00
ab4892ade2 Merge pull request #3749 from gyuho/etcdmain_flags_20151025
etcdmain: make flags and formats idential
2015-10-26 20:54:37 -07:00
a8e6e71bf9 *: fix various data races detected by race detector 2015-10-26 20:49:37 -07:00
306dd7183b Merge pull request #3757 from xiang90/race
rafthttp: fix data races detected by go race detector
2015-10-26 17:10:17 -07:00
336d177c82 rafthttp: fix data races detected by go race detector 2015-10-26 15:29:08 -07:00
4766227b76 Merge pull request #3750 from yichengq/rafthttp-continue
rafthttp: fix wrong return in pipeline.handle
2015-10-26 14:11:42 -07:00
4076dda101 rafthttp: fix wrong return in pipeline.handle
pipeline.handle is a long-living one, and should continue to receive
next message to send out when current message fails to send. So it
should `continue` instead of `return` here.
2015-10-26 14:05:19 -07:00
44bbc87698 Merge pull request #3756 from suryanathan/master
docs/libraries-and-tools: Update libraries-and-tools.md with etcdcpp
2015-10-26 13:52:42 -07:00
e4ada19996 docs/libraries-and-tools: Update libraries-and-tools.md with etcdcpp
Add a c++ language binding for API version 2.2.0
2015-10-26 16:44:17 -04:00
cc378585a9 Merge pull request #3755 from jonboulle/master
travis: only run unit tests
2015-10-26 13:36:18 -07:00
516be7a781 travis: only run unit tests
Travis has chronic problems successfully running the integration suite -
and we've successfully moved to Semaphore for that purpose - but can
still be useful as a fail-fast option for testing unit tests and formatting.
2015-10-26 12:47:15 -07:00
52782cf8ee etcdmain: make flags and formats idential
This makes flagsline and config.go identical in its flag description and some
punctuation conventions.
2015-10-25 06:31:37 -07:00
d44b79c3c9 Merge pull request #3748 from coreos/revert-3737-rafthttp-continue
Revert "rafthttp: fix wrong return in pipeline.handle"
2015-10-24 21:05:52 -07:00
5eda45ece6 Revert "rafthttp: fix wrong return in pipeline.handle" 2015-10-24 20:25:56 -07:00
dbba5bb373 Merge pull request #3737 from yichengq/rafthttp-continue
rafthttp: fix wrong return in pipeline.handle
2015-10-24 19:42:38 -07:00
7e38f05ceb Merge pull request #3742 from yichengq/save-index
etcdserver: save consistent index into v3 storage
2015-10-24 09:48:28 -07:00
15ed6d8268 etcdserver: save consistent index into v3 storage
This helps to recover consistent index when restart in the future.
2015-10-24 09:27:24 -07:00
f648d52afe rafthttp: fix wrong return in pipeline.handle
pipeline.handle is a long-living one, and should continue to receive
next message to send out when current message fails to send. So it
should `continue` instead of `return` here.
2015-10-23 17:00:03 -07:00
41cb39b68a storage: Get -> ConsistentIndex in ConsistentIndexGetter
To make the method name more specific in the context.
2015-10-23 16:40:55 -07:00
4f47b08cf6 Merge pull request #3744 from yichengq/fix-sem
raft: extend wait timeout in TestMultiNodeAdvance
2015-10-23 13:20:52 -07:00
bf3057e5bd raft: extend wait timeout in TestMultiNodeAdvance
This fixes the failure met in semaphore CI:

```
--- FAIL: TestMultiNodeAdvance-2 (0.01s)
		multinode_test.go:458: expect Ready after Advance, but there is
		no Ready available
```
2015-10-23 12:08:24 -07:00
01559fafeb Merge pull request #3741 from yichengq/receive-restore
etcdserver: restore KV snapshot when receiving snapshot
2015-10-23 09:24:17 -07:00
cacc0d6432 etcdserver: restore KV snapshot when receiving snapshot
When a slow follower receives the snapshot sent from the leader, it
should rename the snapshot file to the default KV file path, and
restore KV snapshot.

Have tested it manually and it works pretty well.
2015-10-23 08:43:26 -07:00
d33c26c20a Merge pull request #3730 from yichengq/storage-consistent
storage: add consistentWatchableStore
2015-10-23 08:15:04 -07:00
4fb4bc3ca8 storage: add consistentWatchableStore
consistentWatchableStore maintains an index that is always consistent
with the latest txn. The index could be used to indicate the progress
of the store so far when recovery.
2015-10-22 22:54:51 -07:00
ae62a77de6 Merge pull request #3729 from xiang90/mem_bench
doc: add benchmark doc for new storage pkg
2015-10-22 10:54:43 -07:00
e3cedeeb12 doc: add benchmark doc for new storage pkg 2015-10-22 13:53:03 -04:00
2feccd3fa4 Merge pull request #3733 from yichengq/fix-wait-timeout
pkg/transport: extend wait timeout for write
2015-10-22 13:07:14 -04:00
d3ebecdddd pkg/transport: extend wait timeout for write
This helps the test to pass safely in semaphore CI.

Based on my manual testing, it may take at most 500ms to return
error in semaphore CI, so I set 1s as a safe value.
2015-10-21 18:27:21 -07:00
8b08fff1e9 Merge pull request #3731 from yichengq/storage-kv
storage: fix WatchableKV interface and refine comment
2015-10-21 17:27:24 -07:00
01b163e77d Merge pull request #3588 from gyuho/storage/watchable_store.go-use-map-for-unsynced
storage/watchable_store.go: use map for unsynced
2015-10-21 16:50:15 -07:00
44cecb8624 Merge pull request #3732 from yichengq/config-header
docs/configuration: fix heading hierarchy
2015-10-21 15:50:38 -07:00
f73d0ed1d9 storage: use map for watchable store unsynced
This is for `TODO: use map to reduce cancel cost`.
I switched slice to map, and benchmark results show
that map implementation performs better, as follows:

```
[1]:
benchmark                                   old ns/op     new ns/op     delta
BenchmarkWatchableStoreUnsyncedCancel       215212        1307          -99.39%
BenchmarkWatchableStoreUnsyncedCancel-2     120453        710           -99.41%
BenchmarkWatchableStoreUnsyncedCancel-4     120765        748           -99.38%
BenchmarkWatchableStoreUnsyncedCancel-8     121391        719           -99.41%

benchmark                                   old allocs     new allocs     delta
BenchmarkWatchableStoreUnsyncedCancel       0              0              +0.00%
BenchmarkWatchableStoreUnsyncedCancel-2     0              0              +0.00%
BenchmarkWatchableStoreUnsyncedCancel-4     0              0              +0.00%
BenchmarkWatchableStoreUnsyncedCancel-8     0              0              +0.00%

benchmark                                   old bytes     new bytes     delta
BenchmarkWatchableStoreUnsyncedCancel       200           1             -99.50%
BenchmarkWatchableStoreUnsyncedCancel-2     138           0             -100.00%
BenchmarkWatchableStoreUnsyncedCancel-4     138           0             -100.00%
BenchmarkWatchableStoreUnsyncedCancel-8     139           0             -100.00%

[2]:
benchmark                                   old ns/op     new ns/op     delta
BenchmarkWatchableStoreUnsyncedCancel       212550        1117          -99.47%
BenchmarkWatchableStoreUnsyncedCancel-2     120927        691           -99.43%
BenchmarkWatchableStoreUnsyncedCancel-4     120752        699           -99.42%
BenchmarkWatchableStoreUnsyncedCancel-8     121012        688           -99.43%

benchmark                                   old allocs     new allocs     delta
BenchmarkWatchableStoreUnsyncedCancel       0              0              +0.00%
BenchmarkWatchableStoreUnsyncedCancel-2     0              0              +0.00%
BenchmarkWatchableStoreUnsyncedCancel-4     0              0              +0.00%
BenchmarkWatchableStoreUnsyncedCancel-8     0              0              +0.00%

benchmark                                   old bytes     new bytes     delta
BenchmarkWatchableStoreUnsyncedCancel       197           1             -99.49%
BenchmarkWatchableStoreUnsyncedCancel-2     138           0             -100.00%
BenchmarkWatchableStoreUnsyncedCancel-4     138           0             -100.00%
BenchmarkWatchableStoreUnsyncedCancel-8     139           0             -100.00%

[3]:
benchmark                                   old ns/op     new ns/op     delta
BenchmarkWatchableStoreUnsyncedCancel       214268        1183          -99.45%
BenchmarkWatchableStoreUnsyncedCancel-2     120763        759           -99.37%
BenchmarkWatchableStoreUnsyncedCancel-4     120321        708           -99.41%
BenchmarkWatchableStoreUnsyncedCancel-8     121628        680           -99.44%

benchmark                                   old allocs     new allocs     delta
BenchmarkWatchableStoreUnsyncedCancel       0              0              +0.00%
BenchmarkWatchableStoreUnsyncedCancel-2     0              0              +0.00%
BenchmarkWatchableStoreUnsyncedCancel-4     0              0              +0.00%
BenchmarkWatchableStoreUnsyncedCancel-8     0              0              +0.00%

benchmark                                   old bytes     new bytes     delta
BenchmarkWatchableStoreUnsyncedCancel       200           1             -99.50%
BenchmarkWatchableStoreUnsyncedCancel-2     139           0             -100.00%
BenchmarkWatchableStoreUnsyncedCancel-4     138           0             -100.00%
BenchmarkWatchableStoreUnsyncedCancel-8     139           0             -100.00%

[4]:
benchmark                                   old ns/op     new ns/op     delta
BenchmarkWatchableStoreUnsyncedCancel       208332        1089          -99.48%
BenchmarkWatchableStoreUnsyncedCancel-2     121011        691           -99.43%
BenchmarkWatchableStoreUnsyncedCancel-4     120678        681           -99.44%
BenchmarkWatchableStoreUnsyncedCancel-8     121303        721           -99.41%

benchmark                                   old allocs     new allocs     delta
BenchmarkWatchableStoreUnsyncedCancel       0              0              +0.00%
BenchmarkWatchableStoreUnsyncedCancel-2     0              0              +0.00%
BenchmarkWatchableStoreUnsyncedCancel-4     0              0              +0.00%
BenchmarkWatchableStoreUnsyncedCancel-8     0              0              +0.00%

benchmark                                   old bytes     new bytes     delta
BenchmarkWatchableStoreUnsyncedCancel       194           1             -99.48%
BenchmarkWatchableStoreUnsyncedCancel-2     139           0             -100.00%
BenchmarkWatchableStoreUnsyncedCancel-4     139           0             -100.00%
BenchmarkWatchableStoreUnsyncedCancel-8     139           0             -100.00%

[5]:
benchmark                                   old ns/op     new ns/op     delta
BenchmarkWatchableStoreUnsyncedCancel       211900        1097          -99.48%
BenchmarkWatchableStoreUnsyncedCancel-2     121795        753           -99.38%
BenchmarkWatchableStoreUnsyncedCancel-4     123182        700           -99.43%
BenchmarkWatchableStoreUnsyncedCancel-8     122820        688           -99.44%

benchmark                                   old allocs     new allocs     delta
BenchmarkWatchableStoreUnsyncedCancel       0              0              +0.00%
BenchmarkWatchableStoreUnsyncedCancel-2     0              0              +0.00%
BenchmarkWatchableStoreUnsyncedCancel-4     0              0              +0.00%
BenchmarkWatchableStoreUnsyncedCancel-8     0              0              +0.00%

benchmark                                   old bytes     new bytes     delta
BenchmarkWatchableStoreUnsyncedCancel       198           1             -99.49%
BenchmarkWatchableStoreUnsyncedCancel-2     140           0             -100.00%
BenchmarkWatchableStoreUnsyncedCancel-4     141           0             -100.00%
BenchmarkWatchableStoreUnsyncedCancel-8     141           0             -100.00%
```
2015-10-21 15:30:15 -07:00
f95d18f766 docs/configuration: fix heading hierarchy
to make it consistent with other sections in the doc.
2015-10-21 15:23:48 -07:00
027c073d55 storage: refine Watch comment in WatchableKV
Explain explicitly how these arguments are used.
2015-10-21 14:47:52 -07:00
56b7584418 Merge pull request #3725 from joshix/hdinghier-mulligan
Documentation: Fix heading hierarchy.
2015-10-21 13:52:57 -07:00
2673e657e6 storage: fix WatchableKV interface
We delete endRev from the watch functionality, so the interface needs
to be fixed.
2015-10-21 11:50:25 -07:00
35eb26ef5d Merge pull request #3726 from yichengq/watch-store
storage: add store field in watchableStore
2015-10-21 11:07:45 -07:00
0f7374ce89 storage: KV field -> store field in watchableStore
We need to access the underlying store to use its RangeEvents function.
It is not good to use unnecessary type conversion.

The underlying store is also needed for further store upon
watchableStore.
2015-10-20 19:23:20 -07:00
8d3ed0176c Merge pull request #3727 from yichengq/govet
raft: fix malformed example name
2015-10-20 16:51:47 -07:00
01806c3e80 raft: fix malformed example name
It is reported by latest govet:
```
gopath/src/github.com/coreos/etcd/raft/example_test.go:26: Example_Node
has malformed example suffix: Node
```
2015-10-20 16:40:01 -07:00
98bdeab53b Documentation: Fix heading hierarchy.
Correct the hierarchy of Markdown symbols in document headings.
2015-10-20 15:26:49 -07:00
704bff0c77 Merge pull request #3724 from coreos/philips-patch-1
README: fix language for release binaries
2015-10-20 14:22:02 -07:00
5b5b0ef060 README: attempt to make it even clearer 2015-10-20 14:17:33 -07:00
b38e21a9e9 README: fix language for release binaries
Address confusion on where to find stuff and make it easier to find with keywords for various operating systems.
2015-10-20 14:07:40 -07:00
9635d8d94c Merge pull request #3720 from yichengq/clean-streamAppV1
rafthttp: deprecate streamTypeMsgApp and remove msgApp stream sent restriction due to streamTypeMsgApp
2015-10-20 10:37:51 -07:00
de669be6d6 Merge pull request #3683 from yichengq/raft-block
etcdserver: fix raft state machine may block
2015-10-20 09:44:34 -07:00
ab5df57ecf etcdserver: fix raft state machine may block
When snapshot store requests raft snapshot from etcdserver apply loop,
it may block on the channel for some time, or wait some time for KV to
snapshot. This is unexpected because raft state machine should be unblocked.

Even worse, this block may lead to deadlock:
1. raft state machine waits on getting snapshot from raft memory storage
2. raft memory storage waits snapshot store to get snapshot
3. snapshot store requests raft snapshot from apply loop
4. apply loop is applying entries, and waits raftNode loop to finish
messages sending
5. raftNode loop waits peer loop in Transport to send out messages
6. peer loop in Transport waits for raft state machine to process message

Fix it by changing the logic of getSnap to be asynchronously creation.
2015-10-20 09:19:34 -07:00
b61eaf3335 rafthttp: msgApp{Reader/Writer} -> msgAppV2{Reader/Writer}
To make what it serves more clear.
2015-10-20 08:28:06 -07:00
5060b2f322 rafthttp: send all MsgApp on stream msgAppV2
For stream msgAppV2, as long as the message is MsgApp type, it should be sent
through stream msgAppV2.
2015-10-20 08:23:36 -07:00
33231fccdd rafthttp: fix wrong stream name returned by pick
msgAppWriter uses streamAppV2 type, and it should return the correct name.
2015-10-20 08:17:06 -07:00
f725f6a552 rafthttp: deprecate streamTypeMsgApp
streamTypeMsgApp is only used in etcd 2.0. etcd 2.3 should not talk to
etcd 2.0, either send or receive requests. So I deprecate streamTypeMsgApp
and its related stuffs from rafthttp package.

updating term is only used from streamTypeMsgApp, so it is removed too.
2015-10-20 08:15:54 -07:00
eb7bce893e Merge pull request #3721 from mitake/servevars
etcdserver: don't allow methods other than GET in /debug/vars
2015-10-20 08:01:38 -07:00
1b0c65c299 etcdserver: don't allow methods other than GET in /debug/vars
Currently, /debug/vars seems to allow all types of methods e.g. PUT,
POST, etc. However, this path is a readonly stuff so it should allow
GET only.
2015-10-20 17:19:42 +09:00
7dcb99b60e Merge pull request #3656 from endocode/kayrus/client_doc
Added example on how to get node's value
2015-10-19 20:41:12 -07:00
e5082fce54 Merge pull request #3718 from gyuho/gyuho_README
README: fix typo
2015-10-19 17:14:46 -07:00
7d326087f9 README: fix typo
It looks like a typo. Or is it elided sentence?
Thanks,
2015-10-19 17:03:40 -07:00
4e9f137d1b Merge pull request #3716 from yichengq/add-sem-badge
README: add semaphore CI status badge into README
2015-10-19 16:25:21 -07:00
6ebd62a869 README: add semaphore CI status badge into README 2015-10-19 16:08:40 -07:00
32dd4d5de3 Merge pull request #3657 from xiang90/fix_remove
etcdserver: skip updating attr if the member does not exist
2015-10-19 13:35:57 -07:00
776e9fb7be Merge pull request #3703 from xiang90/bolt
storage/backend: avoid creating new bolt.tx during a batchTx
2015-10-19 10:30:57 -07:00
afb35e366d client: added example on how to get node's value 2015-10-19 10:31:05 +02:00
79c263b2ec Merge pull request #3707 from xiang90/CI
pkg/transport: longer timeout for slow CI
2015-10-18 17:22:15 -07:00
7c6e2deb66 Merge pull request #3708 from xiang90/travis
travis: drop go-nyet
2015-10-18 16:43:53 -07:00
559b76f401 travis: drop go-nyet 2015-10-18 16:41:57 -07:00
3c1ecf70cf pkg/transport: longer timeout for slow CI 2015-10-18 16:32:18 -07:00
5372e11727 Merge pull request #3704 from xiang90/rafthttp
clean up rafthttp pkg: round1
2015-10-18 10:00:26 -07:00
427a154aae rafthttp: various clean up 2015-10-18 09:49:18 -07:00
7d3af5e15f rafthttp: rename message.go -> message_codec.go 2015-10-18 09:49:11 -07:00
e87cd0c17b rafthttp: move new funcs to right place 2015-10-18 09:48:59 -07:00
f08d750b0b Merge pull request #3697 from mqliang/cluster-health
etcdctl: fix health check condition
2015-10-18 09:26:05 -07:00
478fab6aca rafthttp: rename NewHandler to newPipelineHandler 2015-10-17 22:33:28 -07:00
080c11d14e rafthttp: make ConnReadLimitByte private and add comment 2015-10-17 22:20:36 -07:00
5efdd7bc6d storage/backend: avoid creating new bolt.tx during a batchTx 2015-10-17 21:31:34 -07:00
b2d92dedae etcdctl:fix health check condition 2015-10-18 08:22:13 +08:00
d07c9b00e5 Merge pull request #3701 from xiang90/rm_end_watcher
storage: remove the endRev of watcher
2015-10-17 16:04:27 -07:00
6556bf1643 storage: remove the endRev of watcher 2015-10-17 15:59:49 -07:00
c1e4e647eb snap: use Histogram for snap metrics 2015-10-17 12:57:18 -07:00
d90a47656e etcdserver: use Histogram for proposal_durations 2015-10-17 12:48:25 -07:00
1c7f52d931 wal: use Histogram for syncDuration 2015-10-17 12:45:43 -07:00
f78a11d468 Merge pull request #3694 from philips/fix-configuration-headers
Documentation: configuration docs headers
2015-10-16 12:39:10 -07:00
22d8ca4c9a Documentation: configuration docs headers
The configuration docs are indented weird compared to our standards.
This makes the sidebar here not work right:
https://coreos.com/etcd/docs/latest/configuration.html
2015-10-16 12:32:57 -07:00
fd07e02604 Merge pull request #3691 from gyuho/documentation_20151015
raft/documentation: clarify progress's subjects.
2015-10-15 20:02:00 -07:00
1716d5858f raft/documentation: clarify progress's subjects.
If I understand correctly, `progress` represents the states of follower. For
me, some comments weren't clear because it was missing the subjects of
`progress`. This adds more clarification on who is doing what. Please let me
know if I misunderstood anything. Thanks,
2015-10-15 19:15:08 -07:00
9ce79dbbc3 Merge pull request #3685 from gyuho/etcdctl_mk_command_2
etcdctl: fix mk command with PrevNoExist
2015-10-15 12:13:54 -07:00
df34d67e98 Merge pull request #3689 from ccding/patch-1
raft/doc: fix misuse of `for' loop in docs
2015-10-15 09:20:05 -07:00
362df8e470 raft/doc: fix misuse of `for' loop in docs 2015-10-15 11:13:30 -05:00
1dab7e8084 etcdctl/command: mk command with PrevNoExist
This attempts to fix #3676. `PrevNoExist` checks if the key previously exists
and if so, it returns an error, which is how `mk` command is supposed to work.
The previous code ignores the previous key and overwrites with the later value.

/cc @yichengq
2015-10-15 09:05:17 -07:00
c2e49b5622 Merge pull request #3687 from ccding/patch-1
raft/doc: fix typos
2015-10-15 07:52:43 -07:00
f1f92f0fa3 raft/doc: fix typos 2015-10-15 02:17:34 -05:00
afd74dfeb7 Merge pull request #3611 from mitake/etcdctl-timeout
etcdctl: use a context with -total-timeout in simple commands
2015-10-14 16:13:34 -07:00
f78ccbbd3f Merge pull request #3681 from yichengq/godep-update
Godeps: update prometheus dependency
2015-10-14 11:44:49 -07:00
d807c895b8 Godeps: update prometheus dependency
prometheus updates its directory layout
(https://github.com/prometheus/client_golang#where-is-model-extraction-and-text)
and makes Godeps restore/save unable to work.

Remove all prometheus dependency manually and godep save again to fix
this problem.
2015-10-14 09:58:36 -07:00
aeb6ef5d8a Merge pull request #3680 from gyuho/Documentation_20151014
Documentation: typos in discovery, faq, security
2015-10-14 08:50:02 -07:00
0598de2e25 Documentation: typos in discovery, faq, security
This just fixes some typos from my reading Documentation.

Thanks,

Documentation: security
2015-10-14 08:48:01 -07:00
d2b40c1c98 Merge pull request #3666 from yichengq/transport-snap
rafthttp: support sending v3 snapshot message
2015-10-13 23:58:02 -07:00
1f21ccf166 rafthttp: support sending v3 snapshot message
Use snapshotSender to send v3 snapshot message. It puts raft snapshot
message and v3 snapshot into request body, then sends it to the target peer.
When it receives http.StatusNoContent, it knows the message has been
received and processed successfully.

As receiver, snapHandler saves v3 snapshot and then processes the raft snapshot
message, then respond with http.StatusNoContent.
2015-10-13 23:11:28 -07:00
6afd8e4fd9 ROADMAP: fix v3 API issues link 2015-10-12 20:56:20 -07:00
70dc205500 Merge pull request #3665 from raoofm/patch-2
hack: bench to use Content Type as value written was always empty
2015-10-12 11:35:34 -07:00
90e8dcd9bf hack: benchmarking to use Content Type
hack: benchmarking to use Content Type: application/x-www-form-urlencoded

boom sends PUT request with Content-Type: text/html, and this cannot be parsed by etcd. It should use Content-Type: application/x-www-form-urlencoded. Post this fix the official etcd benchmarks should be updated as the keysize was not a factor as the value being set was always empty.
2015-10-12 10:03:09 -04:00
df7074911e Merge pull request #3664 from yichengq/transport-more
rafthttp: build transport inside pkg instead of passed-in
2015-10-11 22:00:05 -07:00
207c92b627 rafthttp: build transport inside pkg instead of passed-in
rafthttp has different requirements for connections created by the
transport for different usage, and this is hard to achieve when giving
one http.RoundTripper. Pass into pkg the data needed to build transport
now, and let rafthttp build its own transports.
2015-10-11 21:42:37 -07:00
988a09eb20 Merge pull request #3663 from yichengq/transport-rt
pkg/transport: pass dial timeout to NewTransport
2015-10-11 10:12:09 -07:00
9673eb625a pkg/transport: pass dial timeout to NewTransport
So we could set dial timeout for new transport, which makes it
customizable according to max RTT.
2015-10-11 10:09:25 -07:00
017f5f4670 Merge pull request #3662 from yichengq/transport
rafthttp: expose struct to set configuration
2015-10-11 09:35:15 -07:00
233e717e2f rafthttp: expose struct to set configuration
transport takes too many arguments and the new function is unable to
read. Change the way to set fields in transport struct directly.
2015-10-11 09:02:16 -07:00
9534b11ad2 Merge pull request #3660 from gyuho/Documentation_typos_20151009
Documentation: fix typos
2015-10-09 16:10:49 -07:00
d05d6f8bbb Documentation: fix typos
I found some typos. Please let me know if you have any feedback.

Thanks,

Documentation: fix metrics.md typo

Documentation: trim blank lines in metrics.md
2015-10-09 15:56:58 -07:00
ce45159767 Merge pull request #3655 from wojtek-t/update_dependency
Update dependency on ugorji/go/codec
2015-10-09 10:23:47 -07:00
4eb598be06 client: regenerate code to unmarshal key response
Regenerate code for unmarshaling key response with a new version of
ugorji/go/codec
2015-10-09 10:59:42 +02:00
8ebbec8e05 Godeps: update ugorji/go/codec dependency
Update ugorji/go/codec dependency to the newer version (a bunch of fixed were made).
2015-10-09 10:59:42 +02:00
eadfd138a4 Merge pull request #3658 from mqliang/patch-2
docs/api.md: fix documentation
2015-10-08 19:51:54 -07:00
8c580ffe2d docs/api.md: fix documentation
Fix documentation
2015-10-09 10:41:12 +08:00
98e30ca7c2 etcdserver: skip updating attr if the member does not exist 2015-10-08 14:07:16 -07:00
f74ff9b867 Merge pull request #3644 from mitake/test-race
etcdserver, test: don't access testing.T in time.AfterFunc()'s own go…
2015-10-07 08:34:58 -07:00
dc394a0a99 Merge pull request #3649 from kkaneda/kkaneda/comment_fix
raft: fix a description of MemoryStorage.Compact
2015-10-06 21:51:20 -07:00
ebd8cb04c1 raft: fix a description of MemoryStorage.Compact
The parameter name is compactIndex, not i.
2015-10-06 21:49:33 -07:00
68dd3ee621 etcdserver, test: don't access testing.T in time.AfterFunc()'s own goroutine
time.AfterFunc() creates its own goroutine and calls the callback
function in the goroutine. It can cause datarace like the problem
fixed in the commit de1a16e0f1 . This
commit also fixes the potential dataraces of tests in
etcdserver/server_test.go .
2015-10-06 11:37:08 +09:00
21179d929f Merge pull request #3616 from yichengq/storage-txn
storage: hold batchTx lock during KV txn
2015-10-05 17:12:52 -07:00
4f2ada3f1e Merge pull request #3643 from xiang90/metrics_storage
storage: add metrics for db total size
2015-10-05 17:11:00 -07:00
0aa2f1192a storage: add metrics for db total size 2015-10-05 16:56:30 -07:00
522ee6ab3a Merge pull request #3635 from yichengq/parse-ipv6
pkg/types: fix unwanted unescape in NewURLsMap
2015-10-05 15:30:51 -07:00
699e37562e Merge pull request #3637 from yichengq/run-snapshot
etcdserver: get existing snapshot instead of requesting one
2015-10-05 14:57:03 -07:00
e117f36e48 pkg/types: fix unwanted unescape in NewURLsMap
We use url.ParseQuery to parse names-to-urls string, but it has side
effect that unescape the string. If the initial-cluster string has ipv6
which contains `%25`, it will unescape it to `%` and make further url
parse failed.

Fix it by modifiying the parse process.

Go1.4 doesn't support literal IPv6 address w/ zone in
URI(https://github.com/golang/go/issues/6530), so we only enable tests
in Go1.5+.
2015-10-05 14:54:17 -07:00
8c94ae0ee3 etcdserver: get existing snapshot instead of requesting one
This fixes the problem that proposal cannot be applied.

When start the etcdserver.run loop, it expects to get the latest
existing snapshot. It should not attempt to request one because the loop
is the entity to create the snapshot.
2015-10-05 14:32:16 -07:00
ba949ae2be Merge pull request #3640 from xiang90/watch_metrics
storage: add metrics for watchers
2015-10-05 11:46:50 -07:00
09157d4f1a storage: add metrics for watchers 2015-10-05 11:32:56 -07:00
432b1bc230 Merge pull request #3638 from gyuho/documentation_proxy
Documentation: proxy.md typo, line-breaks
2015-10-05 07:57:22 -07:00
f2dae5a0d2 Documentation: proxy.md typo, line-breaks
1. I found a little typo (easily -> easy)
2. If you go to https://coreos.com/etcd/docs/2.0.9/proxy.html,
   the proxy flag command is out of width of the web-page. Can
   we have line-breaks between flags to make the command easier
   to read?

Thanks for the great documentation!
2015-10-04 12:25:25 -07:00
c97dda766e storage: hold batchTx lock during KV txn
One txn is treated as atomic, and might contain multiple Put/Delete/Range
operations. For now, between these operations, we might call forecCommit
to sync the change to disk, or backend may commit it in background.
Thus the snapshot state might contains an unfinished multiple objects
transaction, which is dangerous if database is restored from the snapshot.

This PR makes KV txn hold batchTx lock during the process and avoids
commit to happen.
2015-10-03 16:01:05 -07:00
581cc5cff4 Merge pull request #3608 from yichengq/storage-snapshot
storage: update KV.Snapshot function
2015-10-03 15:32:14 -07:00
36f4303fc3 storage/etcdserver: update KV.Snapshot function
When using Snapshot function, it is expected:
1. know the size of snapshot before writing data
2. split snapshot-ready phase and write-data phase. so we could cut
snapshot first and write data later.

Update its interface to fit the requirement of etcdserver.
2015-10-03 10:15:23 -07:00
8c0db94fef Merge pull request #3631 from yichengq/create-snapshot
etcdserver: support to create raft snapshot at apply loop
2015-10-03 10:03:27 -07:00
675d4306b0 Merge pull request #3634 from yichengq/fix-cluster-output
etcdserver: print out correct restored cluster info
2015-10-02 16:23:33 -07:00
18c568bc82 etcdserver: print out correct restored cluster info
Before this PR, it always prints nil because cluster info has not been
covered when print:

```
2015-10-02 14:00:24.353631 I | etcdserver: loaded cluster information
from store: <nil>
```
2015-10-02 16:11:32 -07:00
69ca0b8475 Merge pull request #3633 from xiang90/systemd_readiness
etcdmain: print out error and suggestion for fixing notify issue
2015-10-02 13:56:16 -07:00
51043830d4 etcdmain: print out error and suggestion for fixing notify issue 2015-10-02 13:39:41 -07:00
bfe9502f4f etcdserver: support to create raft snapshot at apply loop
and snapStore could trigger it to create the latest raft snapshot.
2015-10-02 13:17:56 -07:00
f8a4d1f01b Merge pull request #3607 from xiang90/doc_name
doc: emphasize name should be unique
2015-10-02 12:23:06 -07:00
2733e3f543 doc: emphasize name should be unique 2015-10-02 09:58:20 -07:00
f093559b1d Merge pull request #3632 from mickep76/master
docs/libraries-and-tools: add etcd-rest rest api daemon
2015-10-02 09:49:59 -07:00
4835a411c7 docs/libraries-and-tools: add etcd-rest rest api daemon 2015-10-02 14:55:57 +02:00
ccce61bda9 Merge pull request #3614 from yichengq/snapshot-store
etcdserver: add snapshotStore and raftStorage
2015-10-01 19:35:34 -07:00
f47cbf3073 Merge pull request #3627 from jelmer/typofix
Fix typo: boostrapping -> bootstrapping.
2015-10-01 19:02:47 -07:00
2276328720 etcdserver: add snapshotStore and raftStorage
snapshotStore is the store of snapshot, and it supports to get latest snapshot
and save incoming snapshot.

raftStorage supports to get latest snapshot when v3demo is open.
2015-10-01 19:00:59 -07:00
d70975e54c Documentation/configuration.md: Fix typo.
boostrapping -> bootstrapping
2015-10-01 20:03:16 +00:00
715fdfb669 Merge pull request #3093 from mwitkow-io/feature/httpd_metrics
add `events` metrics in etcdhttp.
2015-10-01 12:10:58 -07:00
6e9943a037 Merge pull request #3629 from ccding/master
raft: fix typo in doc
2015-10-01 10:11:12 -07:00
b2edf1d24a raft: fix typo in doc 2015-10-01 11:21:23 -05:00
1b2dc1c796 metrics: add events metrics in etcdhttp. 2015-10-01 08:11:42 +01:00
46e5444d93 Merge pull request #3625 from yichengq/fix-race
pkg/transport: fix a data race in TestReadWriteTimeoutDialer
2015-09-30 17:50:48 -07:00
de1a16e0f1 pkg/transport: fix a data race in TestReadWriteTimeoutDialer
Accessing test.T async will cause data race.

Change to use select to coordinate the access of test.T.
2015-09-30 17:29:24 -07:00
036ea58a77 Documentation: 04 snapshot: add example with fleet 2015-09-30 16:35:28 -07:00
b043635868 Documentation: fix-up the kubernetes github URL 2015-09-30 10:58:33 -07:00
533e728b64 Merge pull request #3609 from yichengq/raft-snapshot
raft: kill TODO about behavior when snapshot fails
2015-09-29 19:32:31 -07:00
4c82b481a5 raft: improve behavior when snapshot fails
etcd is going to support incremental snapshot, and we design to let it
send at most one snapshot out at first stage. So when one snapshot is in
flight, snapshot request will return error.

When failing to get snapshot when sending MsgSnap, raft prints out
related log and abort sending this message.
2015-09-29 19:15:15 -07:00
a535cf2cad Merge pull request #3610 from yichengq/load-storage
etcdserver: restore v3 storage when restart
2015-09-29 11:58:38 -07:00
49d262185d Merge pull request #3590 from yichengq/discovery-log
etcdmain: improve log when join discovery fails
2015-09-29 08:02:18 -07:00
33a0df3e33 etcdctl: use a context with -total-timeout in simple commands
Like the commit 8ebc933111, this commit lets simple etcdctl commands
use a context with timeout value passed via -total-timeout.

This commit doesn't change complex commands like watch,
cluster-health, and import because it is not obvious that using the
context in the commands is good or not.
2015-09-29 17:23:01 +09:00
5d906a0acc etcdserver: restore v3 storage when restart
To load the previous data.
2015-09-29 00:14:27 -07:00
939aa96a34 etcdmain: improve log when join discovery fails
Before this PR, the log is

```
2015/09/1 13:18:31 etcdmain: client: etcd cluster is unavailable or
misconfigured
```

It is quite hard for people to understand what happens.

Now we print out the exact reason for the failure, and explains the way
to handle it.
2015-09-28 23:23:50 -07:00
783884a04e Merge pull request #3606 from kkaneda/kkaneda/tiny_fix
raft: remove an obsolete TODO comment on 4MB maxMsgSize hard coding
2015-09-28 21:44:45 -07:00
f602767e50 raft: remove an obsolete TODO comment on 4MB maxMsgSize hard coding
The TODO comment was added by 7571b2cd, and it was addressed by d9b5b56c.
2015-09-28 21:31:12 -07:00
6c05a01ec6 Merge pull request #3604 from gyuho/replace_netutil_BasicAuth
etcdhttp/auth: BasicAuth method in standard pkg
2015-09-28 15:55:46 -07:00
6264a41e22 ectd/Getting-etcd: update README to require Go1.4+
Notice `For those wanting to try the very latest version,`
2015-09-28 15:35:09 -07:00
e16f81838b etcdhttp/auth: BasicAuth method in standard pkg
I created a new PR from https://github.com/coreos/etcd/pull/3598.
This is for `TODO: use the standard lib BasicAuth method when we move to
Go 1.4.` [1]. `BasicAuth` method got into Go standard package a year ago. [2]

---
1. https://github.com/coreos/etcd/blob/master/pkg/netutil/netutil.go#L126-L138
2. https://codereview.appspot.com/76540043/
2015-09-28 14:02:55 -07:00
7410698761 Merge pull request #3530 from mitake/etcdctl-timeout-v2
etcdctl: use user specified timeout value for entire command execution
2015-09-28 09:45:02 -07:00
8ebc933111 etcdctl: use user specified timeout value for entire command execution
etcdctl should be capable to use a user specified timeout value for
total command execution, not only per request timeout. This commit
adds a new option --total-timeout to the command. The value passed via
this option is used as a timeout value of entire command execution.

Fixes coreos#3517
2015-09-28 10:31:46 +09:00
c645ac23c0 docs: fix link 2015-09-26 17:43:33 -07:00
49d52eaf1e Merge pull request #3596 from xiang90/json_header
etcdhttp: add Content-Type: application/json header to version handler
2015-09-25 15:27:29 -07:00
1226838381 etcdhttp: add Content-Type: application/json header to version handler 2015-09-25 15:14:13 -07:00
c9be719d92 Merge pull request #3579 from gyuho/etcdserver/etcdhttp/httptypes/errors.go-WriteTo-returns-error
httptypes: WriteTo to return error
2015-09-25 14:31:48 -07:00
93edabf85f Merge pull request #3594 from yichengq/exit
etcdmain: exit after print out ErrDuplicateID
2015-09-25 14:28:45 -07:00
dc9a75df1c etcdmain: exit after print out ErrDuplicateID
etcd should exit after printing log for unhandlable error.
2015-09-25 14:10:50 -07:00
60a641762b Merge pull request #3593 from xiang90/fix_race
pkg/transport: fix a data race in TestWriteReadTimeoutListener
2015-09-25 10:16:17 -07:00
5d033c22af pkg/transport: fix a data race in TestWriteReadTimeoutListener 2015-09-25 10:02:37 -07:00
dff702b2b8 Merge pull request #3564 from gouyang/master
Improve proxy log for retrying an unavailable endpoint
2015-09-25 10:02:15 -07:00
85f4475f62 httptypes/errors: HTTPError.WriteTo returns error
Squashing all commits into this one
(from https://github.com/coreos/etcd/pull/357).

Thanks,
2015-09-25 08:06:26 -07:00
e35eeeae42 proxy: improve log for retrying an unavailable endpoint
Fixes #3541

Signed-off-by: Guohua ouyang <guohuaouyang@gmail.com>
2015-09-25 07:36:49 +08:00
9de7f24301 Merge pull request #3554 from mitake/reconfig-doc
doc: add a description of -strict-reconfig-check
2015-09-24 08:07:32 -07:00
78791f81a6 doc: add a description of -strict-reconfig-check 2015-09-24 11:44:55 +09:00
0813a0f2d1 Merge pull request #3585 from xiang90/fix_hash
storage: fix hash by iterating kv
2015-09-23 11:39:21 -07:00
385e17583f storage: fix hash by iterating kv 2015-09-23 11:28:33 -07:00
370ce37d32 Merge pull request #3584 from mickep76/master
docs/libraries-and-tools: add etcd-export tool
2015-09-23 09:32:35 -07:00
c1db1338c9 docs/libraries-and-tools: add etcd-export tool 2015-09-23 18:29:11 +02:00
d6db4e6d6b Merge pull request #3577 from gyuho/storage/watchable_store.go-defer-fix
storage/watchable_store: defer to Unlock s.mu
2015-09-23 07:37:29 -07:00
4113509828 storage/watchable_store: defer to Unlock s.mu
New PR from https://github.com/coreos/etcd/pull/3575.
This add `defer` to `s.mu`. Current code does not `Unlock`
in the correct scope, I think.

(Sorry, I accidentally deleted my fork so the changes
might not sound continuous from my previous pull requests.)
2015-09-22 23:25:07 -07:00
89acdd6245 Merge pull request #3555 from xiang90/proxy_doc
doc: add proxy promotion doc
2015-09-22 12:59:40 -07:00
932bb76cbb Merge pull request #3570 from yichengq/extend-timeout
integration: extend request timeout
2015-09-22 10:17:13 -07:00
eba8a2ed90 Merge pull request #3566 from xiang90/error_msg
etcdsever: mismatch error uses the same format as the corresponding flag
2015-09-22 07:41:46 -07:00
13cfb4284f Merge pull request #3573 from TheHippo/patch-1
docs/security: fixed command typo
2015-09-22 07:41:30 -07:00
2540a3fb7e etcdsever: mismatch error uses the same format as the corresponding flags 2015-09-21 19:32:10 -07:00
94f3297299 docs/security: fixed command typo
`-peer-client-cert-atuh` should be `-peer-client-cert-auth`
2015-09-22 03:39:29 +02:00
305a0d7ab9 integration: extend request timeout
Extend request timeout to give etcd cluster enough time to return
response.
2015-09-21 16:50:22 -07:00
ea3dbfed60 Merge pull request #3408 from MSamman/extend-auth-api
etcdserver: extend auth api
2015-09-21 11:51:19 -07:00
999b2c6ec2 doc: add proxy promotion doc 2015-09-21 11:47:37 -07:00
6188933c81 Merge pull request #3556 from xiang90/better_error_logging
etcdmain: better logging when user forget to set initial flags
2015-09-21 10:52:34 -07:00
3b70bf87c3 etcdmain: better logging when user forget to set initial flags 2015-09-21 10:43:26 -07:00
574d1b0d46 Merge pull request #3563 from dnaeon/fixes
Fix etcd/client API example
2015-09-21 10:06:41 -07:00
d6459b8b84 client: Fix API example 2015-09-21 19:51:29 +03:00
6ae1f6c6e4 etcdserver: extend auth api
allow recursive query on users and roles to get more detail

Fixes #3278
2015-09-21 00:51:18 -07:00
f3d2b5831c Merge pull request #3558 from yichengq/watch
storage: add tests for RangeEvents and its underlying functions
2015-09-20 23:58:41 -07:00
cbddb8670a Merge pull request #3561 from ceh/raft-doc-typo
raft: fix Node doc typo
2015-09-20 21:52:34 -07:00
b9f22cb69b raft: fix Node doc typo 2015-09-21 06:13:33 +02:00
d72914c36f storage: clarify comment for store.RangeEvents and fix related bugs
Change to the function:
1. specify the meaning of startRev and endRev parameters
2. specify the meaning of returned nextRev

Moreover, it adds unit tests for the function.
2015-09-19 23:17:03 -07:00
5709b66dfb storage: add unit test for index.RangeEvents 2015-09-19 23:08:24 -07:00
87b5143b15 storage: fix missing continue in keyIndex.since
It should continue to skip following operations.

The test from rev14 to rev0 fails if it doesn't call continue and append
all revisions of the same main rev to the list.
2015-09-19 23:01:18 -07:00
158d6e0e03 storage: fix calculating generation in keyIndex.since
It should skip last empty generation when the key is just tombstoned.

The rev15 and rev16 in the test fails if it doesn't skip last empty generation
and find previous generations.
2015-09-19 22:58:45 -07:00
06180be154 Merge pull request #3533 from xiang90/proxy
proxy: expose proxy configuration
2015-09-18 14:18:06 -07:00
ac29432aab proxy: add a test for configHandler 2015-09-18 13:43:54 -07:00
0f9b2046ef Merge pull request #3547 from bdarnell/multinode-node-ids
raft: Allow per-group nodeIDs in MultiNode.
2015-09-18 13:29:07 -07:00
b7baaa6bc8 raft: Allow per-group nodeIDs in MultiNode.
This feature is motivated by
https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/replica_tombstone.md
which requires a change to the way CockroachDB constructs its node IDs.
2015-09-18 15:36:36 -04:00
be80d11948 storage: enhance test for keyIndex.Get and keyIndex.Compact
It covers the case that one key is set multiple times in one main
revision now.
2015-09-17 18:26:17 -07:00
cedad49dcf Merge pull request #3543 from mitake/reconfig-remove
etcdserver: forbid removing started member if quorum cannot be preserved in strict reconfig mode
2015-09-17 18:22:53 -07:00
f8859a980d etcdserver: forbid removing started member if quorum cannot be preserved in strict reconfig mode
Like the commit 6974fc63ed, this commit lets etcdserver forbid
removing started member if quorum cannot be preserved after
reconfiguration if the option -strict-reconfig-check is passed to
etcd. The removal can cause deadlock if unstarted members have wrong
peer URLs.
2015-09-18 10:09:57 +09:00
c4b3ad72d9 Merge pull request #3544 from xiang90/bench
v3benchmark: add put benchmark
2015-09-17 15:10:13 -07:00
f69582e1a2 v3benchmark: add put benchmark 2015-09-17 14:48:07 -07:00
97b67fdbfc Merge pull request #3548 from yichengq/travis
storage/backend: extend wait timeout for commit to finish
2015-09-16 14:22:46 -07:00
f7efbe8b14 storage/backend: extend wait timeout for commit to finish
It needs to take more time on travis. Fix:

```
--- FAIL: TestBackendBatchIntervalCommit (0.01s)
		backend_test.go:113: bucket test does not exit
```
2015-09-16 14:14:51 -07:00
ec4142576e Merge pull request #3534 from xiang90/grpc_err
etcdserver: better v3 api error handling
2015-09-16 12:32:28 -07:00
804d80387d Merge pull request #3546 from gae123/patch-1
doc: update admin_guide.md for the recent Go1.5 default MAXPROCS change
2015-09-16 10:44:38 -07:00
7fb0eb8f56 doc: update admin_guide.md for the recent Go1.5 default MAXPROCS change 2015-09-16 10:43:40 -07:00
ce47161ae0 Merge pull request #3540 from xiang90/bench
Godep: add cheggaaa dependency
2015-09-15 16:27:07 -07:00
4628d08879 Godep: add cheggaaa dependency 2015-09-15 16:24:31 -07:00
3a2700141e Merge pull request #3539 from xiang90/bench
godep: use github.com/cheggaaa/pb
2015-09-15 16:12:34 -07:00
38dd680f2e godep: use github.com/cheggaaa/pb 2015-09-15 16:08:07 -07:00
8bb50635ce Merge pull request #3538 from xiang90/bench
benchmarkv3: refactoring the main logic
2015-09-15 15:58:32 -07:00
4deb12fbbb benchmarkv3: refactoring the main logic 2015-09-15 15:57:38 -07:00
3221b6787e Merge pull request #3537 from jonboulle/master
*: add missing license headers + test
2015-09-15 14:19:17 -07:00
108f97d63e test: add license header check 2015-09-15 14:09:01 -07:00
7848ac3979 *: add missing license headers 2015-09-15 14:09:01 -07:00
867954f3ad Merge pull request #3535 from xiang90/rev
storage: add rev into kv interface
2015-09-15 12:54:29 -07:00
6d1f0ce89f storage: add rev into kv interface 2015-09-15 12:11:00 -07:00
94f4069a25 etcdserver: better v3 api error handling 2015-09-15 11:20:06 -07:00
e079f87410 proxy: expose proxy configuration 2015-09-15 10:27:51 -07:00
c082488e23 Merge pull request #3507 from yichengq/watch
storage: support basic watch
2015-09-15 00:04:36 -07:00
ec43e0a4c3 storage: introduce WatchableKV and watch feature
WatchableKV is an interface upon KV, and supports watch feature.
2015-09-14 23:53:03 -07:00
34bfac99c4 Merge pull request #3529 from yichengq/snapshot
etcdserver: rename db file into a formal directory
2015-09-14 23:43:17 -07:00
352cd768c6 etcdserver: fix shadow declaration 2015-09-14 23:25:16 -07:00
05c74bd890 etcdserver: rename db file into a formal directory
and rename it to a formal name
2015-09-14 22:41:40 -07:00
51f1ee055e Merge pull request #3526 from yichengq/snapshot
etcdserver: forbid to unset v3 demo once used
2015-09-14 21:36:39 -07:00
1f0fb3d9aa etcdserver: forbid to unset v3 demo once used
After enabling v3 demo, it may change the underlying data organization
for v3 store. So we forbid to unset --experimental-v3demo once it has
been used.
2015-09-14 21:27:11 -07:00
b1c2d7e526 Merge pull request #3528 from xiang90/compact
*: support v3 compaction
2015-09-14 20:04:50 -07:00
94f784826a *: support v3 compaction 2015-09-14 19:59:36 -07:00
e0d8923f7b Merge pull request #3524 from xiang90/grpc_error
etcdserver: use gRPC error instead of error message in header
2015-09-14 16:38:44 -07:00
7183387110 etcdserver: use gRPC error instead of error message in header 2015-09-14 16:11:13 -07:00
d04382c30e Merge pull request #3525 from gyuho/master
etcdserver, store: fix grammars in comments (a->an existing)
2015-09-14 13:47:49 -07:00
c2dcf7431e etcdserver, store: fix grammars in comments (a->an existing)
I found some grammatical errors in comments.

This pull request was submitted https://github.com/coreos/etcd/pull/3513.
I am resubmitting following the correct guidlines.
2015-09-14 13:41:13 -07:00
1fc122741d Merge pull request #3521 from raoofm/patch-3
doc: faq.md change flag --peers to --endpoint
2015-09-14 09:43:41 -07:00
c7b4c67436 Merge pull request #3514 from xiang90/v3_raft
support clustered v3 api
2015-09-14 09:35:02 -07:00
d685135832 doc: faq.md change flag --peers to --endpoint
doc: faq.md change flag --peers to --endpoint

Changing the flag to --endpoint and mentioning that --peers is deprecated.
2015-09-14 12:22:06 -04:00
451cce4a90 Merge pull request #3516 from xiang90/hash_improved
storage: support hash state
2015-09-13 21:46:12 -07:00
714b5e0b08 storage: support hash state 2015-09-13 21:34:58 -07:00
cdaa263346 Merge pull request #3506 from philips/improve-tocommit-error
raft: improve panic error message
2015-09-13 17:46:53 -07:00
f8fd2c10d6 Merge pull request #3449 from yichengq/cleanup-max-election
Documentation/tuning: cleanup paragraph on max election
2015-09-13 16:55:16 -07:00
95bb6d7584 Merge pull request #3508 from amarshall/patch-3
readme: Use SVG image for build status badge
2015-09-13 15:21:58 -07:00
40e0a33fcd Merge pull request #3511 from xiang90/v3_raft
Procfile: add a v3DemoProcfile
2015-09-13 08:43:14 -07:00
4c81615cef etcdserver: initial support for cluster-wide v3 request 2015-09-13 08:32:01 -07:00
600456f4ba etcdserverpb: update proto file for raftInternalRequest
We needs to assign each raftInternalRequest an ID for getting
the response after it goes through raft.

We also needs an empty response for error case.
2015-09-13 08:28:10 -07:00
ac7253f28e Procfile: add a v3DemoProcfile 2015-09-12 23:08:56 -07:00
662b4966d0 Merge pull request #3510 from xiang90/v3_raft
etcdmain: support gRPC addr flag
2015-09-12 22:58:08 -07:00
a0cfcf2dd7 etcdmain: support gRPC addr flag 2015-09-12 22:52:51 -07:00
35f1531576 Merge pull request #3509 from xiang90/v3_raft
etcdctlv3: support endpoint flag
2015-09-12 22:51:36 -07:00
121d2b9e9d etcdctlv3: support endpoint flag 2015-09-12 22:46:43 -07:00
0894294074 readme: Use SVG image for build status badge
More accessible, better scaling.
2015-09-13 01:12:11 -04:00
0ca800fbac Merge pull request #3479 from mitake/membership
etcdserver: avoid deadlock caused by adding members with wrong peer URLs
2015-09-12 22:09:13 -07:00
dad32646eb etcdserver: enhance test cases for isReadyToAddNewMember
- a case of a cluster with even number members
 - a case of an empty cluster
2015-09-13 12:30:10 +09:00
d9cf752060 etcdserver: add test for isReadyToAddNewMember
Also fixed check for special case of one-member cluster
2015-09-13 11:16:08 +09:00
6974fc63ed etcdserver: avoid deadlock caused by adding members with wrong peer URLs
Current membership changing functionality of etcd seems to have a
problem which can cause deadlock.

How to produce:
 1. construct N node cluster
 2. add N new nodes with etcdctl member add, without starting the new members

What happens:
After finishing add N nodes, a total number of the cluster becomes 2 *
N and a quorum number of the cluster becomes N + 1. It means
membership change requires at least N + 1 nodes because Raft treats
membership information in its log like other ordinal log append
requests.

Assume the peer URLs of the added nodes are wrong because of miss
operation or bugs in wrapping program which launch etcd. In such a
case, both of adding and removing members are impossible because the
quorum isn't preserved. Of course ordinal requests cannot be
served. The cluster would seem to be deadlock.

Of course, the best practice of adding new nodes is adding one node
and let the node start one by one. However, the effect of this problem
is so serious. I think preventing the problem forcibly would be
valuable.

Solution:
This patch lets etcd forbid adding a new node if the operation changes
quorum and the number of changed quorum is larger than a number of
running nodes. If etcd is launched with a newly added option
-strict-reconfig-check, the checking logic is activated. If the option
isn't passed, default behavior of reconfig is kept.

Fixes https://github.com/coreos/etcd/issues/3477
2015-09-13 09:31:53 +09:00
68d4ec3e13 raft: improve panic error message
Give a human being some insight into how we might have gotten to this
state based on feedback from #3504.
2015-09-12 12:17:02 -07:00
d4e19d1afb Merge pull request #3501 from yichengq/update-peers
docs/admin_guide: use ETCDCTL_ENDPOINT
2015-09-12 08:31:47 -07:00
e9512f8c5f docs/admin_guide: use ETCDCTL_ENDPOINT
because ETCDCTL_PEERS is not prefered.
2015-09-11 19:38:55 -07:00
28a371471a Merge pull request #3500 from yichengq/fix-ETCD
libraries-and-tools.md: correct project name to etcd
2015-09-11 19:34:15 -07:00
4e71954111 libraries-and-tools.md: correct project name to etcd
etcd is the official name of the project.
2015-09-11 19:31:40 -07:00
a528cb6f5d Merge pull request #3495 from rekby/patch-2
libraries-and-tools.md: add etcddir
2015-09-11 19:30:03 -07:00
f8f702b3f8 Merge pull request #3497 from jonboulle/master
docs: add official client to libraries-and-tools
2015-09-11 15:17:30 -07:00
fd82f0b8d5 docs: add official client to libraries-and-tools 2015-09-11 15:16:02 -07:00
136efd3ba9 libraries-and-tools.md: add etcddir 2015-09-11 21:04:56 +03:00
c5c3ae4790 Merge pull request #3486 from yichengq/readme
README: warn that master branch is unstable
2015-09-10 19:20:35 -07:00
56d61d995a Merge pull request #3487 from onlyjob/master
Minor spelling corrections (codespell).
2015-09-10 17:46:29 -07:00
b2f4a5f587 *: fix spelling issues (codespell).
Signed-off-by: Dmitry Smirnov <onlyjob@member.fsf.org>
2015-09-11 10:22:29 +10:00
bd5f924b0f README: warn that master branch is unstable
Avoid users building from master branch for stable binaries.
2015-09-10 14:27:10 -07:00
1de63deca4 Merge pull request #3483 from xiang90/update_roadmap
roadmap.md: update roadmap for 2.3
2015-09-10 13:32:30 -07:00
f51c2f471b Merge pull request #3482 from yichengq/client
client: add Nodes type to faciliate sorting
2015-09-10 12:29:12 -07:00
07bd9f65d3 roadmap.md: update roadmap for 2.3 2015-09-10 12:24:48 -07:00
2f558e56d2 client: add Nodes to codecgen and regenerate 2015-09-10 11:51:59 -07:00
eb51901830 client: add Nodes type to faciliate sorting
This helps users to sort easily.
2015-09-10 11:03:12 -07:00
db0511e28c *: bump to v2.2.0+git 2015-09-10 10:03:07 -07:00
e4561dd8cf *: bump to v2.2.0 2015-09-10 10:02:45 -07:00
6e7725cd51 Merge pull request #3478 from endocode/kayrus/typo_fix
doc: member id typo fixed
2015-09-10 00:11:26 -07:00
37392ad223 doc: member id typo fixed 2015-09-10 08:47:45 +02:00
9b032c6a00 Merge pull request #3473 from MrLawes/master
doc: fix bad url in using a directory TTL section
2015-09-09 18:57:09 -07:00
1c058e9706 doc: fix bad url in using a directory TTL section 2015-09-10 09:23:10 +08:00
f3085d2ea4 Merge pull request #3459 from yichengq/release-doc
docs/dev: add release doc
2015-09-09 17:46:10 -07:00
b70e6fc677 docs/dev: add release doc
It documents the standard way to release etcd today. Maintainer should
follow this doc to cut release, and update it in time to fit current
situation.
2015-09-09 16:42:31 -07:00
c34cf04c27 Merge pull request #3448 from yichengq/release-script
scripts: add release.sh
2015-09-09 13:54:15 -07:00
bdd8774169 Merge pull request #3204 from endocode/kayrus/recovery
Improved "disaster restore" doc, added "member update" command descri…
2015-09-09 12:23:51 -07:00
19ad634673 doc: improved "disaster restore" doc, added "member update" command description 2015-09-09 20:07:31 +02:00
7d4cd7c76a scripts: add release.sh
It could build all binaries and images for the given version.
2015-09-09 09:50:41 -07:00
af0474f2e3 Merge pull request #3465 from raoofm/patch-1
etcdmain: Proxy doesnt specify - listening on http or https
2015-09-08 14:38:55 -07:00
2de1c36061 etcdmain: Proxy doesnt specify - listening on http or https
etcdmain: Proxy doesnt specify - listening on http or https

Fixes #3464
2015-09-08 17:19:23 -04:00
ccdd10c757 Merge pull request #3463 from yichengq/update-roadmap
roadmap: remove 2.2 section
2015-09-08 13:55:50 -07:00
c837f0526f roadmap: remove 2.2 section
We have finished all of them.
2015-09-08 13:43:39 -07:00
d8e6e217fd Merge pull request #3461 from xiang90/doc
doc: remove one limitation in upgrade doc
2015-09-08 13:29:43 -07:00
3689ea3071 doc: remove one limitation in upgrade doc 2015-09-08 13:28:23 -07:00
a44da0b62a Merge pull request #3451 from raoofm/patch-1
discovery: log error only if both ssl and non-ssl srv lookups fail
2015-09-06 20:54:43 -07:00
9a2809f0b5 discovery: log error only if both ssl and non-ssl srv lookups fail
discovery: log error only if both ssl and non-ssl srv lookups fail
Earlier we were logging as soon as one of the lookups failed.

Fixes #3414
2015-09-06 23:44:19 -04:00
48a4d2ccba Documentation/tuning: cleanup paragraph on max election
- Use one sentence per line for easier diffing
- Walkthrough the thought process and cleanup the grammar
- Move below the other sections

Original author: @philips
2015-09-06 00:38:03 -07:00
184337568d scripts/build-docker: build docker in image-docker dir
The docker build command will use whatever directory contains the
Dockerfile as the build context (including all of its subdirectories).
And the <src> path of ADD must be inside the context of the build.
So change it to build in a specific directory for clean and fast.
2015-09-06 00:17:41 -07:00
15d1db9bf8 scripts/build-aci: support BINARYDIR and BUILDDIR
This makes it more configurable, and is ready for overall release script.
2015-09-06 00:17:41 -07:00
6b70fa72fe scripts: build-release -> build-binary
This makes the functionality of the script more clear, and always use
bash to run the script because it has bash-specific grammar.
2015-09-06 00:16:51 -07:00
cf6cb82caa scripts/build-docker: stop creating scratch image
Scratch image has become docker's reserved image.
2015-09-06 00:16:08 -07:00
a1b01c266a scripts/build-aci: fix the way to check executability
Or it may treat runnable command as unexecutable.
2015-09-06 00:15:31 -07:00
b9646b5734 Merge pull request #3447 from xiang90/txn
etcdctlv3: fix txn command
2015-09-05 18:21:11 -07:00
1532f7585b etcdctlv3: fix txn command 2015-09-05 16:08:15 -07:00
dab0871acb Merge pull request #3446 from xiang90/v3
etcdserver: refactor v3demo do
2015-09-05 15:41:00 -07:00
95d5556445 etcdserver: refactor v3demo do 2015-09-05 15:31:28 -07:00
d5ab71a4e8 Merge pull request #3445 from xiang90/api_doc
doc: add monitoring section to admin doc
2015-09-05 08:27:11 -07:00
13b3c64c10 doc: add monitoring section to admin doc 2015-09-05 08:25:35 -07:00
51d0630a8e Merge pull request #3440 from yichengq/memory-bench
docs/benchmark: add 2.2.0-rc memory usage benchmark
2015-09-04 20:23:56 -07:00
91b5b247e9 docs/benchmark: add 2.2.0-rc memory usage benchmark
It records the memory usage for different average value size, and
records the data size limitation.
2015-09-04 18:27:49 -07:00
106d918dd5 Merge pull request #3444 from xiang90/doc
etcdctl: suggest endpoint over peer
2015-09-04 13:22:03 -07:00
322aab133d etcdctl: suggest endpoint over peer 2015-09-04 13:16:33 -07:00
9fa05ad8a0 Merge pull request #3443 from xiang90/test
test: now raft has no shadow issue
2015-09-04 11:31:44 -07:00
39580479b5 Merge pull request #3442 from xiang90/b
etcdctl: prepare for health endpoint change
2015-09-04 11:30:44 -07:00
a6e67a6dec test: now raft has no shadow issue
We can test raft pkg now!
2015-09-04 10:52:14 -07:00
778f8d8fea Merge pull request #3434 from xiang90/index_revision
*: v3api index->revision
2015-09-04 10:48:59 -07:00
3f18ded10a *: v3api index->revision 2015-09-04 10:41:20 -07:00
5a5f15de39 Merge pull request #3438 from yichengq/storage-test
storage: add mock tests for store struct
2015-09-04 10:26:08 -07:00
04539c6240 etcdctl: prepare for health endpoint change
We made a mistake on the health endpoint by returning a string "true".
We have to make the etcdctl works for the next version of etcd which
will correct the mistake on the server side.

It is too late to change the server side right now since we already
released a version of etcdctl that only understands "true".
2015-09-04 10:20:24 -07:00
215f27c2f5 storage: add mock tests for store struct 2015-09-04 08:53:49 -07:00
8ca76a789b Merge pull request #3439 from akolb1/godep_all_fixes
Godep: fixed missing dependencies
2015-09-03 22:20:48 -07:00
2782418923 Godep: fixed missing dependencies 2015-09-04 04:51:44 +00:00
5ae2eb4731 storage: avoid one extra round of wait
It could exit early if it knows that there is no more keys.
2015-09-03 19:12:27 -07:00
9175df7c71 storage: correct revision for range when deleteRange
to make it logically reasonable.
2015-09-03 19:12:27 -07:00
797a4796d9 storage: remove check for DELETE type KeyValue
kvindex always returns kvs that exist at given revision, so there is no
need to check for whether the KeyValue range from backend is DELETE type.
2015-09-03 19:12:27 -07:00
00e31f13a6 storage: remove unnecessary rev parameter 2015-09-03 19:12:27 -07:00
2f2b084ab5 Merge pull request #3436 from xiang90/remove_consistent_token
*: replace consistent token with revision in v3 api
2015-09-03 17:16:07 -07:00
254d641ff9 Merge pull request #3429 from xiang90/upgrade_doc
doc: add upgrade to 2.2 doc
2015-09-03 15:47:10 -07:00
2ac9af4924 *: replace consistent token with revision in v3 api 2015-09-03 15:41:33 -07:00
243fe519a9 Merge pull request #3435 from xiang90/gogoproto
*: update gogoproto
2015-09-03 15:35:48 -07:00
ef7cf058a2 *: update gogoproto 2015-09-03 15:32:25 -07:00
356aba7595 doc: add upgrade to 2.2 doc 2015-09-03 11:48:30 -07:00
ae2b43b588 Merge pull request #3433 from tamird/proto-import-path
*: regenerate proto to use local import path
2015-09-03 10:52:37 -07:00
45390b9fb8 *: regenerate proto to use local import path
Using Go-style import paths in protos is not idiomatic. Normally, this
detail would be internal to etcd, but the path from which gogoproto
is imported affects downstream consumers (e.g. cockroachdb).

In cockroach, we want to avoid including `$GOPATH/src` in our protoc
include path for various reasons. This patch puts etcd on the same
convention, which allows this for cockroach.

More information: https://github.com/cockroachdb/cockroach/pull/2339#discussion_r38663417

This commit also regenerates all the protos, which seem to have
drifted a tiny bit.
2015-09-03 13:38:28 -04:00
84d1527df6 Merge pull request #3432 from coreos/robszumski-patch-1
docs: insert whitespace
2015-09-03 09:56:57 -07:00
49e7e6eb9f docs: insert whitespace
Fixes the rendering of this page on https://coreos.com/etcd/docs/2.1.0/proxy.html
2015-09-03 09:50:07 -07:00
1eaf169057 Merge pull request #3395 from yichengq/backend-test
storage/backend: add unit tests for backend and batchTx
2015-09-03 07:23:38 -07:00
44fd734038 storage/backend: add unit tests for backend and batchTx 2015-09-02 16:57:13 -07:00
16e9e4b3d5 Merge pull request #3412 from yichengq/etcdctl-sync
etcdctl: better logging for sync process
2015-09-02 16:49:00 -07:00
8e040efed9 etcdctl: log more about sync process
Users don't even know that etcdctl is doing sync and fails on sync
process. So we add more logs for sync process.
2015-09-02 16:10:25 -07:00
3a8db488ca Merge pull request #3415 from yichengq/better-err
etcdctl/command: print more details about ErrNoEndpoint
2015-09-02 10:11:45 -07:00
41cc16481f Merge pull request #3418 from AdoHe/fix_build_script_error
build: fix build error on ubuntu
2015-09-01 22:44:23 -07:00
9665cda7c1 build: fix build error on ubuntu 2015-09-02 13:28:55 +08:00
484a115813 Merge pull request #3424 from akolb1/bolt_solaris1
Godeps: boltdb dependency missing solaris support
2015-09-01 16:19:23 -07:00
ecbc44fb63 Godeps: boltdb dependency missing solaris support 2015-09-01 23:17:36 +00:00
423e3bbbd8 etcdctl/cluster_health: provide better message for empty client urls
It skips sync when init client, and prints out unreachable messagen and
points to notice when checking health of etcd members one by one.
2015-09-01 14:42:19 -07:00
aa0c8fea55 Merge pull request #3321 from yichengq/doc-tls-setup
docs/security: link cfssl example
2015-09-01 14:28:40 -07:00
6caae58814 docs/security: recommend cfssl instead of etcd-ca
This provides a more general and stable way for users to set TLS cluster.
2015-09-01 14:07:26 -07:00
d412eaa3a2 Merge pull request #3308 from yichengq/go-codec
Use ugorji codec for unmarshalling key responses in client
2015-09-01 14:04:38 -07:00
53b8175d3f Merge pull request #3421 from xiang90/3411
etcdmain: proxy does not need to belong to the discovered cluster
2015-09-01 13:49:31 -07:00
7957677cf2 etcdmain: proxy does not need to belong to the discovered cluster 2015-09-01 11:24:02 -07:00
a94118893c Merge pull request #3413 from xiang90/snapshot_dir
*: support wal dir
2015-09-01 10:03:50 -07:00
d94e712d91 *: support wal dir 2015-09-01 09:54:27 -07:00
85b6c51a23 Merge pull request #3420 from yichengq/wait-more
storage: extend timeout to wait for put complete
2015-09-01 09:25:46 -07:00
a21166c3aa storage: extend timeout to wait for put complete
travis is sometimes slow, and it could fail to complete the put in 10ms.
2015-09-01 09:03:03 -07:00
8ac981e1ee Merge pull request #3416 from yichengq/get-cluster-timeout
etcdserver: add timeout param on getClusterFromRemotePeers
2015-09-01 09:00:19 -07:00
f3bfcb9dee etcdserver: add timeout param on getClusterFromRemotePeers
It sets 10s timeout for public GetClusterFromRemotePeers.

This helps the following cases to work well in high latency scenario:

1. proxy sync members from the cluster
2. newly-joined member sync members from the cluster

Besides 10s request timeout, the request is also controlled by dial
timeout and read connection timeout.
2015-09-01 08:49:01 -07:00
1fabc48968 Merge pull request #3404 from bdarnell/multinode-propose-panic
raft: A removed node can no longer be leader.
2015-08-31 20:06:34 -07:00
4f20e01f60 raft: Ignore proposals if not a current member.
Fixes another panic in MultiNode.Propose.
2015-08-31 20:31:14 -04:00
c2caa4ae3b etcdctl/command: print more details about ErrNoEndpoint
This commit prints more details if getting ErrNoEndpoint when sync with
cluster. This helps users to know what happens.
2015-08-31 16:28:43 -07:00
4b9b0cbcc1 storage: add newBackend and newBatchTx
This is for ease of testing.
2015-08-31 13:25:10 -07:00
57b39aca4e Merge pull request #3403 from xiang90/doc
doc: add 0.4.9 to 2.2 migration guide
2015-08-31 11:28:25 -07:00
3c1f80bdff Merge pull request #3401 from xiang90/more_metrics
more on storage metrics
2015-08-31 09:55:29 -07:00
406bb6749e doc: add 0.4.9 to 2.2 migration guide 2015-08-31 09:55:12 -07:00
bc71aab07a Merge pull request #3409 from xiang90/fix_force_new
etcdserver: ignore confChangeUpdateNode in getIDs
2015-08-31 09:44:10 -07:00
1bcaa9f4a1 etcdserver: ignore confChangeUpdateNode in getIDs 2015-08-31 09:36:39 -07:00
aaa7dfc14d Merge pull request #3407 from MSamman/fix-build-warning
build: fixed build warning
2015-08-31 07:47:23 -07:00
dd4317db43 build: fixed build warning
to clear warning and ensure git sha linkage works in the future

Fixes #3406
2015-08-30 15:05:56 -07:00
b9632e0f8d storage: register txnCounter 2015-08-28 15:17:16 -07:00
dd443be41b storage: report total number of keys 2015-08-28 15:16:53 -07:00
d2cb732c7b test: activate test on storage/backend 2015-08-28 13:52:31 -07:00
054fab84ee storage/backend: remove startc var
This makes start logic cleaner.
2015-08-28 13:52:31 -07:00
fca98c9071 Merge pull request #3398 from xiang90/storage_metrics
storage: add initial metrics for kv
2015-08-28 13:50:44 -07:00
b5838edb93 storage: add initial metrics for kv 2015-08-28 13:41:42 -07:00
6cbaaa715c Merge pull request #3396 from bdarnell/multinode-propose-panic
raft: Fix a nil-pointer panic in MultiNode.Propose.
2015-08-28 12:34:49 -07:00
cba7c6a180 *: bump to v2.2.0-rc.0+git 2015-08-28 10:26:56 -07:00
dc3e027288 *: bump to v2.2.0-rc.0 2015-08-28 10:26:32 -07:00
b40e077047 Merge pull request #3388 from sckott/docfix-tuning
fix docs, change tuning link in api.md from section to file
2015-08-28 09:23:58 -07:00
05924b330a raft: Fix a nil-pointer panic in MultiNode.Propose. 2015-08-28 11:17:59 +02:00
f04884f74d storage/backend: fix off-by-one error for pending var
Or it may commit until batchLimit + 1.
2015-08-27 22:51:32 -07:00
7ed929fb3d storage/backend: fix limit doesn't effect in range 2015-08-27 22:51:32 -07:00
37d9354aa2 Merge pull request #3394 from yichengq/bench-2.2
adjust file and README in docs/benchmark
2015-08-27 21:09:39 -07:00
9d78d84270 Merge pull request #3390 from xiang90/ctl_peer
etcdctl: suggest endpoint over peers flag
2015-08-27 21:03:39 -07:00
8d8033df55 etcdctl: suggest endpoint over peers flag 2015-08-27 18:52:17 -07:00
753a079700 docs/benchmark: add benchmark result links in README 2015-08-27 17:08:49 -07:00
425afa66ea docs/benchmarks: update bench version for more accuracy 2015-08-27 17:08:30 -07:00
f68e4a1a5d Merge pull request #3392 from yichengq/bench-2.2
docs/benchmark: update etcd 2.2 bench
2015-08-27 16:58:04 -07:00
605f0ce730 docs/benchmark: update etcd 2.2 bench
This benchmark is for etcd 2.2 rc after fixing several performance
downgrade bugs.
2015-08-27 16:52:55 -07:00
b0192118dd doc: change tuning link in api.md from section to file 2015-08-27 15:04:07 -07:00
1124a06860 Merge pull request #3387 from yichengq/fix-quorum
doc: correct calculation of fault tolerance of an etcd cluster in adm…
2015-08-27 14:48:39 -07:00
bc2b8856d7 doc: correct calculation of fault tolerance of an etcd cluster in admin_guide.md
doc: correct calculation of fault tolerance of an etcd cluster in admin_guide.md
2015-08-27 14:30:12 -07:00
df83af944b Merge pull request #3384 from yichengq/fix-shadow
test: use go vet shadow feature instead of go-nyet
2015-08-27 14:27:57 -07:00
92cd24d5bd *: fix govet shadow check failure 2015-08-27 14:15:30 -07:00
b2d33e6dcb Merge pull request #3382 from xiang90/env
pkg/flags: print out evn usage information
2015-08-27 13:36:55 -07:00
ccdb850e1e test: use go vet shadow feature instead of go-nyet
Use official support instead of home-made one.
2015-08-27 13:29:12 -07:00
4ac4648b5b Merge pull request #3383 from cognusion/fixes2
Test Fixes: Take 2
2015-08-27 13:22:19 -07:00
327632014e cors: Removed new(?) header from test, resolving failure
"X-Content-Type-Options" was being autoadded, but none of the
test maps took it into account. I saw that "Content-Type" was
also being deleted, so I figured that was the best solution
for this as well.
2015-08-27 15:23:14 -04:00
19a28c8efd storage: Fixed backend test
./backend_test.go:23: multiple-value batchTx.UnsafeRange() in single-value context
2015-08-27 15:20:29 -04:00
32372e1d70 raft: Fixed a test misassumption
network_test.go:56: total = 59.22354ms, want > 50ms
59 is > 50, but the equation added 10 to the right side
2015-08-27 15:15:34 -04:00
c8f5e03b75 pkg/flags: print out evn usage information 2015-08-27 12:08:31 -07:00
25c87f13fd Merge pull request #3354 from mx2323/faq
add faq documentation
2015-08-26 16:36:04 -07:00
8f3ea5ebed doc: add faq documentation 2015-08-26 16:34:52 -07:00
59a5a7e309 Merge pull request #3368 from yichengq/storage-test
add unit tests for storage
2015-08-26 15:32:02 -07:00
0d38c13990 storage: use temp path to handle test file 2015-08-26 15:01:41 -07:00
2d01eb4e11 storage: add tests for kvstore_compaction 2015-08-26 15:01:13 -07:00
f38778160d Merge pull request #3376 from yichengq/connection-down
etcdserver: specify request timeout error due to connection down
2015-08-26 13:09:30 -07:00
0813139140 storage: add more tests for index 2015-08-26 12:53:30 -07:00
3723f01b48 storage: add more unit tests for keyIndex 2015-08-26 12:53:30 -07:00
ad8a291dc1 storage: return error when tombstone on new generation
It is not allowed to put tombstone on an empty generation.
2015-08-26 12:53:30 -07:00
ffa87f9678 storage: fix the comment in generation.walk 2015-08-26 12:53:30 -07:00
8f6bf029f8 etcdserver: specify request timeout error due to connection lost
It specifies request timeout error possibly caused by connection lost,
and print out better log for user to understand.

It handles two cases:
1. the leader cannot connect to majority of cluster.
2. the connection between follower and leader is down for a while,
and it losts proposals.

log format:
```
20:04:19 etcd3 | 2015-08-25 20:04:19.368126 E | etcdhttp: etcdserver:
request timed out, possibly due to connection lost
20:04:19 etcd3 | 2015-08-25 20:04:19.368227 E | etcdhttp: etcdserver:
request timed out, possibly due to connection lost
```
2015-08-26 12:38:37 -07:00
76db9747f8 Merge pull request #3377 from yichengq/tls-info-string
pkg/transport: print ClientCertAuth in TLSInfo.String()
2015-08-25 22:45:10 -07:00
45bb88069b Merge pull request #3378 from yichengq/set-late
etcdmain: check error before assigning peer transport
2015-08-25 22:38:36 -07:00
58455a2ae4 etcdmain: check error before assigning peer transport
Or it may panic when new transport fails, e.g., TLS info is invalid.
2015-08-25 22:04:26 -07:00
57e88465bf pkg/transport: print ClientCertAuth in TLSInfo.String()
It is good to print it in debug output:

```
21:56:12 etcd1 | 2015-08-25 21:56:12.162406 I | etcdmain: peerTLS: cert
= certs/etcd1.pem, key = certs/etcd1-key.pem, ca = , trusted-ca =
certs/ca.pem, client-cert-auth = true
```
2015-08-25 21:53:52 -07:00
6250fed8a8 Merge pull request #3096 from philips/tls-info-debug
pkg/transport: include debug output for trusted-ca
2015-08-25 20:08:19 -07:00
008f988f6b Merge pull request #3375 from xiang90/doc
doc: add evn variable name to configuration.md
2015-08-25 14:48:35 -07:00
2b58da1699 Merge pull request #3374 from yichengq/gomaxprocs
etcdmain: change default GOMAXPROCS when compiling in go1.5
2015-08-25 14:48:00 -07:00
35a0459cc8 doc: add evn variable name to configuration.md 2015-08-25 14:35:15 -07:00
32ab3f6931 Merge pull request #3372 from xiang90/doc
improve clustering.md doc
2015-08-25 14:04:30 -07:00
c30c85898e doc: add explanation for client urls 2015-08-25 13:46:27 -07:00
2ac9a329ab etcdmain: stop setting GOMAXPROCS explicitly
We always want to use GOMAXPROCS() as the way go parses it. When in go1.4, we
want to expose GOMAXPROCS value, so we set GOMAXPROCS explicitly as the
way go 1.4 does and print it out.

But it becomes a problem when go 1.5 changes the way to set GOMAXPROCS.

Fix the problem by stop setting GOMAXPROCS and get its value directly.

Due to this change, it sets default GOMAXPROCS to the
number of CPUs available when compiling in go 1.5, which matches how go 1.5 works:
https://docs.google.com/document/d/1At2Ls5_fhJQ59kDK2DFVhFu3g5mATSXqqV5QrxinasI/edit

This is a behavior change in etcd 2.2.
2015-08-25 13:38:16 -07:00
a4285ef5c9 Merge pull request #3367 from MSamman/master
etcdserver: handle malformed basic auth
2015-08-25 13:12:48 -07:00
e2e002f94e etcdserver: handle malformed basic auth
return insufficient credentials if basic auth header is malformed

Fixes #3280
2015-08-25 12:37:24 -07:00
7bd558b2e0 Merge pull request #3373 from ecnahc515/add_report_bugs_contributing
Contributing: Link to reporting bugs doc
2015-08-25 12:17:06 -07:00
ad843341a9 Contributing: Link to reporting bugs doc 2015-08-25 12:15:03 -07:00
f56c5455f3 doc: mention reconfiguration design in clustering.md 2015-08-25 11:22:08 -07:00
986f354694 Merge pull request #3371 from xiang90/bolt
Godeps: update bolt dependency
2015-08-25 11:17:14 -07:00
e8f40b0412 storage/backend: add commitAndStop
After the upgrade of boltdb, db.Close waits for all txn to finish.
CommitAndStop commits the current txn and stop creating new ones.
2015-08-25 10:57:25 -07:00
8738a88fae Godeps: update bolt dependency 2015-08-25 10:39:29 -07:00
2d06f6b371 Merge pull request #3362 from yichengq/rafthttp-cancel
rafthttp: always cancel in-flight request when stop streamReader
2015-08-25 09:26:46 -07:00
61a75b3d48 rafthttp: always cancel in-flight request when pipeline.send
This fits the way for go1.5 to cancel request.
2015-08-25 09:07:49 -07:00
27b9963959 client: always cancel in-flight request when do request
This fits the way for go1.5 to cancel request.
2015-08-25 09:04:58 -07:00
ece39c9462 proxy: always cancel in-flight request
This fits the way for go1.5 to cancel request.
2015-08-25 08:59:59 -07:00
6fc638673c rafthttp: return err if stopped before setting cancel in dial()
The original workflow may fail to cancel if stop() cancels the finished
request just before dial() assigning a new cancel. This commit checks
streamReader status before setting cancel to avoid this problem.

It is tested at travis for 300 times. go 1.5 always works well, while
go 1.4 fails to stop once.
2015-08-25 08:59:12 -07:00
fc95ec0cc6 rafthttp: always cancel in-flight request when stop streamReader
This problem is totally fixed at 1.5.

go1.5 adds a Request.Cancel channel, which allows for "race free"
cancellation
(8b4278ffb7).
Our implementation relies on it to always cancel in-flight request.
2015-08-25 08:54:13 -07:00
0132b091d2 Merge pull request #3360 from yichengq/bench-3
*: add initial read benchmark for etcd v3
2015-08-25 07:58:30 -07:00
3632a1b9b1 *: add initial read benchmark for etcd v3
It includes the initial read benchmark for etcd v3.

This is the first step to give some rough thoughts. I haven't digged
deeper to answer some questions, including why its performance is not
better than HTTP + json, why one put will cause performance downgrade.
2015-08-25 07:50:18 -07:00
e3ef1d363a Merge pull request #3366 from xiang90/v3_proto
update v3 proto and doc
2015-08-24 11:22:29 -07:00
0cb45aee64 rfc: update v3 proto 2015-08-24 11:00:51 -07:00
1cccbb5ebd etcdserverpb: add comments for compaction 2015-08-24 10:52:54 -07:00
3a60d490d1 storagepb: fix comment location 2015-08-24 10:42:16 -07:00
4a5b94478e etcdserverpb: update comment for txn request 2015-08-24 10:40:05 -07:00
98ceb3cdbd etcdserverpb: add more field into rangeResponse 2015-08-24 10:33:20 -07:00
c7f10ed975 Merge pull request #3361 from yichengq/no-log
integration: only print critical log
2015-08-24 09:44:13 -07:00
3702be476b integration: only print critical log
This limits the logs printed out in integration test, so it will not
have log flood and help us read fatal log in travis.
2015-08-23 21:22:21 -07:00
514c4371a9 Merge pull request #3359 from yichengq/storage-test
functional tests for storage package and some related fixes
2015-08-23 21:12:36 -07:00
1e2b0acf6d test: activate test for storage package 2015-08-23 20:59:06 -07:00
9c0c314425 storage: add functional tests for the package
It adds and reorganize tests to construct functional tests.
2015-08-23 20:59:06 -07:00
9960651c3f storage: let range work in the process of txn
range should work in the process of txn to help check the status during the
txn.
2015-08-23 20:59:06 -07:00
6d97dcaf3f storage: ensure that desired compaction is persisted
It needs to persist the desired compaction, so it won't forget the compaction
if it crashes later.
2015-08-23 20:59:06 -07:00
353f10ca2b storage: reject to compact on future rev
Compaction on future rev is unreasonable.
2015-08-23 20:59:06 -07:00
47b243be5d storage: let TxnDeleteRange return rev if no error
If it doesn't return error, it should return valid rev.
2015-08-23 20:59:06 -07:00
62f7481b19 storage: keyIndex.get returns err when key is tombstoned
Before this commit, it will return wrong create index, mod index.

It lets findGeneration return error when rev is at the gap of two
generations. This leads to the change of compact() code.
2015-08-23 20:59:02 -07:00
3b2fa9f1de storage: fix TestKeyIndexCompact
It fails to pass before.
2015-08-23 17:22:49 -07:00
97b211c8ba Merge pull request #3357 from ccding/master
go vet
2015-08-22 10:29:29 -07:00
c09b667d57 *: fix go vet reported issues 2015-08-22 12:19:02 -05:00
044b23c3ca Merge pull request #3356 from xiang90/travis
*: test gofmt with -s and fix reported issues
2015-08-21 18:59:51 -07:00
6b23a8131f *: test gofmt with -s and fix reported issues 2015-08-21 18:52:16 -07:00
301b7f57c0 Merge pull request #3355 from yichengq/health-var
etcdctl/cluster_health: set health var when checked healthy
2015-08-21 15:37:15 -07:00
224755855d etcdctl/cluster_health: set health var when checked healthy
This was a typo.
2015-08-21 15:27:35 -07:00
84b614c508 Merge pull request #3342 from xiang90/travis
travis: test for go 1.5 build
2015-08-21 14:49:00 -07:00
1dcc145aef client: fix test 2015-08-21 14:36:29 -07:00
8c0610d4f5 Merge pull request #3352 from yichengq/fix-name-url
fix that etcd fails to start if using both IP and hostname when discovery srv
2015-08-21 12:38:38 -07:00
3c1e6b54b3 pkg/netutil: stop resolving in place
It helps to copy out a and b, and not modify the original a and b.
2015-08-21 12:09:17 -07:00
1c334979cd pkg/netutil: not introduce empty url when converting
It should not make slices with length and append elements at the same
time.
2015-08-21 12:08:17 -07:00
7b871aab41 pkg/netutil: not export resolve and urlsEqual functions
They are only used in this package, so there is no need to public them.
2015-08-21 11:58:37 -07:00
b1192e5c48 pkg/netutil: fix false negative comparison
Sort the resolved URLs before DeepEqual, so it will not compare URLs
that may be out of order due to resolution.
2015-08-21 10:15:08 -07:00
72462a72fb etcdserver: remove TODO to delete URLStringsEqual
Discovery SRV supports to compare IP addresses with domain names,
so we need URLStringsEqual function.
2015-08-21 09:52:17 -07:00
8ea3d157c5 Revert "Revert "Treat URLs have same IP address as same""
This reverts commit 3153e635d5.

Conflicts:
	etcdserver/config.go
2015-08-21 09:41:13 -07:00
07af0b3e5b Merge pull request #3346 from xiang90/auth_skip
etcdserver/auth: cache auth enable result
2015-08-20 23:32:29 -07:00
11a689d063 etcdserver/auth: cache auth enable result 2015-08-20 23:05:00 -07:00
e8e507b29b Merge pull request #3348 from xiang90/l
use limited listener from golang
2015-08-20 22:44:51 -07:00
ff37cc455c pkg/transport: remove home-grown limitedListener 2015-08-20 20:03:27 -07:00
92634356c1 *: use limitedListener from golang 2015-08-20 20:02:35 -07:00
da9a12b97c Merge pull request #3344 from xiang90/startup_version
etcdmain: print out version information on startup
2015-08-20 15:10:25 -07:00
6b77c146ec etcdmain: print out version information on startup 2015-08-20 14:50:16 -07:00
31395d257c travis: test for go 1.5 build 2015-08-20 11:39:41 -07:00
7cf9770e12 Merge pull request #3340 from xiang90/fix_perallocate
pkg/fileutil: treat not support error as nil error in preallocate
2015-08-20 11:38:03 -07:00
3ca5482251 pkg/fileutil: treat not support error as nil error in preallocate 2015-08-20 11:15:02 -07:00
4a6d6b0052 Merge pull request #3338 from spacejam/master
Reversion->Revision
2015-08-20 10:16:31 -07:00
acd7a92f03 storage: reversion -> revision 2015-08-20 08:39:07 -07:00
e1dfcec0ab Merge pull request #3327 from yichengq/bench-2.2
docs/benchmarks: add benchmark result for 2.2
2015-08-20 00:18:32 -07:00
807de81172 docs/benchmarks: add benchmark result for 2.2
And it analyzes the reason for performance changes.
2015-08-19 23:59:33 -07:00
795e962403 Merge pull request #3334 from mitake/snap-marsharing-prometheus
snap: export durations of marsharing cost during snapshot save
2015-08-19 20:59:04 -07:00
7a6d33620f snap: export durations of marshalling cost during snapshot save
Currently, total duration of snapshot saving is exported for
prometheus. For more detailed analysis, this commit let etcd export
durations of marshalling for prometheus.
2015-08-20 12:47:07 +09:00
46a2ae77a1 hack/benchmark: add script for benchmark
This is for etcd benchmark.
2015-08-19 20:37:27 -07:00
b0303e948c Merge pull request #3323 from xiang90/cl_health
etcdctl: use health endpoint to greatly simplify health checking
2015-08-19 17:15:52 -07:00
568d1c6783 etcdctl: use health endpoint to greatly simplify health checking 2015-08-19 11:47:08 -07:00
60387dc408 Merge pull request #3320 from yichengq/doc-rtt
docs: document how to set heartbeat interval and election timeout
2015-08-19 11:08:05 -07:00
28b61acd9e Merge pull request #3324 from xiang90/raft_logging
raft: downgrade the logging around snapshot to debugf
2015-08-18 17:18:08 -07:00
d01b6cd639 Merge pull request #3326 from elimisteve/master
client: fixed typo in WatcherOptions docs
2015-08-18 16:49:43 -07:00
952827157a client: fixed typo in WatcherOptions docs
specifices -> specifies
2015-08-18 16:43:09 -07:00
b3d2a621ab Merge pull request #3325 from elimisteve/master
client: spelling error in docs (occured -> occurred)
2015-08-18 16:35:13 -07:00
69fc796926 client: spelling error in docs (occured -> occurred) 2015-08-18 16:26:52 -07:00
50c1db3fbf raft: downgrade the logging around snapshot to debugf
Snapshot related logging is spamming when leader trying to
sync a failed peer.
2015-08-18 15:43:53 -07:00
7082d3a765 docs: document how to set heartbeat interval and election timeout
It gives more details about how to set heartbeat interval and election
timeout correctly based on RTT.
2015-08-18 13:54:44 -07:00
28cec1128d Merge pull request #3322 from philips/use-proxy-as-default-endpoint
Procfile: use proxy as default
2015-08-18 12:38:51 -07:00
087061e434 Merge pull request #3303 from yichengq/auth-path
use canonical path for auth
2015-08-18 12:06:48 -07:00
4778d780a8 pkg/pathutil: change copyright for path.go
The file only contains the function that is borrowed from std http lib,
so we use their copyright.
2015-08-18 11:48:22 -07:00
9106675fd4 Procfile: use proxy as default
I think it makes sense to make the proxy listen on the default port so
we can give the proxy more testing by default. Also, this should make it
easy to kill a single etcd member and test that etcdctl still works,
etc.

However, I have hit a bug: the proxy takes several seconds
2015-08-18 09:42:13 -07:00
fab3feab66 etcdctl/role: reject non-canonical permission path
Non-canonical permission path is useless because the path received
by auth is always canonical, which is due to our ServeMux always
redirects request to canonical path().

This helps users to detect path permission setting error early.

Ref: http://godoc.org/net/http#ServeMux
2015-08-18 08:59:53 -07:00
b5ec7f543a client: use canonical url path in request
The main change is that it keeps the trailing slash. This helps
auth feature to judge path permission accurately.
2015-08-18 08:59:48 -07:00
927d5f3d26 Merge pull request #3301 from yichengq/ca-file
etcdmain: update -ca-file description
2015-08-17 23:36:33 -07:00
c0747a7b8b etcdmain: update -ca-file description
so people could deprecate old flags and use new flags much easier.
2015-08-17 22:36:04 -07:00
bcb4d5d53e Merge pull request #3311 from yichengq/request-timeout
extend hardcoded timeout for globally-deployed etcd cluster
2015-08-17 17:00:24 -07:00
dfc6b4436f Merge pull request #3315 from xiang90/key_err
etcdhttp:write etcderror for all errors in keyhandler
2015-08-17 16:54:12 -07:00
ffae601af5 etcdmain: calculate dial timeout for peer transport
This helps peer communication in globally-deployed cluster.
2015-08-17 16:52:53 -07:00
1375ef8985 etcdserver: remove getVersion timeout
The request can still time out because we have set dial timeout and
read/write timeout. It increases timeout expectation from 1s to 5s,
but it makes it workable in globally-deployer cluster.
2015-08-17 16:50:40 -07:00
c7fbc01ef1 Merge pull request #3314 from sebschrader/proxy-loop
Warn about proxy loops with incorrect advertise-client-urls
2015-08-17 16:04:00 -07:00
d487cf6b63 etcdhttp:write etcderror for all errors in keyhandler 2015-08-17 15:51:29 -07:00
f70950ff93 docs: warn about proxy loops with incorrect advertise-client-urls 2015-08-18 00:42:48 +02:00
c530385d6d Merge pull request #3313 from yichengq/internal-timeout
etcdserver: use ReqTimeout only
2015-08-17 15:05:46 -07:00
af6d1d3d95 Merge pull request #3310 from xiang90/http_err
*: key handler should write auth error as etcd error
2015-08-17 14:57:19 -07:00
2d5b95c49f etcdserver: use ReqTimeout only
We cannot refer RTT value from heartbeat interval, so CommitTimeout
is invalid. Remove it and use ReqTimeout instead.
2015-08-17 14:54:25 -07:00
87f061bab2 *: key handler should write auth error as etcd error 2015-08-17 14:45:45 -07:00
ba3a9b5f92 Merge pull request #3309 from xiang90/enforce
etcdserver: add version enforcement when setting cluster version
2015-08-17 12:41:04 -07:00
15e03d801f etcdserver: add version enforcement when setting cluster version 2015-08-17 11:12:39 -07:00
f615f9a999 Merge pull request #3305 from xiang90/c_v
*: only print out major.minor version for cluster version
2015-08-17 09:40:01 -07:00
7083828ae3 Godeps: import github.com/ugorji/go/codec 2015-08-16 18:13:44 -07:00
a364af72af client: use ugorij/go/codec to unmarshal key response
This change speeds up response unmarshal ~2x:

```
BenchmarkSmallResponseUnmarshal	   20000	     75243 ns/op
BenchmarkManySmallResponseUnmarshal	     200	   6629661 ns/op
BenchmarkMediumResponseUnmarshal	    1000	   1359041 ns/op
BenchmarkLargeResponseUnmarshal	      20	  61600978 ns/op
```
2015-08-16 18:08:54 -07:00
95d100e957 client: add response unmarshal benchmark
The benchmark result:

```
BenchmarkSmallResponseUnmarshal	  10000	   164524 ns/op
BenchmarkManySmallResponseUnmarshal	    100	 13916636 ns/op
BenchmarkMediumResponseUnmarshal	   1000	  1974295 ns/op
BenchmarkLargeResponseUnmarshal	     20	 80462001 ns/op
ok		github.com/coreos/etcd/client	7.777s
```
2015-08-16 16:44:50 -07:00
d95c7d8a94 Merge pull request #3307 from ian-kelling/master
documentation: fix misspelled word
2015-08-15 18:53:58 -07:00
8dd44465c3 documentation: fix misspelled word 2015-08-15 17:56:17 -07:00
f199a484af *: only print out major.minor version for cluster version 2015-08-15 08:30:06 -07:00
bbcb38189c Merge pull request #3302 from xiang90/v
etcdserver: better version detection log output
2015-08-14 16:14:55 -07:00
0076ab154b etcdserver: better version detection log output
Fix https://github.com/coreos/etcd/issues/3288
2015-08-14 16:08:33 -07:00
dd56b7e05e Merge pull request #3299 from xiang90/txn
initial support for txn
2015-08-14 16:05:16 -07:00
5cd109949a etcdctl: support txn 2015-08-14 15:58:38 -07:00
9233fff48f etcdserver: support txn 2015-08-14 11:45:31 -07:00
46865fa5a5 etcdserverpb: update proto 2015-08-14 11:45:07 -07:00
d448593bbc Merge pull request #3295 from yichengq/err-example
client: fix clusterError typo in README
2015-08-14 09:35:31 -07:00
5eed141d54 client: fix clusterError typo in README
It helps users to use client better.
2015-08-13 16:38:41 -07:00
fefb273389 *: bump to v2.2.0-alpha.1+git 2015-08-13 16:01:31 -07:00
201bb4b3d8 *: bump to v2.2.0-alpha.1 2015-08-13 16:01:09 -07:00
3cc4957d98 Merge pull request #3293 from yichengq/improve-err
etcdserver: improve error message when timeout due to leader fail
2015-08-13 15:58:48 -07:00
c229e6e655 etcdserver: improve error message when timeout due to leader fail 2015-08-13 15:46:21 -07:00
394894e03e Merge pull request #3291 from yichengq/auth-cap
etcdhttp: add auth capability in 2.2
2015-08-13 15:01:59 -07:00
ceb27b1c48 etcdhttp: add auth capability in 2.2 2015-08-13 14:49:10 -07:00
a17288558e Merge pull request #3289 from yichengq/marshal
etcdserver: go back to marshal request in 2.1 way
2015-08-13 14:20:24 -07:00
334bdd1c26 Merge pull request #3153 from gtank/tls-setup
hack: TLS setup using cfssl
2015-08-13 13:53:14 -07:00
959feb70d1 Merge pull request #3275 from xiang90/sort
improve in order key generation
2015-08-13 13:51:19 -07:00
a7b9bff939 store: add 0 as padding for better lexicographic sorting. 2015-08-13 13:42:37 -07:00
0fdb77aea2 etcdserver: go back to marshal request in 2.1 way
It fixes the problem that 2.1 cannot roll upgrade to 2.2 smoothly
because 2.1 cannot understand the bytes marshalled at 2.2.
2015-08-13 13:41:52 -07:00
003d096138 Merge pull request #3286 from yichengq/fit-2.2
*: update MinClusterVersion and supportedStream map
2015-08-13 13:31:37 -07:00
c9cca6a93b *: update MinClusterVersion and supportedStream map 2015-08-13 13:05:14 -07:00
846b1fdbcd Merge pull request #3287 from xiang90/update_roadmap
Update roadmap
2015-08-13 13:00:01 -07:00
329647ab62 roadmap: update roadmap 2015-08-13 12:56:23 -07:00
6a64051245 roadmap: remove 2.1 milestone 2015-08-13 12:51:58 -07:00
80005af5b2 Merge pull request #3285 from yichengq/bump-capnslog
godeps: bump capnslog to 42a8c3b1a6f917bb8346ef738f32712a7ca0ede7
2015-08-13 11:49:38 -07:00
d66ede7186 godeps: bump capnslog to 42a8c3b1a6f917bb8346ef738f32712a7ca0ede7 2015-08-13 11:32:45 -07:00
a46943548a *: bump to v2.2.0-alpha.0+git 2015-08-13 10:21:36 -07:00
ab5a69cb18 *: bump to v2.2.0-alpha.0 2015-08-13 10:20:05 -07:00
976ce93539 Merge pull request #3277 from yichengq/better-log
etcdserver: specify timeout caused by leader election
2015-08-12 17:02:27 -07:00
27170e67b9 etcdserver: specify timeout caused by leader election
Before this PR, the timeout caused by leader election returns:

```
14:45:37 etcd2 | 2015-08-12 14:45:37.786349 E | etcdhttp: got unexpected
response error (etcdserver: request timed out)
```

After this PR:

```
15:52:54 etcd1 | 2015-08-12 15:52:54.389523 E | etcdhttp: etcdserver:
request timed out, possibly due to leader down
```
2015-08-12 16:53:18 -07:00
ddfe343e77 Merge pull request #3271 from yichengq/doc-discovery
docs: add discovery protocol doc
2015-08-12 13:51:32 -07:00
a45f0ede56 docs: add discovery protocol doc
This document talks about the technical details of discovery service
protocol. It helps users to learn about how discovery service works and
what behavior to expect.
2015-08-12 13:15:21 -07:00
7bd9d9aede Merge pull request #3273 from polvi/kube-hack
add etcd on k8s example
2015-08-12 22:13:15 +03:00
cfb3522b63 add etcd on k8s example 2015-08-12 22:12:00 +03:00
f468d8b51a Merge pull request #3270 from xiang90/better_err
Better error message for etcdctl
2015-08-12 10:27:42 -07:00
7e04a79fb4 etcdctl: print out better error information 2015-08-12 10:09:56 -07:00
5d06d4ec44 client: print url as string 2015-08-12 10:09:40 -07:00
e894756144 Merge pull request #3190 from yichengq/adjust-prop-timeout
etcdserver: adjust proposal timeout based on config
2015-08-12 09:41:25 -07:00
c3d4d11402 etcdhttp: adjust request timeout based on config
It uses heartbeat interval and election timeout to estimate the
expected request timeout.

This PR helps etcd survive under high roundtrip-time environment,
e.g., globally-deployed cluster.
2015-08-12 09:22:59 -07:00
18ecc297bc Merge pull request #3254 from es-chow/log-group
set groupID in multinode as log context so it can be logged
2015-08-12 08:05:50 -07:00
cc362ccdad raft: set logger to raft so log context such as multinode groupID can be logged 2015-08-12 22:56:00 +08:00
5a91937367 etcdserver: adjust commit timeout based on config
It uses heartbeat interval and election timeout to estimate the
commit timeout for internal requests.

This PR helps etcd survive under high roundtrip-time environment,
e.g., globally-deployed cluster.
2015-08-11 21:09:03 -07:00
042afcf2a3 Merge pull request #3266 from yichengq/client-readme
client: clean up README
2015-08-11 16:21:13 -07:00
7d618c46ad client: clean up README
Address rob's comments about sentences in README.
2015-08-11 15:33:56 -07:00
18a1c95f22 Merge pull request #3263 from xiang90/ctl_tr
etcdctl: add per request timeout
2015-08-11 14:17:12 -07:00
dceacacd49 Merge pull request #3194 from yichengq/client-readme
client: add README
2015-08-11 13:35:54 -07:00
e36c499d0f etcdctl: add per request timeout 2015-08-11 13:33:50 -07:00
8a7cf56e13 client: add README
It describes some basic usage and caveat of etcd/client package.

Write it together with Xiang.
2015-08-11 12:07:24 -07:00
83efc08137 Merge pull request #3262 from yichengq/client-deadline
client: return context.DeadlineExceeded instead of ClusterError
2015-08-11 10:42:29 -07:00
a1ef699aeb client: return context.DeadlineExceeded instead of ClusterError
This is done to match user expectation to see context.DeadlineExceeded
when it reaches deadline.
2015-08-11 10:18:38 -07:00
1fe52e1ec3 Merge pull request #3245 from yichengq/client_timeout
client: set timeout for each request
2015-08-11 10:10:42 -07:00
f4c29a5f55 client: support to set timeout for each request
Add HeaderTimeout field in Config, so users could set timeout for each request.
Before this, one hanged request may block the call for long time. After
this, if the network is good, the user could set short timeout and expect
that API call can attempt next available endpoint quickly.
2015-08-11 10:01:05 -07:00
a718329ad3 Merge pull request #3248 from xiang90/v3
initial v3 demo
2015-08-10 13:59:03 -07:00
fb5e1ac548 Merge pull request #3256 from xiang90/update_log
update logger
2015-08-10 13:54:28 -07:00
6c58333969 etcdmain: use default formatter
The default formatter would use syslog style when running
under init system, and would use pretty format otherwise.
2015-08-10 13:38:22 -07:00
48e36bbb84 Godep: update capnslog dependency 2015-08-10 13:38:00 -07:00
b0ea4ab3b1 doc: link to v3 api doc 2015-08-10 11:22:55 -07:00
c32919e6d1 *: rename v3etcdctl to etcdctlv3 2015-08-10 11:21:37 -07:00
c1e0b19f9f *: better flag 2015-08-10 09:53:17 -07:00
48b1cd54f3 Merge pull request #3243 from xiang90/conf
doc: add runtime reconfiguration design doc
2015-08-09 10:56:51 -07:00
89bf5824c2 Merge pull request #3159 from sofuture/master
use /usr/bin/env to find bash
2015-08-09 10:56:12 -07:00
601801ced5 doc: add runtime reconfiguration design doc 2015-08-09 10:55:34 -07:00
45f3a0c547 Merge pull request #3249 from philips/get-etcd-running-under-arm64
Get etcd running under arm64
2015-08-08 20:32:33 -07:00
1239e1ce6f test, scripts: use /usr/bin/env to find bash
use /usr/bin/env to find bash

add set -e back into scripts it was removed from
2015-08-08 20:52:53 -06:00
1b894c6b0b test: race detector doesn't work on armv7l
Test fails without this fix on armv7l:

    go test: -race is only supported on linux/amd64, freebsd/amd64, darwin/amd64 and windows/amd64
2015-08-08 18:11:41 -07:00
fb1951204c etcdserver: move atomics to make etcd work on arm64
Follow the simple rule in the atomic package:

"On both ARM and x86-32, it is the caller's responsibility to arrange
for 64-bit alignment of 64-bit words accessed atomically. The first word
in a global variable or in an allocated struct or slice can be relied
upon to be 64-bit aligned."

Tested on a system with /proc/cpuinfo reporting:

processor       : 0
model name      : ARMv7 Processor rev 1 (v7l)
Features        : swp half thumb fastmult vfp edsp thumbee neon vfpv3
tls vfpv4 idiva idivt vfpd32 lpae evtstrm
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xc0d
CPU revision    : 1
2015-08-08 18:11:41 -07:00
9ff7075ce8 etcdserver: use v3server interface 2015-08-08 10:39:04 -07:00
523567bcc7 v3etcdctl: initial v3 ctl support 2015-08-08 05:58:58 -07:00
f004b4dac7 *: etcdserver supports v3 demo 2015-08-08 05:58:29 -07:00
82afadbcc6 etcdserverpb: update proto 2015-08-08 05:31:35 -07:00
668a8a8367 Merge pull request #3242 from xiang90/typo
*: fix typos vaild->valid
2015-08-07 10:58:39 -07:00
845c51fedd *: fix typos vaild->valid 2015-08-07 10:57:11 -07:00
f0a5874473 Merge pull request #3241 from yichengq/sync-pin
client: Sync() pin the endpoint when member list doesn't change
2015-08-07 10:24:29 -07:00
0ab16db728 client: Sync() pin the endpoint when member list doesn't change
This helps client to pin the same endpoint as long as cluster doesn't change.
2015-08-07 10:08:28 -07:00
d7adcc3e65 Merge pull request #3239 from xiang90/improve_probing
rafthttp: use customized transport for probing
2015-08-07 09:37:32 -07:00
b6580a9591 rafthttp: use customized transport for probing
We need to support TLS verification when probing.
2015-08-06 16:20:44 -07:00
d2363afd52 Merge pull request #3240 from xiang90/fix_log
etcdmain: fix path printing
2015-08-06 15:56:14 -07:00
f03f048232 Merge pull request #3184 from yichengq/fast-bootstrap
etcdserver: tick ElectionTicks before starting when bootstrap new cluster
2015-08-06 15:54:40 -07:00
1b572ae2dd etcdmain: fix path printing 2015-08-06 15:53:24 -07:00
21f5b885f2 etcdserver: fast election timeout when bootstrap cluster
The behavior accelarates the happen of the first-time leader election,
so the cluster could elect its leader fast. Technically, it could
help to reduce `electionMs - heartbeatMs` wait time for the first leader election.

Main usage:
1. Quick start for the local cluster when setting a little longer
election timeout
2. Quick start for the global cluster, which sets election timeout to
its maximum 50s.
2015-08-06 15:44:26 -07:00
a637e86372 Merge pull request #3220 from yichengq/fix-auth-check
etcdhttp: fix access check for multiple roles in auth
2015-08-06 15:09:04 -07:00
b9c6b64d61 Merge pull request #3216 from yichengq/cancel-err
client: return context canceled error correctly
2015-08-06 15:04:49 -07:00
b965c4b415 Merge pull request #3217 from yichengq/update-migrate-example
update commands used in admin_guide.md
2015-08-06 15:00:04 -07:00
78af793338 client: return context canceled error correctly
If the body is closed to stop watching, it will ignore the error from
reading body and return context error.

Before this PR, the cancel when watching always returns error `read tcp
127.0.0.1:57824: use of closed network connection`. After this PR, it
will return expected context canceled error.
2015-08-06 14:52:04 -07:00
b04bb3e0ea Merge pull request #3229 from xiang90/f_cerr
client: return context.Canceled error when user cancels the request
2015-08-06 14:41:19 -07:00
25ad71fbac Merge pull request #3225 from yichengq/client-record-err
client: return correct error for 50x response
2015-08-06 14:40:38 -07:00
7314310aed Merge pull request #3233 from xiang90/srv_discovery
better dns discovery error and doc
2015-08-06 14:35:22 -07:00
cfeaf3d172 client: return correct error for 50x response
etcd always returns 500/503 response when it may have no leader.
So we should log the other 50x response in a normal way.

This helps to log correctly when discovery meets 504 error. Before this
PR, it logs like this:

```
18:31:58 etcd2 | 2015/08/4 18:31:58 discovery: error #0: client: etcd
member https://discovery.etcd.io has no leader
18:31:58 etcd2 | 2015/08/4 18:31:58 discovery: waiting for other nodes:
error connecting to https://discovery.etcd.io, retrying in 4s
```

After this PR:

```
22:20:25 etcd2 | 2015/08/4 22:20:25 discovery: error #0: client: etcd
member https://discovery.etcd.io returns server error [Gateway Timeout]
22:20:25 etcd2 | 2015/08/4 22:20:25 discovery: waiting for other nodes:
error connecting to https://discovery.etcd.io, retrying in 4s
```
2015-08-06 14:25:03 -07:00
e9f05e8959 doc: explain srv error 2015-08-06 14:24:58 -07:00
2c2249dadc Merge pull request #3219 from yichengq/limit-listener
etcdmain: stop accepting client conns when it reachs limit
2015-08-06 12:17:49 -07:00
97923ca3fc etcdmain: close client conns when it exceeds limit
This solves the problem that etcd may fatal because its critical path
cannot get file descriptor resource when the number of clients is too
big. The PR lets the client listener close client connections
immediately after they are accepted when
the file descriptor usage in the process reaches some pre-set limit, so
it ensures that the internal critical path could always get file
descriptor when it needs.

When there are tons to clients connecting to the server, the original
behavior is like this:

```
2015/08/4 16:42:08 etcdserver: cannot monitor file descriptor usage
(open /proc/self/fd: too many open files)
2015/08/4 16:42:33 etcdserver: failed to purge snap file open
default2.etcd/member/snap: too many open files
[halted]
```

Current behavior is like this:

```
2015/08/6 19:05:25 transport: accept error: closing connection,
exceed file descriptor usage limitation (fd limit=874)
2015/08/6 19:05:25 transport: accept error: closing connection,
exceed file descriptor usage limitation (fd limit=874)
2015/08/6 19:05:26 transport: accept error: closing connection,
exceed file descriptor usage limitation (fd limit=874)
2015/08/6 19:05:27 transport: accept error: closing connection,
exceed file descriptor usage limitation (fd limit=874)
2015/08/6 19:05:28 transport: accept error: closing connection,
exceed file descriptor usage limitation (fd limit=874)
2015/08/6 19:05:28 etcdserver: 80% of the file descriptor limit is
used [used = 873, limit = 1024]
```

It is available at linux system today because pkg/runtime only has linux
support.
2015-08-06 12:03:20 -07:00
203e0f178b etcdmian: better error for srv discovery failure 2015-08-06 11:38:53 -07:00
01c286ccb6 Merge pull request #3231 from xiang90/fallocate
pkg/fileutil: support perallocate
2015-08-06 10:25:28 -07:00
39a4b6a5e5 pkg/fileutil: support perallocate 2015-08-06 10:10:58 -07:00
9a8607fce1 Merge pull request #3187 from yichengq/client-keep-sync
client: add KeepSync function
2015-08-06 00:16:28 -07:00
c53b3016ae client: add AutoSync function
AutoSync provides the way for client to syncing member list from
etcd cluster automatically.
2015-08-05 13:22:56 -07:00
807a6f209e docs/admin_guide: decouple example from CoreOS specific details
This makes the example commands general, while keeping it easy to
understand. It also fixes some name mismatch.
2015-08-05 11:33:46 -07:00
f38187bbdb client: return context.Canceled error when user cancels the request 2015-08-05 09:52:30 -07:00
ff0b8723c7 Merge pull request #2688 from xiang90/versioning
etcdserver: internal request union
2015-08-05 09:27:32 -07:00
58503817ec etcdserver: internal request union 2015-08-05 07:47:10 -07:00
487639b2d8 Merge pull request #3222 from mitake/wal-log-error
wal: log errors in wal.Close()
2015-08-04 23:19:45 -07:00
9cbeffc720 Merge pull request #3224 from xiang90/fix_ls
etcdctl: ls takes / as default key arg
2015-08-04 23:15:29 -07:00
ba76e27875 wal: log errors in wal.Close()
This patch adds error logging in wal.Close() if unlocking and
destroying fail. Though it is hard to handling the errors, logging
would be helpful for trouble shooting.
2015-08-05 15:03:45 +09:00
9527a97720 etcdctl: ls takes / as default key arg 2015-08-04 22:56:55 -07:00
718a42f408 Merge pull request #3210 from xiang90/probing
monitoring connectivity between peers
2015-08-04 16:56:31 -07:00
18169e896c etcdhttp: fix access check for multiple roles in auth
Check access for multiple roles should go through all roles.
2015-08-04 14:31:07 -07:00
0650170a1b Merge pull request #3196 from eyakubovich/fix-watch-timeout
client: handle watch timing out elegantly
2015-08-04 13:52:42 -07:00
1e048b5c24 rafthttp: cleanup prober when stopping the transport 2015-08-04 17:42:51 +08:00
709718ed97 godeps: update probing pkg 2015-08-04 17:40:39 +08:00
0fc764200d rafthttp: monitor connection 2015-08-04 17:39:40 +08:00
ff5c3469c1 Merge pull request #3197 from xiang90/health
etcdctl: cluster-health supports forever flag
2015-08-03 20:48:06 -07:00
6312e22b1d client: handle empty watch responses elegantly
Even though current etcd does not time out
watches, the client could be running against
an old etcd version or the server may close
polling connection for other reasons.
This patch ignores successful (as in 200)
responses with emtpy bodies instead
of producing JSON errors.
2015-08-03 11:47:21 -07:00
306085db5f Godeps: add probing dependency 2015-08-03 09:07:43 +08:00
f7f00b0af6 etcdctl: cluster-health supports forever flag
cluster-health command supports checking the cluster health
forever.
2015-08-01 22:29:08 +08:00
3da1df2648 Merge pull request #3207 from xiang90/rm_migration
*: remove migration related stuff from 2.2
2015-08-01 19:47:17 +08:00
2b8abeb093 *: remove migration related stuff from 2.2 2015-08-01 19:37:20 +08:00
eee1c8b8ee Merge pull request #3200 from xiang90/d_doc
doc: unique names must be specified when using public discovery service
2015-08-01 07:34:25 +08:00
8bd9554338 Merge pull request #3202 from yichengq/fix-etcdctl-watch
etcdctl: fix watch -after-index parsing
2015-07-31 14:41:45 -07:00
4a89b3f8f3 Merge pull request #3116 from offscale/master
build: implemented build shell-script for Windows
2015-07-31 11:55:42 -07:00
05b2d06788 Merge pull request #3199 from xiang90/sdnotify
etcdmain: support sdnotify for readiness
2015-07-31 19:04:35 +08:00
4a0d8ee4bd build: implemented build shell-script for Windows 2015-07-31 17:43:47 +10:00
0cbac56fa2 etcdmain: support sdnotify for readiness 2015-07-31 13:33:18 +08:00
beeecc32b0 doc: unique names must be specified when using public discovery service 2015-07-31 09:12:44 +08:00
c1c5c7c99c Merge pull request #3091 from barakmich/client_auth_cov
etcdhttp: Improve test coverage surrounding auth
2015-07-30 17:00:49 -04:00
dd1a8fe330 etcdhttp: Improve test coverage surrounding auth 2015-07-30 14:21:08 -04:00
147885078c etcdctl: fix watch -after-index parsing
It uses -after-index incorrectly now:

```
$ ./bin/etcdctl --debug watch -after-index 31 foo
Cluster-Endpoints: http://localhost:2379, http://localhost:4001
cURL Command: curl -X GET
http://localhost:2379/v2/keys/foo?recursive=false&wait=true&waitIndex=33
```

After this PR:

```
$ ./bin/etcdctl --debug watch -after-index 31 foo
Cluster-Endpoints: http://localhost:2379, http://localhost:4001
cURL Command: curl -X GET
http://localhost:2379/v2/keys/foo?recursive=false&wait=true&waitIndex=32
```
2015-07-30 11:15:43 -07:00
219ed1695b Merge pull request #3178 from yichengq/refactor-cluster-health
etcdctl: refactor the way to check cluster health
2015-07-29 18:16:26 -07:00
80b794dccc Merge pull request #3185 from xiang90/add_debug_endpoint
etcdhttp: add config/local/debug endpoint
2015-07-30 08:46:07 +08:00
4e31df2c2b etcdhttp: add config/local/log endpoint
PUT on the endpoint sets the GlobalDebugLevel to json level value.
The action overwrites the origianl log level setting from
users. We need to write doc to warn this.
2015-07-30 08:35:01 +08:00
e62a3b8a62 Merge pull request #2891 from glensc/patch-1
build: use posix shell
2015-07-29 17:15:57 -07:00
ff945c7404 Merge pull request #3181 from xiang90/2.2-client-error
client: return cluster error if the etcd cluster is not avaliable
2015-07-30 08:08:09 +08:00
f1aaa7a9e3 etcdctl: refactor the way to check cluster health
This method uses raft status exposed at /debug/varz to determine the
health of the cluster. It uses whether commit index increases to
determine the cluster health, and uses whether match index increases to
determine the member health.

This could fix the bug #2711 that fails to detect follower is unhealthy
because it doesn't rely on whether message in long-polling connection is sent.

This health check is stricter than the old one, and reflects the
situation that whether followers are healthy in the view of the leader. One
example is that if the follower is receiving the snapshot, it will turns
out to be unhealthy because it doesn't move forward.

`etcdctl cluster-health` will reflect the healthy view in the raft level,
while connectivity checks reflects the healthy view in transport level.
2015-07-29 17:06:55 -07:00
a47e661fff discovery: print out detailed cluster error 2015-07-29 23:06:57 +08:00
5fa8652241 client: return cluster error if the etcd cluster is not avaliable
Add a new ClusterError type. It contians all encountered errors and
return ClusterNotAvailable as the error string.
2015-07-29 22:55:15 +08:00
6b8b507312 Merge pull request #3176 from yichengq/reject-high-election
etcdmain: reject unreasonably high values of -election-timeout
2015-07-28 10:33:58 -07:00
ec214030d0 etcdmain: reject unreasonably high values of -election-timeout
This helps users to detect setting problem early.
2015-07-28 10:07:57 -07:00
edfec45bf5 hack: TLS setup using cfssl
this demonstrates basic TLS setup with cfssl. it's much easier than other
available tools.
2015-07-27 14:51:17 -07:00
7831a30e46 Merge pull request #3180 from shafreeck/master
Update libraries-and-tools.md
2015-07-27 14:45:31 -07:00
6184e271a4 Merge pull request #3164 from yichengq/pin-endpoint
client: pin itself to an endpoint that given
2015-07-27 14:35:51 -07:00
6fc9dbfe56 Merge pull request #3114 from yichengq/clean-raft-init
etcdserver: clean up start and stop logic of raft
2015-07-27 14:19:25 -07:00
ea2347a40f client: pin itself to an endpoint that given
1. When reset endpoints, client will choose a random endpoint to pin.
2. If the pinned endpoint is healthy, client will keep using it.
3. If the pinned endpoint becomes unhealthy, client will attempt other
endpoints and update its pin.
2015-07-27 13:36:53 -07:00
7696dd3280 etcdserver: clean up start and stop logic of raft
kill TODO and make it more readable.
2015-07-27 13:24:26 -07:00
5e3dc31e6f Merge pull request #3150 from gouyang/master
pkg/mflag: add modified flag package
2015-07-24 15:26:07 -07:00
a7eef376b7 Merge pull request #3183 from xiang90/txn
*: tnx -> txn
2015-07-25 01:48:06 +08:00
53a77fa519 *: tnx -> txn 2015-07-24 23:21:09 +08:00
c9769ee966 etcdmain: Don't print flags when flag parse error
At present it prints the whole usage and flags, which cause the exact
error message is hidden two screens above.

Fixes #3141

Signed-off-by: Guohua Ouyang <gouyang@redhat.com>
2015-07-24 21:29:21 +08:00
e75446ca27 docs: add cetcd into libraries-and-tools.md 2015-07-24 12:08:39 +00:00
b407f72766 Merge pull request #3166 from yichengq/publish-timeout
etcdserver: rename defaultPublishRetryInterval -> defaultPublishTimeout
2015-07-23 10:30:41 -07:00
b7892b20c1 etcdserver: rename defaultPublishRetryInterval -> defaultPublishTimeout
This makes code more readable and reasonable.
2015-07-23 10:09:28 -07:00
58bc617dd0 Merge pull request #3175 from xiang90/2.2-ctl-bug
etcdctl: fix exec watch command
2015-07-23 14:37:38 +08:00
448ca20cdc etcdctl: fix exec watch command
The previous flag parsing has a small issue. It uses
`recursive == true` and `after-index == 0` to determine
if user specifies the sub flags. This is incorrect since
user can specify `after-index = 0`. Then the flag parsing
would be confused.

This commit explicitly find the `--` in the remaining args
and determine the key and cmdArgs accordingly.
2015-07-23 13:13:15 +08:00
43f4b99d52 Merge pull request #3174 from xiang90/2.2_submit_bug
doc: add reporting bug doc
2015-07-23 13:08:35 +08:00
1b5e41e3f4 doc: add reporting bug doc 2015-07-23 12:55:38 +08:00
93002caca5 Merge pull request #3165 from yichengq/client-quorum
client: add Quorum option in getOption
2015-07-22 16:54:14 -07:00
b20b87893f client: add Quorum option in getOption 2015-07-22 15:19:34 -07:00
6be02ff5ec etcdmian: fix initialization confilct
Fix #3142

Ignore flags if etcd is already initialized.
2015-07-21 12:53:21 -07:00
24db661401 etcdmain: warn when listening on HTTP if TLS is set
If the user sets TLS info, this implies that he wants to listen on TLS.
If etcd finds that urls to listen is still HTTP schema, it prints out
warning to notify user about possible wrong setting.
2015-07-21 12:53:21 -07:00
604709cad7 etcdctl: update -peers to default to use schema
Change its default value from `127.0.0.1:4001,127.0.0.1:2379` to
`http://127.0.0.1:4001,http://127.0.0.1:2379`

Adding HTTP schema makes its format consistent with etcd's xxx-urls
flags.
2015-07-21 12:53:21 -07:00
d9c27138fa discovery: return bad discovery endpoint error 2015-07-21 12:53:21 -07:00
d2dac0fe59 client: consume json error and return ErrInvaildJSON
The default JSON error is not very readable. We let client
consume the error and return a more understandable error in
the context of etcd.

Fix #3120
2015-07-21 12:53:21 -07:00
6317abf7e4 pkg/transport: fix HTTPS downgrade bug for keepalive listener
If TLS config is empty, etcd downgrades keepalive listener from HTTPS to
HTTP without warning. This results in HTTPS downgrade bug for client urls.
The commit returns error if it cannot listen on TLS.
2015-07-21 12:53:21 -07:00
43437e21f9 etcdctl: added domain discovery flag
provided a domain, will look up SRV records for etcd endpoints

Fixes #2636
2015-07-21 12:53:21 -07:00
dc3f7f5d90 *: detect duplicate name for discovery bootstrap 2015-07-21 12:53:20 -07:00
b8279b3591 types: add len func for urlmaps 2015-07-21 12:53:20 -07:00
ee82ee05b4 etcdctl: support member update command 2015-07-21 12:53:20 -07:00
6e3769d39e client: add member update 2015-07-21 12:53:20 -07:00
9f9661f513 etcdctl: print out key and action when watching recursively 2015-07-21 12:53:20 -07:00
87ef0f0b3e godep: remove go-etcd dependency 2015-07-21 12:53:20 -07:00
071ad9f72b etcdctl: health use etcd/client 2015-07-21 12:53:20 -07:00
0b1ddce889 etcdctl: import snap use etcd/client 2015-07-21 12:53:20 -07:00
adeb101e04 etcdctl: remove old stuff 2015-07-21 12:53:20 -07:00
759c156e3e etcdctl: exec_watch use etcd/client 2015-07-21 12:53:20 -07:00
5b01b3877f etcdctl: watch use etcd/client 2015-07-21 12:53:20 -07:00
b20c06348d etcdctl: ls use etcd/client 2015-07-21 12:53:19 -07:00
ae1669de26 etcdctl: updatedir use etcd/client 2015-07-21 12:53:19 -07:00
f12ae45c6a etcdctl: update use etcd/client 2015-07-21 12:53:19 -07:00
58b19a7c1e etcdctl: rmdir use etcd/client 2015-07-21 12:53:19 -07:00
9d7a8dd2b0 etcdctl: mk use etcd/client 2015-07-21 12:53:19 -07:00
61befc7ce6 etcdctl: minor cleanup 2015-07-21 12:53:19 -07:00
e3fcc450cf etcdctl: make rm use etcd/client 2015-07-21 12:53:19 -07:00
9d9c3a7180 etcdctl: make setdir/mkdir use etcd/client 2015-07-21 12:53:19 -07:00
db4b18aee3 etcdctl: make set command use etcd/client 2015-07-21 12:53:19 -07:00
e9478ba630 etcdctl: make get command use etcd/client 2015-07-21 12:53:19 -07:00
147b14cfc0 *: bump to v2.1.1+git 2015-07-21 10:43:49 -07:00
6335fdc595 *: bump to v2.1.1 2015-07-21 10:41:26 -07:00
09b9c30beb pkg/transport: include debug output for trusted-ca
since --peer-ca-file is deprecated we need to update the debug output

before:

```
$ etcd ... --peer-cert-file infra1.crt -peer-key-file
 infra1.key.insecure -peer-trusted-ca-file ca.crt --client-cert-auth
etcdmain: peerTLS: cert = infra1.crt, key = infra1.key.insecure, ca =
```

after:

```
$ etcd ... --peer-cert-file infra1.crt -peer-key-file
 infra1.key.insecure -peer-trusted-ca-file ca.crt --client-cert-auth
etcdmain: peerTLS: cert = infra1.crt, key = infra1.key.insecure, ca = , trusted-ca = ca.crt
```
2015-07-04 14:28:18 -07:00
77c3613d94 build: use posix shell 2015-05-30 09:34:54 +03:00
1452 changed files with 234126 additions and 58635 deletions

1
.gitignore vendored
View File

@ -9,3 +9,4 @@
*.swp
/hack/insta-discovery/.env
*.test
tools/functional-tester/docker/bin

View File

@ -1,4 +1,4 @@
// Copyright 2014 CoreOS, Inc.
// Copyright 2016 CoreOS, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.

View File

@ -1,10 +1,25 @@
language: go
sudo: false
go:
- 1.4
- 1.5
- 1.6
- tip
install:
- go get github.com/barakmich/go-nyet
matrix:
allow_failures:
- go: tip
addons:
apt:
packages:
- libpcap-dev
- libaspell-dev
- libhunspell-dev
before_install:
- go get -v github.com/chzchzchz/goword
script:
- INTEGRATION=y ./test
- ./test

View File

@ -12,6 +12,14 @@ etcd is Apache 2.0 licensed and accepts contributions via GitHub pull requests.
- Fork the repository on GitHub
- Read the README.md for build instructions
## Reporting Bugs and Creating Issues
Reporting bugs is one of the best ways to contribute. However, a good bug report
has some very specific qualities, so please read over our short document on
[reporting bugs](https://github.com/coreos/etcd/blob/master/Documentation/reporting_bugs.md)
before you submit your bug report. This document might contain links known
issues, another good reason to take a look there, before reporting your bug.
## Contribution flow
This is a rough outline of what a contributor's workflow looks like:
@ -27,7 +35,7 @@ Thanks for your contributions!
### Code style
The coding style suggested by the Golang community is used in etcd. See the [style doc](https://code.google.com/p/go-wiki/wiki/CodeReviewComments) for details.
The coding style suggested by the Golang community is used in etcd. See the [style doc](https://github.com/golang/go/wiki/CodeReviewComments) for details.
Please follow this style to make etcd easy to review, maintain and develop.

View File

@ -0,0 +1,31 @@
# Snapshot Migration
You can migrate a snapshot of your data from a v0.4.9+ cluster into a new etcd 2.2 cluster using a snapshot migration. After snapshot migration, the etcd indexes of your data will change. Many etcd applications rely on these indexes to behave correctly. This operation should only be done while all etcd applications are stopped.
To get started get the newest data snapshot from the 0.4.9+ cluster:
```
curl http://cluster.example.com:4001/v2/migration/snapshot > backup.snap
```
Now, import the snapshot into your new cluster:
```
etcdctl --endpoint new_cluster.example.com import --snap backup.snap
```
If you have a large amount of data, you can specify more concurrent works to copy data in parallel by using `-c` flag.
If you have hidden keys to copy, you can use `--hidden` flag to specify. For example fleet uses `/_coreos.com/fleet` so to import those keys use `--hidden /_coreos.com`.
And the data will quickly copy into the new cluster:
```
entering dir: /
entering dir: /foo
entering dir: /foo/bar
copying key: /foo/bar/1 1
entering dir: /
entering dir: /foo2
entering dir: /foo2/bar2
copying key: /foo2/bar2/2 2
```

View File

@ -1,40 +1,108 @@
## Administration
# Administration
### Data Directory
## Data Directory
#### Lifecycle
### Lifecycle
When first started, etcd stores its configuration into a data directory specified by the data-dir configuration parameter.
Configuration is stored in the write ahead log and includes: the local member ID, cluster ID, and initial cluster configuration.
The write ahead log and snapshot files are used during member operation and to recover after a restart.
If a members data directory is ever lost or corrupted then the user should remove the etcd member from the cluster via the [members API][members-api].
Having a dedicated disk to store wal files can improve the throughput and stabilize the cluster.
It is highly recommended to dedicate a wal disk and set `--wal-dir` to point to a directory on that device for a production cluster deployment.
If a members data directory is ever lost or corrupted then the user should [remove][remove-a-member] the etcd member from the cluster using `etcdctl` tool.
A user should avoid restarting an etcd member with a data directory from an out-of-date backup.
Using an out-of-date data directory can lead to inconsistency as the member had agreed to store information via raft then re-joins saying it needs that information again.
For maximum safety, if an etcd member suffers any sort of data corruption or loss, it must be removed from the cluster.
Once removed the member can be re-added with an empty data directory.
[members-api]: other_apis.md#members-api
#### Contents
### Contents
The data directory has two sub-directories in it:
1. wal: write ahead log files are stored here. For details see the [wal package documentation][wal-pkg]
2. snap: log snapshots are stored here. For details see the [snap package documentation][snap-pkg]
[wal-pkg]: http://godoc.org/github.com/coreos/etcd/wal
[snap-pkg]: http://godoc.org/github.com/coreos/etcd/snap
If `--wal-dir` flag is set, etcd will write the write ahead log files to the specified directory instead of data directory.
### Cluster Management
## Cluster Management
#### Lifecycle
### Lifecycle
If you are spinning up multiple clusters for testing it is recommended that you specify a unique initial-cluster-token for the different clusters.
This can protect you from cluster corruption in case of mis-configuration because two members started with different cluster tokens will refuse members from each other.
#### Optimal Cluster Size
### Monitoring
It is important to monitor your production etcd cluster for healthy information and runtime metrics.
#### Health Monitoring
At lowest level, etcd exposes health information via HTTP at `/health` in JSON format. If it returns `{"health": "true"}`, then the cluster is healthy. Please note the `/health` endpoint is still an experimental one as in etcd 2.2.
```
$ curl -L http://127.0.0.1:2379/health
{"health": "true"}
```
You can also use etcdctl to check the cluster-wide health information. It will contact all the members of the cluster and collect the health information for you.
```
$./etcdctl cluster-health
member 8211f1d0f64f3269 is healthy: got healthy result from http://127.0.0.1:12379
member 91bc3c398fb3c146 is healthy: got healthy result from http://127.0.0.1:22379
member fd422379fda50e48 is healthy: got healthy result from http://127.0.0.1:32379
cluster is healthy
```
#### Runtime Metrics
etcd uses [Prometheus][prometheus] for metrics reporting in the server. You can read more through the runtime metrics [doc][metrics].
### Debugging
Debugging a distributed system can be difficult. etcd provides several ways to make debug
easier.
#### Enabling Debug Logging
When you want to debug etcd without stopping it, you can enable debug logging at runtime.
etcd exposes logging configuration at `/config/local/log`.
```
$ curl http://127.0.0.1:2379/config/local/log -XPUT -d '{"Level":"DEBUG"}'
$ # debug logging enabled
$
$ curl http://127.0.0.1:2379/config/local/log -XPUT -d '{"Level":"INFO"}'
$ # debug logging disabled
```
#### Debugging Variables
Debug variables are exposed for real-time debugging purposes. Developers who are familiar with etcd can utilize these variables to debug unexpected behavior. etcd exposes debug variables via HTTP at `/debug/vars` in JSON format. The debug variables contains
`cmdline`, `file_descriptor_limit`, `memstats` and `raft.status`.
`cmdline` is the command line arguments passed into etcd.
`file_descriptor_limit` is the max number of file descriptors etcd can utilize.
`memstats` is explained in detail in the [Go runtime documentation][golang-memstats].
`raft.status` is useful when you want to debug low level raft issues if you are familiar with raft internals. In most cases, you do not need to check `raft.status`.
```json
{
"cmdline": ["./etcd"],
"file_descriptor_limit": 0,
"memstats": {"Alloc":4105744,"TotalAlloc":42337320,"Sys":12560632,"...":"..."},
"raft.status": {"id":"ce2a822cea30bfca","term":5,"vote":"ce2a822cea30bfca","commit":23509,"lead":"ce2a822cea30bfca","raftState":"StateLeader","progress":{"ce2a822cea30bfca":{"match":23509,"next":23510,"state":"ProgressStateProbe"}}}
}
```
### Optimal Cluster Size
The recommended etcd cluster size is 3, 5 or 7, which is decided by the fault tolerance requirement. A 7-member cluster can provide enough fault tolerance in most cases. While larger cluster provides better fault tolerance the write performance reduces since data needs to be replicated to more machines.
@ -57,7 +125,7 @@ As you can see, adding another member to bring the size of cluster up to an odd
#### Changing Cluster Size
After your cluster is up and running, adding or removing members is done via [runtime reconfiguration](runtime-configuration.md), which allows the cluster to be modified without downtime. The `etcdctl` tool has a `member list`, `member add` and `member remove` commands to complete this process.
After your cluster is up and running, adding or removing members is done via [runtime reconfiguration][runtime-reconfig], which allows the cluster to be modified without downtime. The `etcdctl` tool has `member list`, `member add` and `member remove` commands to complete this process.
### Member Migration
@ -65,10 +133,10 @@ When there is a scheduled machine maintenance or retirement, you might want to m
The data directory contains all the data to recover a member to its point-in-time state. To migrate a member:
* Stop the member process
* Copy the data directory of the now-idle member to the new machine
* Update the peer URLs for that member to reflect the new machine according to the [member api] [change peer url]
* Start etcd on the new machine, using the same configuration and the copy of the data directory
* Stop the member process.
* Copy the data directory of the now-idle member to the new machine.
* Update the peer URLs for the replaced member to reflect the new machine according to the [runtime reconfiguration instructions][update-member].
* Start etcd on the new machine, using the same configuration and the copy of the data directory.
This example will walk you through the process of migrating the infra1 member to a new machine:
@ -78,11 +146,11 @@ This example will walk you through the process of migrating the infra1 member to
|infra1|10.0.1.11:2380|
|infra2|10.0.1.12:2380|
```
$ export ETCDCTL_PEERS=http://10.0.1.10:2379,http://10.0.1.11:2379,http://10.0.1.12:2379
```sh
$ export ETCDCTL_ENDPOINT=http://10.0.1.10:2379,http://10.0.1.11:2379,http://10.0.1.12:2379
```
```
```sh
$ etcdctl member list
84194f7c5edd8b37: name=infra0 peerURLs=http://10.0.1.10:2380 clientURLs=http://127.0.0.1:2379,http://10.0.1.10:2379
b4db3bf5e495e255: name=infra1 peerURLs=http://10.0.1.11:2380 clientURLs=http://127.0.0.1:2379,http://10.0.1.11:2379
@ -91,53 +159,57 @@ bc1083c870280d44: name=infra2 peerURLs=http://10.0.1.12:2380 clientURLs=http://1
#### Stop the member etcd process
```
$ ssh core@10.0.1.11
```sh
$ ssh 10.0.1.11
```
```
$ sudo systemctl stop etcd
```sh
$ kill `pgrep etcd`
```
#### Copy the data directory of the now-idle member to the new machine
```
$ tar -cvzf node1.etcd.tar.gz /var/lib/etcd/node1.etcd
$ tar -cvzf infra1.etcd.tar.gz %data_dir%
```
```
$ scp node1.etcd.tar.gz core@10.0.1.13:~/
```sh
$ scp infra1.etcd.tar.gz 10.0.1.13:~/
```
#### Update the peer URLs for that member to reflect the new machine
```
```sh
$ curl http://10.0.1.10:2379/v2/members/b4db3bf5e495e255 -XPUT \
-H "Content-Type: application/json" -d '{"peerURLs":["http://10.0.1.13:2380"]}'
```
Or use `etcdctl member update` command
```sh
$ etcdctl member update b4db3bf5e495e255 http://10.0.1.13:2380
```
#### Start etcd on the new machine, using the same configuration and the copy of the data directory
```sh
$ ssh 10.0.1.13
```
$ ssh core@10.0.1.13
```sh
$ tar -xzvf infra1.etcd.tar.gz -C %data_dir%
```
```
$ tar -xzvf node1.etcd.tar.gz -C /var/lib/etcd
```
```
etcd -name node1 \
etcd -name infra1 \
-listen-peer-urls http://10.0.1.13:2380 \
-listen-client-urls http://10.0.1.13:2379,http://127.0.0.1:2379 \
-advertise-client-urls http://10.0.1.13:2379,http://127.0.0.1:2379
```
[change peer url]: other_apis.md#change-the-peer-urls-of-a-member
### Disaster Recovery
etcd is designed to be resilient to machine failures. An etcd cluster can automatically recover from any number of temporary failures (for example, machine reboots), and a cluster of N members can tolerate up to _(N/2)-1_ permanent failures (where a member can no longer access the cluster, due to hardware failure or disk corruption). However, in extreme circumstances, a cluster might permanently lose enough members such that quorum is irrevocably lost. For example, if a three-node cluster suffered two simultaneous and unrecoverable machine failures, it would be normally impossible for the cluster to restore quorum and continue functioning.
etcd is designed to be resilient to machine failures. An etcd cluster can automatically recover from any number of temporary failures (for example, machine reboots), and a cluster of N members can tolerate up to _(N-1)/2_ permanent failures (where a member can no longer access the cluster, due to hardware failure or disk corruption). However, in extreme circumstances, a cluster might permanently lose enough members such that quorum is irrevocably lost. For example, if a three-node cluster suffered two simultaneous and unrecoverable machine failures, it would be normally impossible for the cluster to restore quorum and continue functioning.
To recover from such scenarios, etcd provides functionality to backup and restore the datastore and recreate the cluster without data loss.
@ -149,8 +221,8 @@ The first step of the recovery is to backup the data directory on a functioning
```sh
etcdctl backup \
--data-dir /var/lib/etcd \
--backup-dir /tmp/etcd_backup
--data-dir %data_dir% \
--backup-dir %backup_data_dir%
```
This command will rewrite some of the metadata contained in the backup (specifically, the node ID and cluster ID), which means that the node will lose its former identity. In order to recreate a cluster from the backup, you will need to start a new, single-node cluster. The metadata is rewritten to prevent the new node from inadvertently being joined onto an existing cluster.
@ -161,7 +233,7 @@ To restore a backup using the procedure created above, start etcd with the `-for
```sh
etcd \
-data-dir=/tmp/etcd_backup \
-data-dir=%backup_data_dir% \
-force-new-cluster \
...
```
@ -172,18 +244,18 @@ Once you have verified that etcd has started successfully, shut it down and move
```sh
pkill etcd
rm -fr /var/lib/etcd
mv /tmp/etcd_backup /var/lib/etcd
rm -fr %data_dir%
mv %backup_data_dir% %data_dir%
etcd \
-data-dir=/var/lib/etcd \
-data-dir=%data_dir% \
...
```
#### Restoring the cluster
Now that the node is running successfully, you should [change its advertised peer URLs](other_apis.md#change-the-peer-urls-of-a-member), as the `--force-new-cluster` has set the peer URL to the default (listening on localhost).
Now that the node is running successfully, [change its advertised peer URLs][update-member], as the `--force-new-cluster` option has set the peer URL to the default listening on localhost.
You can then add more nodes to the cluster and restore resiliency. See the [runtime configuration](runtime-configuration.md) guide for more details.
You can then add more nodes to the cluster and restore resiliency. See the [add a new member][add-a-member] guide for more details. **NB:** If you are trying to restore your cluster using old failed etcd nodes, please make sure you have stopped old etcd instances and removed their old data directories specified by the data-dir configuration parameter.
### Client Request Timeout
@ -214,6 +286,18 @@ If timeout happens several times continuously, administrators should check statu
#### Maximum OS threads
By default, etcd uses the default configuration of the Go 1.4 runtime, which means that at most one operating system thread will be used to execute code simultaneously. (Note that this default behavior [may change in Go 1.5](https://docs.google.com/document/d/1At2Ls5_fhJQ59kDK2DFVhFu3g5mATSXqqV5QrxinasI/edit)).
By default, etcd uses the default configuration of the Go 1.4 runtime, which means that at most one operating system thread will be used to execute code simultaneously. (Note that this default behavior [has changed in Go 1.5][golang1.5-runtime]).
When using etcd in heavy-load scenarios on machines with multiple cores it will usually be desirable to increase the number of threads that etcd can utilize. To do this, simply set the environment variable `GOMAXPROCS` to the desired number when starting etcd. For more information on this variable, see the Go [runtime](https://golang.org/pkg/runtime) documentation.
When using etcd in heavy-load scenarios on machines with multiple cores it will usually be desirable to increase the number of threads that etcd can utilize. To do this, simply set the environment variable GOMAXPROCS to the desired number when starting etcd. For more information on this variable, see the [Go runtime documentation][golang-runtime].
[add-a-member]: runtime-configuration.md#add-a-new-member
[golang1.5-runtime]: https://golang.org/doc/go1.5#runtime
[golang-memstats]: https://golang.org/pkg/runtime/#MemStats
[golang-runtime]: https://golang.org/pkg/runtime
[metrics]: metrics.md
[prometheus]: http://prometheus.io/
[remove-a-member]: runtime-configuration.md#remove-a-member
[runtime-reconfig]: runtime-configuration.md#cluster-reconfiguration-operations
[snap-pkg]: http://godoc.org/github.com/coreos/etcd/snap
[update-a-member]: runtime-configuration.md#update-a-member
[wal-pkg]: http://godoc.org/github.com/coreos/etcd/wal

View File

@ -78,12 +78,9 @@ X-Raft-Index: 5398
X-Raft-Term: 1
```
- `X-Etcd-Index` is the current etcd index as explained above. When request is a watch on key space, `X-Etcd-Index` is the current etcd index when the watch starts, which means that the watched event may happen after `X-Etcd-Index`.
- `X-Raft-Index` is similar to the etcd index but is for the underlying raft protocol
- `X-Raft-Term` is an integer that will increase whenever an etcd master election happens in the cluster. If this number is increasing rapidly, you may need to tune the election timeout. See the [tuning][tuning] section for details.
[tuning]: #tuning
* `X-Etcd-Index` is the current etcd index as explained above. When request is a watch on key space, `X-Etcd-Index` is the current etcd index when the watch starts, which means that the watched event may happen after `X-Etcd-Index`.
* `X-Raft-Index` is similar to the etcd index but is for the underlying raft protocol.
* `X-Raft-Term` is an integer that will increase whenever an etcd master election happens in the cluster. If this number is increasing rapidly, you may need to tune the election timeout. See the [tuning][tuning] section for details.
### Get the value of a key
@ -234,6 +231,50 @@ curl http://127.0.0.1:2379/v2/keys/foo -XPUT -d value=bar -d ttl= -d prevExist=t
}
```
### Refreshing key TTL
Keys in etcd can be refreshed without notifying watchers
this can be achieved by setting the refresh to true when updating a TTL
You cannot update the value of a key when refreshing it
```sh
curl http://127.0.0.1:2379/v2/keys/foo -XPUT -d value=bar -d ttl=5
curl http://127.0.0.1:2379/v2/keys/foo -XPUT -d ttl=5 -d refresh=true -d prevExist=true
```
```json
{
"action": "set",
"node": {
"createdIndex": 5,
"expiration": "2013-12-04T12:01:21.874888581-08:00",
"key": "/foo",
"modifiedIndex": 5,
"ttl": 5,
"value": "bar"
}
}
{
"action":"update",
"node":{
"key":"/foo",
"value":"bar",
"expiration": "2013-12-04T12:01:26.874888581-08:00",
"ttl":5,
"modifiedIndex":6,
"createdIndex":5
},
"prevNode":{
"key":"/foo",
"value":"bar",
"expiration":"2013-12-04T12:01:21.874888581-08:00",
"ttl":3,
"modifiedIndex":5,
"createdIndex":5
}
}
```
### Waiting for a change
@ -356,6 +397,13 @@ So the first watch after the get should be:
curl 'http://127.0.0.1:2379/v2/keys/foo?wait=true&waitIndex=2008'
```
#### Connection being closed prematurely
The server may close a long polling connection before emitting any events.
This can happen due to a timeout or the server being shutdown.
Since the HTTP header is sent immediately upon accepting the connection, the response will be seen as empty: `200 OK` and empty body.
The clients should be prepared to deal with this scenario and retry the watch.
### Atomically Creating In-Order Keys
Using `POST` on a directory, you can create keys with key names that are created in-order.
@ -373,7 +421,7 @@ curl http://127.0.0.1:2379/v2/keys/queue -XPOST -d value=Job1
"action": "create",
"node": {
"createdIndex": 6,
"key": "/queue/6",
"key": "/queue/00000000000000000006",
"modifiedIndex": 6,
"value": "Job1"
}
@ -392,7 +440,7 @@ curl http://127.0.0.1:2379/v2/keys/queue -XPOST -d value=Job2
"action": "create",
"node": {
"createdIndex": 29,
"key": "/queue/29",
"key": "/queue/00000000000000000029",
"modifiedIndex": 29,
"value": "Job2"
}
@ -416,13 +464,13 @@ curl -s 'http://127.0.0.1:2379/v2/keys/queue?recursive=true&sorted=true'
"nodes": [
{
"createdIndex": 2,
"key": "/queue/2",
"key": "/queue/00000000000000000002",
"modifiedIndex": 2,
"value": "Job1"
},
{
"createdIndex": 3,
"key": "/queue/3",
"key": "/queue/00000000000000000003",
"modifiedIndex": 3,
"value": "Job2"
}
@ -465,7 +513,7 @@ curl http://127.0.0.1:2379/v2/keys/dir -XPUT -d ttl=30 -d dir=true -d prevExist=
Keys that are under this directory work as usual, but when the directory expires, a watcher on a key under the directory will get an expire event:
```sh
curl 'http://127.0.0.1:2379/v2/keys/dir/asdf?wait=true'
curl 'http://127.0.0.1:2379/v2/keys/dir?wait=true'
```
```json
@ -493,13 +541,15 @@ etcd can be used as a centralized coordination service in a cluster, and `Compar
This command will set the value of a key only if the client-provided conditions are equal to the current conditions.
*Note that `CompareAndSwap` does not work with [directories][directories]. If an attempt is made to `CompareAndSwap` a directory, a 102 "Not a file" error will be returned.*
The current comparable conditions are:
1. `prevValue` - checks the previous value of the key.
2. `prevIndex` - checks the previous modifiedIndex of the key.
3. `prevExist` - checks existence of the key: if `prevExist` is true, it is an `update` request; if prevExist is `false`, it is a `create` request.
3. `prevExist` - checks existence of the key: if `prevExist` is true, it is an `update` request; if `prevExist` is `false`, it is a `create` request.
Here is a simple example.
Let's create a key-value pair first: `foo=one`.
@ -578,6 +628,8 @@ We successfully changed the value from "one" to "two" since we gave the correct
This command will delete a key only if the client-provided conditions are equal to the current conditions.
*Note that `CompareAndDelete` does not work with [directories]. If an attempt is made to `CompareAndDelete` a directory, a 102 "Not a file" error will be returned.*
The current comparable conditions are:
1. `prevValue` - checks the previous value of the key.
@ -1041,6 +1093,7 @@ curl http://127.0.0.1:2379/v2/stats/self
### Store Statistics
The store statistics include information about the operations that this node has handled.
Note that v2 `store Statistics` is stored in-memory. When a member stops, store statistics will reset on restart.
Operations that modify the store's state like create, delete, set and update are seen by the entire cluster and the number will increase on all nodes.
Operations like get and watch are node local and will only be seen on this node.
@ -1070,6 +1123,8 @@ curl http://127.0.0.1:2379/v2/stats/store
## Cluster Config
See the [other etcd APIs][other-apis] for details on the cluster management.
See the [members API][members-api] for details on the cluster management.
[other-apis]: other_apis.md
[directories]: #listing-a-directory
[members-api]: members_api.md
[tuning]: tuning.md

92
Documentation/api_v3.md Normal file
View File

@ -0,0 +1,92 @@
# etcd3 API
TODO: API doc
## Data Model
etcd is designed to reliably store infrequently updated data and provide reliable watch queries. etcd exposes previous versions of key-value pairs to support inexpensive snapshots and watch history events (“time travel queries”). A persistent, multi-version, concurrency-control data model is a good fit for these use cases.
etcd stores data in a multiversion [persistent][persistent-ds] key-value store. The persistent key-value store preserves the previous version of a key-value pair when its value is superseded with new data. The key-value store is effectively immutable; its operations do not update the structure in-place, but instead always generates a new updated structure. All past versions of keys are still accessible and watchable after modification. To prevent the data store from growing indefinitely over time from maintaining old versions, the store may be compacted to shed the oldest versions of superseded data.
### Logical View
The stores logical view is a flat binary key space. The key space has a lexically sorted index on byte string keys so range queries are inexpensive.
The key space maintains multiple revisions. Each atomic mutative operation (e.g., a transaction operation may contain multiple operations) creates a new revision on the key space. All data held by previous revisions remains unchanged. Old versions of key can still be accessed through previous revisions. Likewise, revisions are indexed as well; ranging over revisions with watchers is efficient. If the store is compacted to recover space, revisions before the compact revision will be removed.
A keys lifetime spans a generation. Each key may have one or multiple generations. Creating a key increments the generation of that key, starting at 1 if the key never existed. Deleting a key generates a key tombstone, concluding the keys current generation. Each modification of a key creates a new version of the key. Once a compaction happens, any generation ended before the given revision will be removed and values set before the compaction revision except the latest one will be removed.
### Physical View
etcd stores the physical data as key-value pairs in a persistent [b+tree][b+tree]. Each revision of the stores state only contains the delta from its previous revision to be efficient. A single revision may correspond to multiple keys in the tree.
The key of key-value pair is a 3-tuple (major, sub, type). Major is the store revision holding the key. Sub differentiates among keys within the same revision. Type is an optional suffix for special value (e.g., `t` if the value contains a tombstone). The value of the key-value pair contains the modification from previous revision, thus one delta from previous revision. The b+tree is ordered by key in lexical byte-order. Ranged lookups over revision deltas are fast; this enables quickly finding modifications from one specific revision to another. Compaction removes out-of-date keys-value pairs.
etcd also keeps a secondary in-memory [btree][btree] index to speed up range queries over keys. The keys in the btree index are the keys of the store exposed to user. The value is a pointer to the modification of the persistent b+tree. Compaction removes dead pointers.
## KV API Guarantees
etcd is a consistent and durable key value store with mini-transaction(TODO: link to txn doc when we have it) support. The key value store is exposed through the KV APIs. etcd tries to ensure the strongest consistency and durability guarantees for a distributed system. This specification enumerates the KV API guarantees made by etcd.
### APIs to consider
* Read APIs
* range
* watch
* Write APIs
* put
* delete
* Combination (read-modify-write) APIs
* txn
### etcd Specific Definitions
#### operation completed
An etcd operation is considered complete when it is committed through consensus, and therefore “executed” -- permanently stored -- by the etcd storage engine. The client knows an operation is completed when it receives a response from the etcd server. Note that the client may be uncertain about the status of an operation if it times out, or there is a network disruption between the client and the etcd member. etcd may also abort operations when there is a leader election. etcd does not send `abort` responses to clients outstanding requests in this event.
#### revision
An etcd operation that modifies the key value store is assigned with a single increasing revision. A transaction operation might modifies the key value store multiple times, but only one revision is assigned. The revision attribute of a key value pair that modified by the operation has the same value as the revision of the operation. The revision can be used as a logical clock for key value store. A key value pair that has a larger revision is modified after a key value pair with a smaller revision. Two key value pairs that have the same revision are modified by an operation "concurrently".
### Guarantees Provided
#### Atomicity
All API requests are atomic; an operation either completes entirely or not at all. For watch requests, all events generated by one operation will be in one watch response. Watch never observes partial events for a single operation.
#### Consistency
All API calls ensure [sequential consistency][seq_consistency], the strongest consistency guarantee available from distributed systems. No matter which etcd member server a client makes requests to, a client reads the same events in the same order. If two members complete the same number of operations, the state of the two members is consistent.
For watch operations, etcd guarantees to return the same value for the same key across all members for the same revision. For range operations, etcd has a similar guarantee for [linearized][Linearizability] access; serialized access may be behind the quorum state, so that the later revision is not yet available.
As with all distributed systems, it is impossible for etcd to ensure [strict consistency][strict_consistency]. etcd does not guarantee that it will return to a read the “most recent” value (as measured by a wall clock when a request is completed) available on any cluster member.
#### Isolation
etcd ensures [serializable isolation][serializable_isolation], which is the highest isolation level available in distributed systems. Read operations will never observe any intermediate data.
#### Durability
Any completed operations are durable. All accessible data is also durable data. A read will never return data that has not been made durable.
#### Linearizability
Linearizability (also known as Atomic Consistency or External Consistency) is a consistency level between strict consistency and sequential consistency.
For linearizability, suppose each operation receives a timestamp from a loosely synchronized global clock. Operations are linearized if and only if they always complete as though they were executed in a sequential order and each operation appears to complete in the order specified by the program. Likewise, if an operations timestamp precedes another, that operation must also precede the other operation in the sequence.
For example, consider a client completing a write at time point 1 (*t1*). A client issuing a read at *t2* (for *t2* > *t1*) should receive a value at least as recent as the previous write, completed at *t1*. However, the read might actually complete only by *t3*, and the returned value, current at *t2* when the read began, might be "stale" by *t3*.
etcd does not ensure linearizability for watch operations. Users are expected to verify the revision of watch responses to ensure correct ordering.
etcd ensures linearizability for all other operations by default. Linearizability comes with a cost, however, because linearized requests must go through the Raft consensus process. To obtain lower latencies and higher throughput for read requests, clients can configure a requests consistency mode to `serializable`, which may access stale data with respect to quorum, but removes the performance penalty of linearized accesses' reliance on live consensus.
[persistent-ds]: [https://en.wikipedia.org/wiki/Persistent_data_structure]
[btree]: [https://en.wikipedia.org/wiki/B-tree]
[b+tree]: [https://en.wikipedia.org/wiki/B%2B_tree]
[seq_consistency]: [https://en.wikipedia.org/wiki/Consistency_model#Sequential_consistency]
[strict_consistency]: [https://en.wikipedia.org/wiki/Consistency_model#Strict_consistency]
[serializable_isolation]: [https://en.wikipedia.org/wiki/Isolation_(database_systems)#Serializable]
[Linearizability]: [#Linearizability]

View File

@ -19,7 +19,7 @@ Each role has exact one associated Permission List. An permission list exists fo
The special static ROOT (named `root`) role has a full permissions on all key-value resources, the permission to manage user resources and settings resources. Only the ROOT role has the permission to manage user resources and modify settings resources. The ROOT role is built-in and does not need to be created.
There is also a special GUEST role, named 'guest'. These are the permissions given to unauthenticated requests to etcd. This role will be created automatically, and by default allows access to the full keyspace due to backward compatability. (etcd did not previously authenticate any actions.). This role can be modified by a ROOT role holder at any time, to reduce the capabilities of unauthenticated users.
There is also a special GUEST role, named 'guest'. These are the permissions given to unauthenticated requests to etcd. This role will be created automatically, and by default allows access to the full keyspace due to backward compatibility. (etcd did not previously authenticate any actions.). This role can be modified by a ROOT role holder at any time, to reduce the capabilities of unauthenticated users.
#### Permissions
@ -40,7 +40,7 @@ Specific settings for the cluster as a whole. This can include adding and removi
## v2 Auth
### Basic Auth
We only support [Basic Auth](http://en.wikipedia.org/wiki/Basic_access_authentication) for the first version. Client needs to attach the basic auth to the HTTP Authorization Header.
We only support [Basic Auth][basic-auth] for the first version. Client needs to attach the basic auth to the HTTP Authorization Header.
### Authorization field for operations
Added to requests to /v2/keys, /v2/auth
@ -124,7 +124,7 @@ The User JSON object is formed as follows:
Password is only passed when necessary.
**Get a list of users**
**Get a List of Users**
GET/HEAD /v2/auth/users
@ -137,7 +137,36 @@ GET/HEAD /v2/auth/users
Content-type: application/json
200 Body:
{
"users": ["alice", "bob", "eve"]
"users": [
{
"user": "alice",
"roles": [
{
"role": "root",
"permissions": {
"kv": {
"read": ["*"],
"write": ["*"]
}
}
}
]
},
{
"user": "bob",
"roles": [
{
"role": "guest",
"permissions": {
"kv": {
"read": ["*"],
"write": ["*"]
}
}
}
]
}
]
}
**Get User Details**
@ -155,7 +184,26 @@ GET/HEAD /v2/auth/users/alice
200 Body:
{
"user" : "alice",
"roles" : ["fleet", "etcd"]
"roles" : [
{
"role": "fleet",
"permissions" : {
"kv" : {
"read": [ "/fleet/" ],
"write": [ "/fleet/" ]
}
}
},
{
"role": "etcd",
"permissions" : {
"kv" : {
"read": [ "*" ],
"write": [ "*" ]
}
}
}
]
}
**Create Or Update A User**
@ -213,22 +261,6 @@ A full role structure may look like this. A Permission List structure is used fo
}
```
**Get a list of Roles**
GET/HEAD /v2/auth/roles
Sent Headers:
Authorization: Basic <BasicAuthString>
Possible Status Codes:
200 OK
401 Unauthorized
200 Headers:
Content-type: application/json
200 Body:
{
"roles": ["fleet", "etcd", "quay"]
}
**Get Role Details**
GET/HEAD /v2/auth/roles/fleet
@ -252,6 +284,50 @@ GET/HEAD /v2/auth/roles/fleet
}
}
**Get a list of Roles**
GET/HEAD /v2/auth/roles
Sent Headers:
Authorization: Basic <BasicAuthString>
Possible Status Codes:
200 OK
401 Unauthorized
200 Headers:
Content-type: application/json
200 Body:
{
"roles": [
{
"role": "fleet",
"permissions": {
"kv": {
"read": ["/fleet/"],
"write": ["/fleet/"]
}
}
},
{
"role": "etcd",
"permissions": {
"kv": {
"read": ["*"],
"write": ["*"]
}
}
},
{
"role": "quay",
"permissions": {
"kv": {
"read": ["*"],
"write": ["*"]
}
}
}
]
}
**Create Or Update A Role**
PUT /v2/auth/roles/rkt
@ -432,3 +508,4 @@ PUT /v2/keys/rkt/RktData
Reads and writes outside the prefixes granted will fail with a 401 Unauthorized.
[basic-auth]: https://en.wikipedia.org/wiki/Basic_access_authentication

View File

@ -1,14 +1,12 @@
# Authentication Guide
**NOTE: The authentication feature is considered experimental. We may change workflow without warning in future releases.**
## Overview
Authentication -- having users and roles in etcd -- was added in etcd 2.1. This guide will help you set up basic authentication in etcd.
etcd before 2.1 was a completely open system; anyone with access to the API could change keys. In order to preserve backward compatibility and upgradability, this feature is off by default.
For a full discussion of the RESTful API, see [the authentication API documentation](auth_api.md)
For a full discussion of the RESTful API, see [the authentication API documentation][auth-api]
## Special Users and Roles
@ -94,6 +92,7 @@ Roles are granted access to various parts of the keyspace, a single path at a ti
Reading a path is simple; if the path ends in `*`, that key **and all keys prefixed with it**, are granted to holders of this role. If it does not end in `*`, only that key and that key alone is granted.
Access can be granted as either read, write, or both, as in the following examples:
```
# Give read access to keys under the /foo directory
$ etcdctl role grant myrolename -path '/foo/*' -read
@ -134,7 +133,7 @@ $ etcdctl role remove myrolename
## Enabling authentication
The minimal steps to enabling auth follow. The administrator can set up users and roles before or after enabling authentication, as a matter of preference.
The minimal steps to enabling auth are as follows. The administrator can set up users and roles before or after enabling authentication, as a matter of preference.
Make sure the root user is created:
@ -177,3 +176,5 @@ $ etcdctl -u user get foo
```
Otherwise, all `etcdctl` commands remain the same. Users and roles can still be created and modified, but require authentication by a user with the root role.
[auth-api]: auth_api.md

View File

@ -24,49 +24,6 @@ https://github.com/coreos/etcd/blob/master/Documentation/configuration.md.
The default data dir location has changed from {$hostname}.etcd to {name}.etcd.
## Data Directory Migration
The disk format within the data directory changed with etcd 2.0.
If you run etcd 2.0 on an etcd 0.4 data directory it will automatically migrate the data and start.
You will want to coordinate this upgrade by walking through each of your machines in the cluster, stopping etcd 0.4 and then starting etcd 2.0.
If you would rather manually do the migration, to test it out first in another environment, you can use the [migration tool doc][migrationtooldoc].
[migrationtooldoc]: https://github.com/coreos/etcd/blob/master/tools/etcd-migrate/README.md
## Snapshot Migration
If you are only interested in the data in etcd you can migrate a snapshot of your data from a v0.4.9+ cluster into a new etcd 2.0 cluster using a snapshot migration.
The advantage of this method is that you are directly dumping only the etcd data so you can run your old and new cluster side-by-side, snapshot the data, import it and then point your applications at this cluster.
The disadvantage is that the etcd indexes of your data will change which may confuse applications that use etcd.
To get started get the newest data snapshot from the 0.4.9+ cluster:
```
curl http://cluster.example.com:4001/v2/migration/snapshot > backup.snap
```
Now, import the snapshot into your new cluster:
```
etcdctl -C new_cluster.example.com import --snap backup.snap
```
If you have a large amount of data, you can specify more concurrent works to copy data in parallel by using `-c` flag.
If you have hidden keys to copy, you can use `--hidden` flag to specify.
And the data will quickly copy into the new cluster:
```
entering dir: /
entering dir: /foo
entering dir: /foo/bar
copying key: /foo/bar/1 1
entering dir: /
entering dir: /foo2
entering dir: /foo2/bar2
copying key: /foo2/bar2/2 2
```
## Key-Value API
### Read consistency flag
@ -75,7 +32,7 @@ The consistent flag for read operations is removed in etcd 2.0.0. The normal rea
The read consistency guarantees are:
The consistent read guarantees the sequential consistency within one client that talks to one etcd server. Read/Write from one client to one etcd member should be observed in order. If one client write a value to a etcd server successfully, it should be able to get the value out of the server immediately.
The consistent read guarantees the sequential consistency within one client that talks to one etcd server. Read/Write from one client to one etcd member should be observed in order. If one client write a value to an etcd server successfully, it should be able to get the value out of the server immediately.
Each etcd member will proxy the request to leader and only return the result to user after the result is applied on the local member. Thus after the write succeed, the user is guaranteed to see the value on the member it sent the request to.
@ -103,9 +60,9 @@ A size key needs to be provided inside a [discovery token][discoverytoken].
## HTTP Admin API
`v2/admin` on peer url and `v2/keys/_etcd` are unified under the new [v2/member API][memberapi] to better explain which machines are part of an etcd cluster, and to simplify the keyspace for all your use cases.
`v2/admin` on peer url and `v2/keys/_etcd` are unified under the new [v2/members API][members-api] to better explain which machines are part of an etcd cluster, and to simplify the keyspace for all your use cases.
[memberapi]: other_apis.md
[members-api]: members_api.md
## HTTP Key Value API
- The follower can now transparently proxy write requests to the leader. Clients will no longer see 307 redirections to the leader from etcd.

View File

@ -2,4 +2,17 @@
etcd benchmarks will be published regularly and tracked for each release below:
- [etcd v2.1.0](etcd-2-1-0-benchmarks.md)
- [etcd v2.1.0-alpha][2.1]
- [etcd v2.2.0-rc][2.2]
- [etcd v3 demo][3.0]
# Memory Usage Benchmarks
It records expected memory usage in different scenarios.
- [etcd v2.2.0-rc][2.2-mem]
[2.1]: etcd-2-1-0-alpha-benchmarks.md
[2.2]: etcd-2-2-0-rc-benchmarks.md
[2.2-mem]: etcd-2-2-0-rc-memory-benchmarks.md
[3.0]: etcd-3-demo-benchmarks.md

View File

@ -6,7 +6,7 @@ GCE n1-highcpu-2 machine type
- 1x dedicated slow disk for the OS
- 1.8 GB memory
- 2x CPUs
- etcd version 2.1.0
- etcd version 2.1.0 alpha
## etcd Cluster
@ -14,7 +14,7 @@ GCE n1-highcpu-2 machine type
## Testing
Bootstrap another machine and use benchmark tool [boom](https://github.com/rakyll/boom) to send requests to each etcd member.
Bootstrap another machine and use the [boom HTTP benchmark tool][boom] to send requests to each etcd member. Check the [benchmark hacking guide][hack-benchmark] for detailed instructions.
## Performance
@ -47,3 +47,6 @@ Bootstrap another machine and use benchmark tool [boom](https://github.com/rakyl
| 64 | 256 | all servers | 3260 | 123.8 |
| 256 | 64 | all servers | 1033 | 121.5 |
| 256 | 256 | all servers | 3061 | 119.3 |
[boom]: https://github.com/rakyll/boom
[hack-benchmark]: /hack/benchmark/

View File

@ -0,0 +1,69 @@
# Benchmarking etcd v2.2.0
## Physical Machines
GCE n1-highcpu-2 machine type
- 1x dedicated local SSD mounted as etcd data directory
- 1x dedicated slow disk for the OS
- 1.8 GB memory
- 2x CPUs
## etcd Cluster
3 etcd 2.2.0 members, each runs on a single machine.
Detailed versions:
```
etcd Version: 2.2.0
Git SHA: e4561dd
Go Version: go1.5
Go OS/Arch: linux/amd64
```
## Testing
Bootstrap another machine, outside of the etcd cluster, and run the [`boom` HTTP benchmark tool](https://github.com/rakyll/boom) with a connection reuse patch to send requests to each etcd cluster member. See the [benchmark instructions](../../hack/benchmark/) for the patch and the steps to reproduce our procedures.
The performance is calulated through results of 100 benchmark rounds.
## Performance
### Single Key Read Performance
| key size in bytes | number of clients | target etcd server | average read QPS | read QPS stddev | average 90th Percentile Latency (ms) | latency stddev |
|-------------------|-------------------|--------------------|------------------|-----------------|--------------------------------------|----------------|
| 64 | 1 | leader only | 2303 | 200 | 0.49 | 0.06 |
| 64 | 64 | leader only | 15048 | 685 | 7.60 | 0.46 |
| 64 | 256 | leader only | 14508 | 434 | 29.76 | 1.05 |
| 256 | 1 | leader only | 2162 | 214 | 0.52 | 0.06 |
| 256 | 64 | leader only | 14789 | 792 | 7.69| 0.48 |
| 256 | 256 | leader only | 14424 | 512 | 29.92 | 1.42 |
| 64 | 64 | all servers | 45752 | 2048 | 2.47 | 0.14 |
| 64 | 256 | all servers | 46592 | 1273 | 10.14 | 0.59 |
| 256 | 64 | all servers | 45332 | 1847 | 2.48| 0.12 |
| 256 | 256 | all servers | 46485 | 1340 | 10.18 | 0.74 |
### Single Key Write Performance
| key size in bytes | number of clients | target etcd server | average write QPS | write QPS stddev | average 90th Percentile Latency (ms) | latency stddev |
|-------------------|-------------------|--------------------|------------------|-----------------|--------------------------------------|----------------|
| 64 | 1 | leader only | 55 | 4 | 24.51 | 13.26 |
| 64 | 64 | leader only | 2139 | 125 | 35.23 | 3.40 |
| 64 | 256 | leader only | 4581 | 581 | 70.53 | 10.22 |
| 256 | 1 | leader only | 56 | 4 | 22.37| 4.33 |
| 256 | 64 | leader only | 2052 | 151 | 36.83 | 4.20 |
| 256 | 256 | leader only | 4442 | 560 | 71.59 | 10.03 |
| 64 | 64 | all servers | 1625 | 85 | 58.51 | 5.14 |
| 64 | 256 | all servers | 4461 | 298 | 89.47 | 36.48 |
| 256 | 64 | all servers | 1599 | 94 | 60.11| 6.43 |
| 256 | 256 | all servers | 4315 | 193 | 88.98 | 7.01 |
## Performance Changes
- Because etcd now records metrics for each API call, read QPS performance seems to see a minor decrease in most scenarios. This minimal performance impact was judged a reasonable investment for the breadth of monitoring and debugging information returned.
- Write QPS to cluster leaders seems to be increased by a small margin. This is because the main loop and entry apply loops were decoupled in the etcd raft logic, eliminating several blocks between them.
- Write QPS to all members seems to be increased by a significant margin, because followers now receive the latest commit index sooner, and commit proposals more quickly.

View File

@ -0,0 +1,72 @@
## Physical machines
GCE n1-highcpu-2 machine type
- 1x dedicated local SSD mounted under /var/lib/etcd
- 1x dedicated slow disk for the OS
- 1.8 GB memory
- 2x CPUs
## etcd Cluster
3 etcd 2.2.0-rc members, each runs on a single machine.
Detailed versions:
```
etcd Version: 2.2.0-alpha.1+git
Git SHA: 59a5a7e
Go Version: go1.4.2
Go OS/Arch: linux/amd64
```
Also, we use 3 etcd 2.1.0 alpha-stage members to form cluster to get base performance. etcd's commit head is at [c7146bd5][c7146bd5], which is the same as the one that we use in [etcd 2.1 benchmark][etcd-2.1-benchmark].
## Testing
Bootstrap another machine and use the [boom HTTP benchmark tool][boom] to send requests to each etcd member. Check the [benchmark hacking guide][hack-benchmark] for detailed instructions.
## Performance
### reading one single key
| key size in bytes | number of clients | target etcd server | read QPS | 90th Percentile Latency (ms) |
|-------------------|-------------------|--------------------|----------|---------------|
| 64 | 1 | leader only | 2804 (-5%) | 0.4 (+0%) |
| 64 | 64 | leader only | 17816 (+0%) | 5.7 (-6%) |
| 64 | 256 | leader only | 18667 (-6%) | 20.4 (+2%) |
| 256 | 1 | leader only | 2181 (-15%) | 0.5 (+25%) |
| 256 | 64 | leader only | 17435 (-7%) | 6.0 (+9%) |
| 256 | 256 | leader only | 18180 (-8%) | 21.3 (+3%) |
| 64 | 64 | all servers | 46965 (-4%) | 2.1 (+0%) |
| 64 | 256 | all servers | 55286 (-6%) | 7.4 (+6%) |
| 256 | 64 | all servers | 46603 (-6%) | 2.1 (+5%) |
| 256 | 256 | all servers | 55291 (-6%) | 7.3 (+4%) |
### writing one single key
| key size in bytes | number of clients | target etcd server | write QPS | 90th Percentile Latency (ms) |
|-------------------|-------------------|--------------------|-----------|---------------|
| 64 | 1 | leader only | 76 (+22%) | 19.4 (-15%) |
| 64 | 64 | leader only | 2461 (+45%) | 31.8 (-32%) |
| 64 | 256 | leader only | 4275 (+1%) | 69.6 (-10%) |
| 256 | 1 | leader only | 64 (+20%) | 16.7 (-30%) |
| 256 | 64 | leader only | 2385 (+30%) | 31.5 (-19%) |
| 256 | 256 | leader only | 4353 (-3%) | 74.0 (+9%) |
| 64 | 64 | all servers | 2005 (+81%) | 49.8 (-55%) |
| 64 | 256 | all servers | 4868 (+35%) | 81.5 (-40%) |
| 256 | 64 | all servers | 1925 (+72%) | 47.7 (-59%) |
| 256 | 256 | all servers | 4975 (+36%) | 70.3 (-36%) |
### performance changes explanation
- read QPS in most scenarios is decreased by 5~8%. The reason is that etcd records store metrics for each store operation. The metrics is important for monitoring and debugging, so this is acceptable.
- write QPS to leader is increased by 20~30%. This is because we decouple raft main loop and entry apply loop, which avoids them blocking each other.
- write QPS to all servers is increased by 30~80% because follower could receive latest commit index earlier and commit proposals faster.
[boom]: https://github.com/rakyll/boom
[c7146bd5]: https://github.com/coreos/etcd/commits/c7146bd5f2c73716091262edc638401bb8229144
[etcd-2.1-benchmark]: etcd-2-1-0-alpha-benchmarks.md
[hack-benchmark]: /hack/benchmark/

View File

@ -0,0 +1,47 @@
## Physical machine
GCE n1-standard-2 machine type
- 1x dedicated local SSD mounted under /var/lib/etcd
- 1x dedicated slow disk for the OS
- 7.5 GB memory
- 2x CPUs
## etcd
```
etcd Version: 2.2.0-rc.0+git
Git SHA: 103cb5c
Go Version: go1.5
Go OS/Arch: linux/amd64
```
## Testing
Start 3-member etcd cluster, each of which uses 2 cores.
The length of key name is always 64 bytes, which is a reasonable length of average key bytes.
## Memory Maximal Usage
- etcd may use maximal memory if one follower is dead and the leader keeps sending snapshots.
- `max RSS` is the maximal memory usage recorded in 3 runs.
| value bytes | key number | data size(MB) | max RSS(MB) | max RSS/data rate on leader |
|-------------|-------------|---------------|-------------|-----------------------------|
| 128 | 50000 | 6 | 433 | 72x |
| 128 | 100000 | 12 | 659 | 54x |
| 128 | 200000 | 24 | 1466 | 61x |
| 1024 | 50000 | 48 | 1253 | 26x |
| 1024 | 100000 | 96 | 2344 | 24x |
| 1024 | 200000 | 192 | 4361 | 22x |
## Data Size Threshold
- When etcd reaches data size threshold, it may trigger leader election easily and drop part of proposals.
- At most cases, etcd cluster should work smoothly if it doesn't hit the threshold. If it doesn't work well due to insufficient resources, you need to decrease its data size.
| value bytes | key number limitation | suggested data size threshold(MB) | consumed RSS(MB) |
|-------------|-----------------------|-----------------------------------|------------------|
| 128 | 400K | 48 | 2400 |
| 1024 | 300K | 292 | 6500 |

View File

@ -0,0 +1,42 @@
## Physical machines
GCE n1-highcpu-2 machine type
- 1x dedicated local SSD mounted under /var/lib/etcd
- 1x dedicated slow disk for the OS
- 1.8 GB memory
- 2x CPUs
- etcd version 2.2.0
## etcd Cluster
1 etcd member running in v3 demo mode
## Testing
Use [etcd v3 benchmark tool][etcd-v3-benchmark].
## Performance
### reading one single key
| key size in bytes | number of clients | read QPS | 90th Percentile Latency (ms) |
|-------------------|-------------------|----------|---------------|
| 256 | 1 | 2716 | 0.4 |
| 256 | 64 | 16623 | 6.1 |
| 256 | 256 | 16622 | 21.7 |
The performance is nearly the same as the one with empty server handler.
### reading one single key after putting
| key size in bytes | number of clients | read QPS | 90th Percentile Latency (ms) |
|-------------------|-------------------|----------|---------------|
| 256 | 1 | 2269 | 0.5 |
| 256 | 64 | 13582 | 8.6 |
| 256 | 256 | 13262 | 47.5 |
The performance with empty server handler is not affected by one put. So the
performance downgrade should be caused by storage package.
[etcd-v3-benchmark]: /tools/benchmark/

View File

@ -0,0 +1,77 @@
# Watch Memory Usage Benchmark
*NOTE*: The watch features are under active development, and their memory usage may change as that development progresses. We do not expect it to significantly increase beyond the figures stated below.
A primary goal of etcd is supporting a very large number of watchers doing a massively large amount of watching. etcd aims to support O(10k) clients, O(100K) watch streams (O(10) streams per client) and O(10M) total watchings (O(100) watching per stream). The memory consumed by each individual watching accounts for the largest portion of etcd's overall usage, and is therefore the focus of current and future optimizations.
Three related components of etcd watch consume physical memory: each `grpc.Conn`, each watch stream, and each instance of the watching activity. `grpc.Conn` maintains the actual TCP connection and other gRPC connection state. Each `grpc.Conn` consumes O(10kb) of memory, and might have multiple watch streams attached.
Each watch stream is an independent HTTP2 connection which consumes another O(10kb) of memory.
Multiple watchings might share one watch stream.
Watching is the actual struct that tracks the changes on the key-value store. Each watching should only consume < O(1kb).
```
+-------+
| watch |
+---------> | foo |
| +-------+
+------+-----+
| stream |
+--------------> | |
| +------+-----+ +-------+
| | | watch |
| +---------> | bar |
+-----+------+ +-------+
| | +------------+
| conn +-------> | stream |
| | | |
+-----+------+ +------------+
|
|
|
| +------------+
+--------------> | stream |
| |
+------------+
```
The theoretical memory consumption of watch can be approximated with the formula:
`memory = c1 * number_of_conn + c2 * avg_number_of_stream_per_conn + c3 * avg_number_of_watch_stream`
## Testing Environment
etcd version
- git head https://github.com/coreos/etcd/commit/185097ffaa627b909007e772c175e8fefac17af3
GCE n1-standard-2 machine type
- 7.5 GB memory
- 2x CPUs
## Overall memory usage
The overall memory usage captures how much [RSS][rss] etcd consumes with the client watchers. While the result may vary by as much as 10%, it is still meaningful, since the goal is to learn about the rough memory usage and the pattern of allocations.
With the benchmark result, we can calculate roughly that `c1 = 17kb`, `c2 = 18kb` and `c3 = 350bytes`. So each additional client connection consumes 17kb of memory and each additional stream consumes 18kb of memory, and each additional watching only cause 350bytes. A single etcd server can maintain millions of watchings with a few GB of memory in normal case.
| clients | streams per client | watchings per stream | total watching | memory usage |
|---------|---------|-----------|----------------|--------------|
| 1k | 1 | 1 | 1k | 50MB |
| 2k | 1 | 1 | 2k | 90MB |
| 5k | 1 | 1 | 5k | 200MB |
| 1k | 10 | 1 | 10k | 217MB |
| 2k | 10 | 1 | 20k | 417MB |
| 5k | 10 | 1 | 50k | 980MB |
| 1k | 50 | 1 | 50k | 1001MB |
| 2k | 50 | 1 | 100k | 1960MB |
| 5k | 50 | 1 | 250k | 4700MB |
| 1k | 50 | 10 | 500k | 1171MB |
| 2k | 50 | 10 | 1M | 2371MB |
| 5k | 50 | 10 | 2.5M | 5710MB |
| 1k | 50 | 100 | 5M | 2380MB |
| 2k | 50 | 100 | 10M | 4672MB |
| 5k | 50 | 100 | 50M | *OOM* |
[rss]: https://en.wikipedia.org/wiki/Resident_set_size

View File

@ -0,0 +1,98 @@
# Storage Memory Usage Benchmark
<!---todo: link storage to storage design doc-->
Two components of etcd storage consume physical memory. The etcd process allocates an *in-memory index* to speed key lookup. The process's *page cache*, managed by the operating system, stores recently-accessed data from disk for quick re-use.
The in-memory index holds all the keys in a [B-tree][btree] data structure, along with pointers to the on-disk data (the values). Each key in the B-tree may contain multiple pointers, pointing to different versions of its values. The theoretical memory consumption of the in-memory index can hence be approximated with the formula:
`N * (c1 + avg_key_size) + N * (avg_versions_of_key) * (c2 + size_of_pointer)`
where `c1` is the key metadata overhead and `c2` is the version metadata overhead.
The graph shows the detailed structure of the in-memory index B-tree.
```
In mem index
+------------+
| key || ... |
+--------------+ | || |
| | +------------+
| | | v1 || ... |
| disk <----------------| || | Tree Node
| | +------------+
| | | v2 || ... |
| <----------------+ || |
| | +------------+
+--------------+ +-----+ | | |
| | | | |
| +------------+
|
|
^
------+
| ... |
| |
+-----+
| ... | Tree Node
| |
+-----+
| ... |
| |
------+
```
[Page cache memory][pagecache] is managed by the operating system and is not covered in detail in this document.
## Testing Environment
etcd version
- git head https://github.com/coreos/etcd/commit/776e9fb7be7eee5e6b58ab977c8887b4fe4d48db
GCE n1-standard-2 machine type
- 7.5 GB memory
- 2x CPUs
## In-memory index memory usage
In this test, we only benchmark the memory usage of the in-memory index. The goal is to find `c1` and `c2` mentioned above and to understand the hard limit of memory consumption of the storage.
We calculate the memory usage consumption via the Go runtime.ReadMemStats. We calculate the total allocated bytes difference before creating the index and after creating the index. It cannot perfectly reflect the memory usage of the in-memory index itself but can show the rough consumption pattern.
| N | versions | key size | memory usage |
|------|----------|----------|--------------|
| 100K | 1 | 64bytes | 22MB |
| 100K | 5 | 64bytes | 39MB |
| 1M | 1 | 64bytes | 218MB |
| 1M | 5 | 64bytes | 432MB |
| 100K | 1 | 256bytes | 41MB |
| 100K | 5 | 256bytes | 65MB |
| 1M | 1 | 256bytes | 409MB |
| 1M | 5 | 256bytes | 506MB |
Based on the result, we can calculate `c1=120bytes`, `c2=30bytes`. We only need two sets of data to calculate `c1` and `c2`, since they are the only unknown variable in the formula. The `c1=120bytes` and `c2=30bytes` are the average value of the 4 sets of `c1` and `c2` we calculated. The key metadata overhead is still relatively nontrivial (50%) for small key-value pairs. However, this is a significant improvement over the old store, which had at least 1000% overhead.
## Overall memory usage
The overall memory usage captures how much RSS etcd consumes with the storage. The value size should have very little impact on the overall memory usage of etcd, since we keep values on disk and only retain hot values in memory, managed by the OS page cache.
| N | versions | key size | value size | memory usage |
|------|----------|----------|------------|--------------|
| 100K | 1 | 64bytes | 256bytes | 40MB |
| 100K | 5 | 64bytes | 256bytes | 89MB |
| 1M | 1 | 64bytes | 256bytes | 470MB |
| 1M | 5 | 64bytes | 256bytes | 880MB |
| 100K | 1 | 64bytes | 1KB | 102MB |
| 100K | 5 | 64bytes | 1KB | 164MB |
| 1M | 1 | 64bytes | 1KB | 587MB |
| 1M | 5 | 64bytes | 1KB | 836MB |
Based on the result, we know the value size does not significantly impact the memory consumption. There is some minor increase due to more data held in the OS page cache.
[btree]: https://en.wikipedia.org/wiki/B-tree
[pagecache]: https://en.wikipedia.org/wiki/Page_cache

View File

@ -1,13 +1,13 @@
## Branch Management
# Branch Management
### Guide
## Guide
- New development occurs on the [master branch](https://github.com/coreos/etcd/tree/master)
- Master branch should always have a green build!
- Backwards-compatible bug fixes should target the master branch and subsequently be ported to stable branches
- Once the master branch is ready for release, it will be tagged and become the new stable branch.
* New development occurs on the [master branch][master].
* Master branch should always have a green build!
* Backwards-compatible bug fixes should target the master branch and subsequently be ported to stable branches.
* Once the master branch is ready for release, it will be tagged and become the new stable branch.
The etcd team has adopted a _rolling release model_ and supports one stable version of etcd.
The etcd team has adopted a *rolling release model* and supports one stable version of etcd.
### Master branch
@ -22,3 +22,5 @@ Before the release of the next stable version, feature PRs will be frozen. We wi
All branches with prefix `release-` are considered _stable_ branches.
After every minor release (http://semver.org/), we will have a new stable branch for that release. We will keep fixing the backwards-compatible bugs for the latest stable release, but not previous releases. The _patch_ release, incorporating any bug fixes, will be once every two weeks, given any patches.
[master]: https://github.com/coreos/etcd/tree/master

View File

@ -4,7 +4,7 @@
Starting an etcd cluster statically requires that each member knows another in the cluster. In a number of cases, you might not know the IPs of your cluster members ahead of time. In these cases, you can bootstrap an etcd cluster with the help of a discovery service.
Once an etcd cluster is up and running, adding or removing members is done via [runtime reconfiguration](runtime-configuration.md).
Once an etcd cluster is up and running, adding or removing members is done via [runtime reconfiguration][runtime-conf]. To better understand the design behind runtime reconfiguration, we suggest you read [the runtime configuration design document][runtime-reconf-design].
This guide will cover the following mechanisms for bootstrapping an etcd cluster:
@ -30,57 +30,59 @@ ETCD_INITIAL_CLUSTER_STATE=new
```
```
-initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
-initial-cluster-state new
--initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
--initial-cluster-state new
```
Note that the URLs specified in `initial-cluster` are the _advertised peer URLs_, i.e. they should match the value of `initial-advertise-peer-urls` on the respective nodes.
If you are spinning up multiple clusters (or creating and destroying a single cluster) with same configuration for testing purpose, it is highly recommended that you specify a unique `initial-cluster-token` for the different clusters. By doing this, etcd can generate unique cluster IDs and member IDs for the clusters even if they otherwise have the exact same configuration. This can protect you from cross-cluster-interaction, which might corrupt your clusters.
etcd listens on [`listen-client-urls`][conf-listen-client] to accept client traffic. etcd member advertises the URLs specified in [`advertise-client-urls`][conf-adv-client] to other members, proxies, clients. Please make sure the `advertise-client-urls` are reachable from intended clients. A common mistake is setting `advertise-client-urls` to localhost or leave it as default when you want the remote clients to reach etcd.
On each machine you would start etcd with these flags:
```
$ etcd -name infra0 -initial-advertise-peer-urls http://10.0.1.10:2380 \
-listen-peer-urls http://10.0.1.10:2380 \
-listen-client-urls http://10.0.1.10:2379,http://127.0.0.1:2379 \
-advertise-client-urls http://10.0.1.10:2379 \
-initial-cluster-token etcd-cluster-1 \
-initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
-initial-cluster-state new
$ etcd --name infra0 --initial-advertise-peer-urls http://10.0.1.10:2380 \
--listen-peer-urls http://10.0.1.10:2380 \
--listen-client-urls http://10.0.1.10:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://10.0.1.10:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
--initial-cluster-state new
```
```
$ etcd -name infra1 -initial-advertise-peer-urls http://10.0.1.11:2380 \
-listen-peer-urls http://10.0.1.11:2380 \
-listen-client-urls http://10.0.1.11:2379,http://127.0.0.1:2379 \
-advertise-client-urls http://10.0.1.11:2379 \
-initial-cluster-token etcd-cluster-1 \
-initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
-initial-cluster-state new
$ etcd --name infra1 --initial-advertise-peer-urls http://10.0.1.11:2380 \
--listen-peer-urls http://10.0.1.11:2380 \
--listen-client-urls http://10.0.1.11:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://10.0.1.11:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
--initial-cluster-state new
```
```
$ etcd -name infra2 -initial-advertise-peer-urls http://10.0.1.12:2380 \
-listen-peer-urls http://10.0.1.12:2380 \
-listen-client-urls http://10.0.1.12:2379,http://127.0.0.1:2379 \
-advertise-client-urls http://10.0.1.12:2379 \
-initial-cluster-token etcd-cluster-1 \
-initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
-initial-cluster-state new
$ etcd --name infra2 --initial-advertise-peer-urls http://10.0.1.12:2380 \
--listen-peer-urls http://10.0.1.12:2380 \
--listen-client-urls http://10.0.1.12:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://10.0.1.12:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
--initial-cluster-state new
```
The command line parameters starting with `-initial-cluster` will be ignored on subsequent runs of etcd. You are free to remove the environment variables or command line flags after the initial bootstrap process. If you need to make changes to the configuration later (for example, adding or removing members to/from the cluster), see the [runtime configuration](runtime-configuration.md) guide.
The command line parameters starting with `--initial-cluster` will be ignored on subsequent runs of etcd. You are free to remove the environment variables or command line flags after the initial bootstrap process. If you need to make changes to the configuration later (for example, adding or removing members to/from the cluster), see the [runtime configuration][runtime-conf] guide.
### Error Cases
In the following example, we have not included our new host in the list of enumerated nodes. If this is a new cluster, the node _must_ be added to the list of initial cluster members.
```
$ etcd -name infra1 -initial-advertise-peer-urls http://10.0.1.11:2380 \
-listen-peer-urls https://10.0.1.11:2380 \
-listen-client-urls http://10.0.1.11:2379,http://127.0.0.1:2379 \
-advertise-client-urls http://10.0.1.11:2379 \
-initial-cluster infra0=http://10.0.1.10:2380 \
-initial-cluster-state new
$ etcd --name infra1 --initial-advertise-peer-urls http://10.0.1.11:2380 \
--listen-peer-urls https://10.0.1.11:2380 \
--listen-client-urls http://10.0.1.11:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://10.0.1.11:2379 \
--initial-cluster infra0=http://10.0.1.10:2380 \
--initial-cluster-state new
etcd: infra1 not listed in the initial cluster config
exit 1
```
@ -88,12 +90,12 @@ exit 1
In this example, we are attempting to map a node (infra0) on a different address (127.0.0.1:2380) than its enumerated address in the cluster list (10.0.1.10:2380). If this node is to listen on multiple addresses, all addresses _must_ be reflected in the "initial-cluster" configuration directive.
```
$ etcd -name infra0 -initial-advertise-peer-urls http://127.0.0.1:2380 \
-listen-peer-urls http://10.0.1.10:2380 \
-listen-client-urls http://10.0.1.10:2379,http://127.0.0.1:2379 \
-advertise-client-urls http://10.0.1.10:2379 \
-initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
-initial-cluster-state=new
$ etcd --name infra0 --initial-advertise-peer-urls http://127.0.0.1:2380 \
--listen-peer-urls http://10.0.1.10:2380 \
--listen-client-urls http://10.0.1.10:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://10.0.1.10:2379 \
--initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
--initial-cluster-state=new
etcd: error setting up initial cluster: infra0 has different advertised URLs in the cluster and advertised peer URLs list
exit 1
```
@ -101,12 +103,12 @@ exit 1
If you configure a peer with a different set of configuration and attempt to join this cluster you will get a cluster ID mismatch and etcd will exit.
```
$ etcd -name infra3 -initial-advertise-peer-urls http://10.0.1.13:2380 \
-listen-peer-urls http://10.0.1.13:2380 \
-listen-client-urls http://10.0.1.13:2379,http://127.0.0.1:2379 \
-advertise-client-urls http://10.0.1.13:2379 \
-initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra3=http://10.0.1.13:2380 \
-initial-cluster-state=new
$ etcd --name infra3 --initial-advertise-peer-urls http://10.0.1.13:2380 \
--listen-peer-urls http://10.0.1.13:2380 \
--listen-client-urls http://10.0.1.13:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://10.0.1.13:2379 \
--initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra3=http://10.0.1.13:2380 \
--initial-cluster-state=new
etcd: conflicting cluster ID to the target cluster (c6ab534d07e8fcc4 != bc25ea2a74fb18b0). Exiting.
exit 1
```
@ -122,13 +124,13 @@ There two methods that can be used for discovery:
### etcd Discovery
To better understand the design about discovery service protocol, we suggest you read [this][discovery-proto].
#### Lifetime of a Discovery URL
A discovery URL identifies a unique etcd cluster. Instead of reusing a discovery URL, you should always create discovery URLs for new clusters.
Moreover, discovery URLs should ONLY be used for the initial bootstrapping of a cluster. To change cluster membership after the cluster is already running, see the [runtime reconfiguration][runtime] guide.
[runtime]: runtime-configuration.md
Moreover, discovery URLs should ONLY be used for the initial bootstrapping of a cluster. To change cluster membership after the cluster is already running, see the [runtime reconfiguration][runtime-conf] guide.
#### Custom etcd Discovery Service
@ -144,28 +146,30 @@ If you bootstrap an etcd cluster using discovery service with more than the expe
The URL you will use in this case will be `https://myetcd.local/v2/keys/discovery/6c007a14875d53d9bf0ef5a6fc0257c817f0fb83` and the etcd members will use the `https://myetcd.local/v2/keys/discovery/6c007a14875d53d9bf0ef5a6fc0257c817f0fb83` directory for registration as they start.
**Each member must have a different name flag specified. `Hostname` or `machine-id` can be a good choice. Or discovery will fail due to duplicated name.**
Now we start etcd with those relevant flags for each member:
```
$ etcd -name infra0 -initial-advertise-peer-urls http://10.0.1.10:2380 \
-listen-peer-urls http://10.0.1.10:2380 \
-listen-client-urls http://10.0.1.10:2379,http://127.0.0.1:2379 \
-advertise-client-urls http://10.0.1.10:2379 \
-discovery https://myetcd.local/v2/keys/discovery/6c007a14875d53d9bf0ef5a6fc0257c817f0fb83
$ etcd --name infra0 --initial-advertise-peer-urls http://10.0.1.10:2380 \
--listen-peer-urls http://10.0.1.10:2380 \
--listen-client-urls http://10.0.1.10:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://10.0.1.10:2379 \
--discovery https://myetcd.local/v2/keys/discovery/6c007a14875d53d9bf0ef5a6fc0257c817f0fb83
```
```
$ etcd -name infra1 -initial-advertise-peer-urls http://10.0.1.11:2380 \
-listen-peer-urls http://10.0.1.11:2380 \
-listen-client-urls http://10.0.1.11:2379,http://127.0.0.1:2379 \
-advertise-client-urls http://10.0.1.11:2379 \
-discovery https://myetcd.local/v2/keys/discovery/6c007a14875d53d9bf0ef5a6fc0257c817f0fb83
$ etcd --name infra1 --initial-advertise-peer-urls http://10.0.1.11:2380 \
--listen-peer-urls http://10.0.1.11:2380 \
--listen-client-urls http://10.0.1.11:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://10.0.1.11:2379 \
--discovery https://myetcd.local/v2/keys/discovery/6c007a14875d53d9bf0ef5a6fc0257c817f0fb83
```
```
$ etcd -name infra2 -initial-advertise-peer-urls http://10.0.1.12:2380 \
-listen-peer-urls http://10.0.1.12:2380 \
-listen-client-urls http://10.0.1.12:2379,http://127.0.0.1:2379 \
-advertise-client-urls http://10.0.1.12:2379 \
-discovery https://myetcd.local/v2/keys/discovery/6c007a14875d53d9bf0ef5a6fc0257c817f0fb83
$ etcd --name infra2 --initial-advertise-peer-urls http://10.0.1.12:2380 \
--listen-peer-urls http://10.0.1.12:2380 \
--listen-client-urls http://10.0.1.12:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://10.0.1.12:2379 \
--discovery https://myetcd.local/v2/keys/discovery/6c007a14875d53d9bf0ef5a6fc0257c817f0fb83
```
This will cause each member to register itself with the custom etcd discovery service and begin the cluster once all machines have been registered.
@ -183,9 +187,6 @@ This will create the cluster with an initial expected size of 3 members. If you
If you bootstrap an etcd cluster using discovery service with more than the expected number of etcd members, the extra etcd processes will [fall back][fall-back] to being [proxies][proxy] by default.
[fall-back]: proxy.md#fallback-to-proxy-mode-with-discovery-service
[proxy]: proxy.md
```
ETCD_DISCOVERY=https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de
```
@ -194,28 +195,30 @@ ETCD_DISCOVERY=https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573d
-discovery https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de
```
**Each member must have a different name flag specified. `Hostname` or `machine-id` can be a good choice. Or discovery will fail due to duplicated name.**
Now we start etcd with those relevant flags for each member:
```
$ etcd -name infra0 -initial-advertise-peer-urls http://10.0.1.10:2380 \
-listen-peer-urls http://10.0.1.10:2380 \
-listen-client-urls http://10.0.1.10:2379,http://127.0.0.1:2379 \
-advertise-client-urls http://10.0.1.10:2379 \
-discovery https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de
$ etcd --name infra0 --initial-advertise-peer-urls http://10.0.1.10:2380 \
--listen-peer-urls http://10.0.1.10:2380 \
--listen-client-urls http://10.0.1.10:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://10.0.1.10:2379 \
--discovery https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de
```
```
$ etcd -name infra1 -initial-advertise-peer-urls http://10.0.1.11:2380 \
-listen-peer-urls http://10.0.1.11:2380 \
-listen-client-urls http://10.0.1.11:2379,http://127.0.0.1:2379 \
-advertise-client-urls http://10.0.1.11:2379 \
-discovery https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de
$ etcd --name infra1 --initial-advertise-peer-urls http://10.0.1.11:2380 \
--listen-peer-urls http://10.0.1.11:2380 \
--listen-client-urls http://10.0.1.11:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://10.0.1.11:2379 \
--discovery https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de
```
```
$ etcd -name infra2 -initial-advertise-peer-urls http://10.0.1.12:2380 \
-listen-peer-urls http://10.0.1.12:2380 \
-listen-client-urls http://10.0.1.12:2379,http://127.0.0.1:2379 \
-advertise-client-urls http://10.0.1.12:2379 \
-discovery https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de
$ etcd --name infra2 --initial-advertise-peer-urls http://10.0.1.12:2380 \
--listen-peer-urls http://10.0.1.12:2380 \
--listen-client-urls http://10.0.1.12:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://10.0.1.12:2379 \
--discovery https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de
```
This will cause each member to register itself with the discovery service and begin the cluster once all members have been registered.
@ -228,11 +231,11 @@ You can use the environment variable `ETCD_DISCOVERY_PROXY` to cause etcd to use
```
$ etcd -name infra0 -initial-advertise-peer-urls http://10.0.1.10:2380 \
-listen-peer-urls http://10.0.1.10:2380 \
-listen-client-urls http://10.0.1.10:2379,http://127.0.0.1:2379 \
-advertise-client-urls http://10.0.1.10:2379 \
-discovery https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de
$ etcd --name infra0 --initial-advertise-peer-urls http://10.0.1.10:2380 \
--listen-peer-urls http://10.0.1.10:2380 \
--listen-client-urls http://10.0.1.10:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://10.0.1.10:2379 \
--discovery https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de
etcd: error: the cluster doesnt have a size configuration value in https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de/_config
exit 1
```
@ -242,12 +245,12 @@ exit 1
This error will occur if the discovery cluster already has the configured number of members, and `discovery-fallback` is explicitly disabled
```
$ etcd -name infra0 -initial-advertise-peer-urls http://10.0.1.10:2380 \
-listen-peer-urls http://10.0.1.10:2380 \
-listen-client-urls http://10.0.1.10:2379,http://127.0.0.1:2379 \
-advertise-client-urls http://10.0.1.10:2379 \
-discovery https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de \
-discovery-fallback exit
$ etcd --name infra0 --initial-advertise-peer-urls http://10.0.1.10:2380 \
--listen-peer-urls http://10.0.1.10:2380 \
--listen-client-urls http://10.0.1.10:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://10.0.1.10:2379 \
--discovery https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de \
--discovery-fallback exit
etcd: discovery: cluster is full
exit 1
```
@ -258,17 +261,17 @@ This is a harmless warning notifying you that the discovery URL will be
ignored on this machine.
```
$ etcd -name infra0 -initial-advertise-peer-urls http://10.0.1.10:2380 \
-listen-peer-urls http://10.0.1.10:2380 \
-listen-client-urls http://10.0.1.10:2379,http://127.0.0.1:2379 \
-advertise-client-urls http://10.0.1.10:2379 \
-discovery https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de
$ etcd --name infra0 --initial-advertise-peer-urls http://10.0.1.10:2380 \
--listen-peer-urls http://10.0.1.10:2380 \
--listen-client-urls http://10.0.1.10:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://10.0.1.10:2379 \
--discovery https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de
etcdserver: discovery token ignored since a cluster has already been initialized. Valid log found at /var/lib/etcd
```
### DNS Discovery
DNS [SRV records](http://www.ietf.org/rfc/rfc2052.txt) can be used as a discovery mechanism.
DNS [SRV records][rfc-srv] can be used as a discovery mechanism.
The `-discovery-srv` flag can be used to set the DNS domain name where the discovery SRV records can be found.
The following DNS SRV records are looked up in the listed order:
@ -277,91 +280,107 @@ The following DNS SRV records are looked up in the listed order:
If `_etcd-server-ssl._tcp.example.com` is found then etcd will attempt the bootstrapping process over SSL.
To help clients discover the etcd cluster, the following DNS SRV records are looked up in the listed order:
* _etcd-client._tcp.example.com
* _etcd-client-ssl._tcp.example.com
If `_etcd-client-ssl._tcp.example.com` is found, clients will attempt to communicate with the etcd cluster over SSL.
#### Create DNS SRV records
```
$ dig +noall +answer SRV _etcd-server._tcp.example.com
_etcd-server._tcp.example.com. 300 IN SRV 0 0 2380 infra0.example.com.
_etcd-server._tcp.example.com. 300 IN SRV 0 0 2380 infra1.example.com.
_etcd-server._tcp.example.com. 300 IN SRV 0 0 2380 infra2.example.com.
_etcd-server._tcp.example.com. 300 IN SRV 0 0 2380 infra0.example.com.
_etcd-server._tcp.example.com. 300 IN SRV 0 0 2380 infra1.example.com.
_etcd-server._tcp.example.com. 300 IN SRV 0 0 2380 infra2.example.com.
```
```
$ dig +noall +answer SRV _etcd-client._tcp.example.com
_etcd-client._tcp.example.com. 300 IN SRV 0 0 2379 infra0.example.com.
_etcd-client._tcp.example.com. 300 IN SRV 0 0 2379 infra1.example.com.
_etcd-client._tcp.example.com. 300 IN SRV 0 0 2379 infra2.example.com.
```
```
$ dig +noall +answer infra0.example.com infra1.example.com infra2.example.com
infra0.example.com. 300 IN A 10.0.1.10
infra1.example.com. 300 IN A 10.0.1.11
infra2.example.com. 300 IN A 10.0.1.12
infra0.example.com. 300 IN A 10.0.1.10
infra1.example.com. 300 IN A 10.0.1.11
infra2.example.com. 300 IN A 10.0.1.12
```
#### Bootstrap the etcd cluster using DNS
etcd cluster members can listen on domain names or IP address, the bootstrap process will resolve DNS A records.
The resolved address in `--initial-advertise-peer-urls` *must match* one of the resolved addresses in the SRV targets. The etcd member reads the resolved address to find out if it belongs to the cluster defined in the SRV records.
```
$ etcd -name infra0 \
-discovery-srv example.com \
-initial-advertise-peer-urls http://infra0.example.com:2380 \
-initial-cluster-token etcd-cluster-1 \
-initial-cluster-state new \
-advertise-client-urls http://infra0.example.com:2379 \
-listen-client-urls http://infra0.example.com:2379 \
-listen-peer-urls http://infra0.example.com:2380
$ etcd --name infra0 \
--discovery-srv example.com \
--initial-advertise-peer-urls http://infra0.example.com:2380 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster-state new \
--advertise-client-urls http://infra0.example.com:2379 \
--listen-client-urls http://infra0.example.com:2379 \
--listen-peer-urls http://infra0.example.com:2380
```
```
$ etcd -name infra1 \
-discovery-srv example.com \
-initial-advertise-peer-urls http://infra1.example.com:2380 \
-initial-cluster-token etcd-cluster-1 \
-initial-cluster-state new \
-advertise-client-urls http://infra1.example.com:2379 \
-listen-client-urls http://infra1.example.com:2379 \
-listen-peer-urls http://infra1.example.com:2380
$ etcd --name infra1 \
--discovery-srv example.com \
--initial-advertise-peer-urls http://infra1.example.com:2380 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster-state new \
--advertise-client-urls http://infra1.example.com:2379 \
--listen-client-urls http://infra1.example.com:2379 \
--listen-peer-urls http://infra1.example.com:2380
```
```
$ etcd -name infra2 \
-discovery-srv example.com \
-initial-advertise-peer-urls http://infra2.example.com:2380 \
-initial-cluster-token etcd-cluster-1 \
-initial-cluster-state new \
-advertise-client-urls http://infra2.example.com:2379 \
-listen-client-urls http://infra2.example.com:2379 \
-listen-peer-urls http://infra2.example.com:2380
$ etcd --name infra2 \
--discovery-srv example.com \
--initial-advertise-peer-urls http://infra2.example.com:2380 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster-state new \
--advertise-client-urls http://infra2.example.com:2379 \
--listen-client-urls http://infra2.example.com:2379 \
--listen-peer-urls http://infra2.example.com:2380
```
You can also bootstrap the cluster using IP addresses instead of domain names:
```
$ etcd -name infra0 \
-discovery-srv example.com \
-initial-advertise-peer-urls http://10.0.1.10:2380 \
-initial-cluster-token etcd-cluster-1 \
-initial-cluster-state new \
-advertise-client-urls http://10.0.1.10:2379 \
-listen-client-urls http://10.0.1.10:2379 \
-listen-peer-urls http://10.0.1.10:2380
$ etcd --name infra0 \
--discovery-srv example.com \
--initial-advertise-peer-urls http://10.0.1.10:2380 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster-state new \
--advertise-client-urls http://10.0.1.10:2379 \
--listen-client-urls http://10.0.1.10:2379 \
--listen-peer-urls http://10.0.1.10:2380
```
```
$ etcd -name infra1 \
-discovery-srv example.com \
-initial-advertise-peer-urls http://10.0.1.11:2380 \
-initial-cluster-token etcd-cluster-1 \
-initial-cluster-state new \
-advertise-client-urls http://10.0.1.11:2379 \
-listen-client-urls http://10.0.1.11:2379 \
-listen-peer-urls http://10.0.1.11:2380
$ etcd --name infra1 \
--discovery-srv example.com \
--initial-advertise-peer-urls http://10.0.1.11:2380 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster-state new \
--advertise-client-urls http://10.0.1.11:2379 \
--listen-client-urls http://10.0.1.11:2379 \
--listen-peer-urls http://10.0.1.11:2380
```
```
$ etcd -name infra2 \
-discovery-srv example.com \
-initial-advertise-peer-urls http://10.0.1.12:2380 \
-initial-cluster-token etcd-cluster-1 \
-initial-cluster-state new \
-advertise-client-urls http://10.0.1.12:2379 \
-listen-client-urls http://10.0.1.12:2379 \
-listen-peer-urls http://10.0.1.12:2380
$ etcd --name infra2 \
--discovery-srv example.com \
--initial-advertise-peer-urls http://10.0.1.12:2380 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster-state new \
--advertise-client-urls http://10.0.1.12:2379 \
--listen-client-urls http://10.0.1.12:2379 \
--listen-peer-urls http://10.0.1.12:2380
```
#### etcd proxy configuration
@ -369,20 +388,47 @@ $ etcd -name infra2 \
DNS SRV records can also be used to configure the list of peers for an etcd server running in proxy mode:
```
$ etcd --proxy on -discovery-srv example.com
$ etcd --proxy on --discovery-srv example.com
```
#### etcd client configuration
DNS SRV records can also be used to help clients discover the etcd cluster.
The official [etcd/client][client] supports [DNS Discovery][client-discoverer].
`etcdctl` also supports DNS Discovery by specifying the `--discovery-srv` option.
```
$ etcdctl --discovery-srv example.com set foo bar
```
#### Error Cases
You might see an error like `cannot find local etcd $name from SRV records.`. That means the etcd member fails to find itself from the cluster defined in SRV records. The resolved address in `--initial-advertise-peer-urls` *must match* one of the resolved addresses in the SRV targets.
# 0.4 to 2.0+ Migration Guide
In etcd 2.0 we introduced the ability to listen on more than one address and to advertise multiple addresses. This makes using etcd easier when you have complex networking, such as private and public networks on various cloud providers.
To make understanding this feature easier, we changed the naming of some flags, but we support the old flags to make the migration from the old to new version easier.
|Old Flag |New Flag |Migration Behavior |
|Old Flag |New Flag |Migration Behavior |
|-----------------------|-----------------------|---------------------------------------------------------------------------------------|
|-peer-addr |-initial-advertise-peer-urls |If specified, peer-addr will be used as the only peer URL. Error if both flags specified.|
|-addr |-advertise-client-urls |If specified, addr will be used as the only client URL. Error if both flags specified.|
|-peer-bind-addr |-listen-peer-urls |If specified, peer-bind-addr will be used as the only peer bind URL. Error if both flags specified.|
|-bind-addr |-listen-client-urls |If specified, bind-addr will be used as the only client bind URL. Error if both flags specified.|
|-peers |none |Deprecated. The -initial-cluster flag provides a similar concept with different semantics. Please read this guide on cluster startup.|
|-peers-file |none |Deprecated. The -initial-cluster flag provides a similar concept with different semantics. Please read this guide on cluster startup.|
|-peer-addr |--initial-advertise-peer-urls |If specified, peer-addr will be used as the only peer URL. Error if both flags specified.|
|-addr |--advertise-client-urls |If specified, addr will be used as the only client URL. Error if both flags specified.|
|-peer-bind-addr |--listen-peer-urls |If specified, peer-bind-addr will be used as the only peer bind URL. Error if both flags specified.|
|-bind-addr |--listen-client-urls |If specified, bind-addr will be used as the only client bind URL. Error if both flags specified.|
|-peers |none |Deprecated. The --initial-cluster flag provides a similar concept with different semantics. Please read this guide on cluster startup.|
|-peers-file |none |Deprecated. The --initial-cluster flag provides a similar concept with different semantics. Please read this guide on cluster startup.|
[client]: /client
[client-discoverer]: https://godoc.org/github.com/coreos/etcd/client#Discoverer
[conf-adv-client]: configuration.md#-advertise-client-urls
[conf-listen-client]: configuration.md#-listen-client-urls
[discovery-proto]: discovery_protocol.md
[fall-back]: proxy.md#fallback-to-proxy-mode-with-discovery-service
[proxy]: proxy.md
[rfc-srv]: http://www.ietf.org/rfc/rfc2052.txt
[runtime-conf]: runtime-configuration.md
[runtime-reconf-design]: runtime-reconf-design.md

View File

@ -1,207 +1,282 @@
## Configuration Flags
# Configuration Flags
etcd is configurable through command-line flags and environment variables. Options set on the command line take precedence over those from the environment.
The format of environment variable for flag `-my-flag` is `ETCD_MY_FLAG`. It applies to all flags.
The format of environment variable for flag `--my-flag` is `ETCD_MY_FLAG`. It applies to all flags.
The [official etcd ports][iana-ports] are 2379 for client requests, and 2380 for peer communication. Some legacy code and documentation still references ports 4001 and 7001, but all new etcd use and discussion should adopt the assigned ports.
To start etcd automatically using custom settings at startup in Linux, using a [systemd][systemd-intro] unit is highly recommended.
[systemd-intro]: http://freedesktop.org/wiki/Software/systemd/
### Member Flags
## Member Flags
##### -name
### --name
+ Human-readable name for this member.
+ default: "default"
+ This value is referenced as this node's own entries listed in the `-initial-cluster` flag (Ex: `default=http://localhost:2380` or `default=http://localhost:2380,default=http://localhost:7001`). This needs to match the key used in the flag if you're using [static boostrapping](clustering.md#static).
+ env variable: ETCD_NAME
+ This value is referenced as this node's own entries listed in the `--initial-cluster` flag (Ex: `default=http://localhost:2380` or `default=http://localhost:2380,default=http://localhost:7001`). This needs to match the key used in the flag if you're using [static bootstrapping][build-cluster]. When using discovery, each member must have a unique name. `Hostname` or `machine-id` can be a good choice.
##### -data-dir
### --data-dir
+ Path to the data directory.
+ default: "${name}.etcd"
+ env variable: ETCD_DATA_DIR
##### -snapshot-count
### --wal-dir
+ Path to the dedicated wal directory. If this flag is set, etcd will write the WAL files to the walDir rather than the dataDir. This allows a dedicated disk to be used, and helps avoid io competition between logging and other IO operations.
+ default: ""
+ env variable: ETCD_WAL_DIR
### --snapshot-count
+ Number of committed transactions to trigger a snapshot to disk.
+ default: "10000"
+ env variable: ETCD_SNAPSHOT_COUNT
##### -heartbeat-interval
### --heartbeat-interval
+ Time (in milliseconds) of a heartbeat interval.
+ default: "100"
+ env variable: ETCD_HEARTBEAT_INTERVAL
##### -election-timeout
+ Time (in milliseconds) for an election to timeout.
### --election-timeout
+ Time (in milliseconds) for an election to timeout. See [Documentation/tuning.md](tuning.md#time-parameters) for details.
+ default: "1000"
+ env variable: ETCD_ELECTION_TIMEOUT
##### -listen-peer-urls
+ List of URLs to listen on for peer traffic.
### --listen-peer-urls
+ List of URLs to listen on for peer traffic. This flag tells the etcd to accept incoming requests from its peers on the specified scheme://IP:port combinations. Scheme can be either http or https.If 0.0.0.0 is specified as the IP, etcd listens to the given port on all interfaces. If an IP address is given as well as a port, etcd will listen on the given port and interface. Multiple URLs may be used to specify a number of addresses and ports to listen on. The etcd will respond to requests from any of the listed addresses and ports.
+ default: "http://localhost:2380,http://localhost:7001"
+ env variable: ETCD_LISTEN_PEER_URLS
+ example: "http://10.0.0.1:2380"
+ invalid example: "http://example.com:2380" (domain name is invalid for binding)
##### -listen-client-urls
+ List of URLs to listen on for client traffic.
### --listen-client-urls
+ List of URLs to listen on for client traffic. This flag tells the etcd to accept incoming requests from the clients on the specified scheme://IP:port combinations. Scheme can be either http or https. If 0.0.0.0 is specified as the IP, etcd listens to the given port on all interfaces. If an IP address is given as well as a port, etcd will listen on the given port and interface. Multiple URLs may be used to specify a number of addresses and ports to listen on. The etcd will respond to requests from any of the listed addresses and ports.
+ default: "http://localhost:2379,http://localhost:4001"
+ env variable: ETCD_LISTEN_CLIENT_URLS
+ example: "http://10.0.0.1:2379"
+ invalid example: "http://example.com:2379" (domain name is invalid for binding)
##### -max-snapshots
### --max-snapshots
+ Maximum number of snapshot files to retain (0 is unlimited)
+ default: 5
+ env variable: ETCD_MAX_SNAPSHOTS
+ The default for users on Windows is unlimited, and manual purging down to 5 (or your preference for safety) is recommended.
##### -max-wals
### --max-wals
+ Maximum number of wal files to retain (0 is unlimited)
+ default: 5
+ env variable: ETCD_MAX_WALS
+ The default for users on Windows is unlimited, and manual purging down to 5 (or your preference for safety) is recommended.
##### -cors
### --cors
+ Comma-separated white list of origins for CORS (cross-origin resource sharing).
+ default: none
+ env variable: ETCD_CORS
### Clustering Flags
## Clustering Flags
`-initial` prefix flags are used in bootstrapping ([static bootstrap][build-cluster], [discovery-service bootstrap][discovery] or [runtime reconfiguration][reconfig]) a new member, and ignored when restarting an existing member.
`--initial` prefix flags are used in bootstrapping ([static bootstrap][build-cluster], [discovery-service bootstrap][discovery] or [runtime reconfiguration][reconfig]) a new member, and ignored when restarting an existing member.
`-discovery` prefix flags need to be set when using [discovery service][discovery].
`--discovery` prefix flags need to be set when using [discovery service][discovery].
##### -initial-advertise-peer-urls
### --initial-advertise-peer-urls
+ List of this member's peer URLs to advertise to the rest of the cluster. These addresses are used for communicating etcd data around the cluster. At least one must be routable to all cluster members.
+ List of this member's peer URLs to advertise to the rest of the cluster. These addresses are used for communicating etcd data around the cluster. At least one must be routable to all cluster members. These URLs can contain domain names.
+ default: "http://localhost:2380,http://localhost:7001"
+ env variable: ETCD_INITIAL_ADVERTISE_PEER_URLS
+ example: "http://example.com:2380, http://10.0.0.1:2380"
##### -initial-cluster
### --initial-cluster
+ Initial cluster configuration for bootstrapping.
+ default: "default=http://localhost:2380,default=http://localhost:7001"
+ The key is the value of the `-name` flag for each node provided. The default uses `default` for the key because this is the default for the `-name` flag.
+ env variable: ETCD_INITIAL_CLUSTER
+ The key is the value of the `--name` flag for each node provided. The default uses `default` for the key because this is the default for the `--name` flag.
##### -initial-cluster-state
### --initial-cluster-state
+ Initial cluster state ("new" or "existing"). Set to `new` for all members present during initial static or DNS bootstrapping. If this option is set to `existing`, etcd will attempt to join the existing cluster. If the wrong value is set, etcd will attempt to start but fail safely.
+ default: "new"
+ env variable: ETCD_INITIAL_CLUSTER_STATE
[static bootstrap]: clustering.md#static
##### -initial-cluster-token
### --initial-cluster-token
+ Initial cluster token for the etcd cluster during bootstrap.
+ default: "etcd-cluster"
+ env variable: ETCD_INITIAL_CLUSTER_TOKEN
##### -advertise-client-urls
+ List of this member's client URLs to advertise to the rest of the cluster.
### --advertise-client-urls
+ List of this member's client URLs to advertise to the rest of the cluster. These URLs can contain domain names.
+ default: "http://localhost:2379,http://localhost:4001"
+ env variable: ETCD_ADVERTISE_CLIENT_URLS
+ example: "http://example.com:2379, http://10.0.0.1:2379"
+ Be careful if you are advertising URLs such as http://localhost:2379 from a cluster member and are using the proxy feature of etcd. This will cause loops, because the proxy will be forwarding requests to itself until its resources (memory, file descriptors) are eventually depleted.
##### -discovery
### --discovery
+ Discovery URL used to bootstrap the cluster.
+ default: none
+ env variable: ETCD_DISCOVERY
##### -discovery-srv
### --discovery-srv
+ DNS srv domain used to bootstrap the cluster.
+ default: none
+ env variable: ETCD_DISCOVERY_SRV
##### -discovery-fallback
### --discovery-fallback
+ Expected behavior ("exit" or "proxy") when discovery services fails.
+ default: "proxy"
+ env variable: ETCD_DISCOVERY_FALLBACK
##### -discovery-proxy
### --discovery-proxy
+ HTTP proxy to use for traffic to discovery service.
+ default: none
+ env variable: ETCD_DISCOVERY_PROXY
### Proxy Flags
### --strict-reconfig-check
+ Reject reconfiguration requests that would cause quorum loss.
+ default: false
+ env variable: ETCD_STRICT_RECONFIG_CHECK
`-proxy` prefix flags configures etcd to run in [proxy mode][proxy].
## Proxy Flags
##### -proxy
`--proxy` prefix flags configures etcd to run in [proxy mode][proxy].
### --proxy
+ Proxy mode setting ("off", "readonly" or "on").
+ default: "off"
+ env variable: ETCD_PROXY
##### -proxy-failure-wait
### --proxy-failure-wait
+ Time (in milliseconds) an endpoint will be held in a failed state before being reconsidered for proxied requests.
+ default: 5000
+ env variable: ETCD_PROXY_FAILURE_WAIT
##### -proxy-refresh-interval
### --proxy-refresh-interval
+ Time (in milliseconds) of the endpoints refresh interval.
+ default: 30000
+ env variable: ETCD_PROXY_REFRESH_INTERVAL
##### -proxy-dial-timeout
### --proxy-dial-timeout
+ Time (in milliseconds) for a dial to timeout or 0 to disable the timeout
+ default: 1000
+ env variable: ETCD_PROXY_DIAL_TIMEOUT
##### -proxy-write-timeout
### --proxy-write-timeout
+ Time (in milliseconds) for a write to timeout or 0 to disable the timeout.
+ default: 5000
+ env variable: ETCD_PROXY_WRITE_TIMEOUT
##### -proxy-read-timeout
### --proxy-read-timeout
+ Time (in milliseconds) for a read to timeout or 0 to disable the timeout.
+ Don't change this value if you use watches because they are using long polling requests.
+ default: 0
+ env variable: ETCD_PROXY_READ_TIMEOUT
### Security Flags
## Security Flags
The security flags help to [build a secure etcd cluster][security].
##### -ca-file [DEPRECATED]
+ Path to the client server TLS CA file.
### --ca-file [DEPRECATED]
+ Path to the client server TLS CA file. `--ca-file ca.crt` could be replaced by `--trusted-ca-file ca.crt --client-cert-auth` and etcd will perform the same.
+ default: none
+ env variable: ETCD_CA_FILE
##### -cert-file
### --cert-file
+ Path to the client server TLS cert file.
+ default: none
+ env variable: ETCD_CERT_FILE
##### -key-file
### --key-file
+ Path to the client server TLS key file.
+ default: none
+ env variable: ETCD_KEY_FILE
##### -client-cert-auth
### --client-cert-auth
+ Enable client cert authentication.
+ default: false
+ env variable: ETCD_CLIENT_CERT_AUTH
##### -trusted-ca-file
### --trusted-ca-file
+ Path to the client server TLS trusted CA key file.
+ default: none
+ env variable: ETCD_TRUSTED_CA_FILE
##### -peer-ca-file [DEPRECATED]
+ Path to the peer server TLS CA file.
### --peer-ca-file [DEPRECATED]
+ Path to the peer server TLS CA file. `--peer-ca-file ca.crt` could be replaced by `--peer-trusted-ca-file ca.crt --peer-client-cert-auth` and etcd will perform the same.
+ default: none
+ env variable: ETCD_PEER_CA_FILE
##### -peer-cert-file
### --peer-cert-file
+ Path to the peer server TLS cert file.
+ default: none
+ env variable: ETCD_PEER_CERT_FILE
##### -peer-key-file
### --peer-key-file
+ Path to the peer server TLS key file.
+ default: none
+ env variable: ETCD_PEER_KEY_FILE
##### -peer-client-cert-auth
### --peer-client-cert-auth
+ Enable peer client cert authentication.
+ default: false
+ env variable: ETCD_PEER_CLIENT_CERT_AUTH
##### -peer-trusted-ca-file
### --peer-trusted-ca-file
+ Path to the peer server TLS trusted CA file.
+ default: none
+ env variable: ETCD_PEER_TRUSTED_CA_FILE
### Logging Flags
## Logging Flags
##### -debug
### --debug
+ Drop the default log level to DEBUG for all subpackages.
+ default: false (INFO for all packages)
+ env variable: ETCD_DEBUG
##### -log-package-levels
### --log-package-levels
+ Set individual etcd subpackages to specific log levels. An example being `etcdserver=WARNING,security=DEBUG`
+ default: none (INFO for all packages)
+ env variable: ETCD_LOG_PACKAGE_LEVELS
### Unsafe Flags
## Unsafe Flags
Please be CAUTIOUS when using unsafe flags because it will break the guarantees given by the consensus protocol.
For example, it may panic if other members in the cluster are still alive.
Follow the instructions when using these flags.
##### -force-new-cluster
+ Force to create a new one-member cluster. It commits configuration changes in force to remove all existing members in the cluster and add itself. It needs to be set to [restore a backup][restore].
### --force-new-cluster
+ Force to create a new one-member cluster. It commits configuration changes forcing to remove all existing members in the cluster and add itself. It needs to be set to [restore a backup][restore].
+ default: false
+ env variable: ETCD_FORCE_NEW_CLUSTER
## Experimental Flags
### --experimental-v3demo
+ Enable experimental [v3 demo API][rfc-v3].
+ default: false
+ env variable: ETCD_EXPERIMENTAL_V3DEMO
## Miscellaneous Flags
### --version
+ Print the version and exit.
+ default: false
### Miscellaneous Flags
## Profiling flags
##### -version
+ Print the version and exit.
### --enable-pprof
+ Enable runtime profiling data via HTTP server. Address is at client URL + "/debug/pprof"
+ default: false
[build-cluster]: clustering.md#static
[reconfig]: runtime-configuration.md
[discovery]: clustering.md#discovery
[iana-ports]: https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml?search=etcd
[proxy]: proxy.md
[security]: security.md
[reconfig]: runtime-configuration.md
[restore]: admin_guide.md#restoring-a-backup
[rfc-v3]: rfc/v3api.md
[security]: security.md
[systemd-intro]: http://freedesktop.org/wiki/Software/systemd/
[tuning]: tuning.md#time-parameters

View File

@ -0,0 +1,106 @@
# etcd release guide
The guide talks about how to release a new version of etcd.
The procedure includes some manual steps for sanity checking but it can probably be further scripted. Please keep this document up-to-date if you want to make changes to the release process.
## Prepare Release
Set desired version as environment variable for following steps. Here is an example to release 2.3.0:
```
export VERSION=v2.3.0
export PREV_VERSION=v2.2.5
```
All releases version numbers follow the format of [semantic versioning 2.0.0](http://semver.org/).
### Major, Minor Version Release, or its Pre-release
- Ensure the relevant milestone on GitHub is complete. All referenced issues should be closed, or moved elsewhere.
- Remove this release from [roadmap](https://github.com/coreos/etcd/blob/master/ROADMAP.md), if necessary.
- Ensure the latest upgrade documentation is available.
- Bump [hardcoded MinClusterVerion in the repository](https://github.com/coreos/etcd/blob/master/version/version.go#L29), if necessary.
- Add feature capability maps for the new version, if necessary.
### Patch Version Release
- Discuss about commits that are backported to the patch release. The commits should not include merge commits.
- Cherry-pick these commits starting from the oldest one into stable branch.
## Write Release Note
- Write introduction for the new release. For example, what major bug we fix, what new features we introduce or what performance improvement we make.
- Write changelog for the last release. ChangeLog should be straightforward and easy to understand for the end-user.
- Put `[GH XXXX]` at the head of change line to reference Pull Request that introduces the change. Moreover, add a link on it to jump to the Pull Request.
## Tag Version
- Bump [hardcoded Version in the repository](https://github.com/coreos/etcd/blob/master/version/version.go#L30) to the latest version `${VERSION}`.
- Ensure all tests on CI system are passed.
- Manually check etcd is buildable in Linux, Darwin and Windows.
- Manually check upgrade etcd cluster of previous minor version works well.
- Manually check new features work well.
- Add a signed tag through `git tag -s ${VERSION}`.
- Sanity check tag correctness through `git show tags/$VERSION`.
- Push the tag to GitHub through `git push origin tags/$VERSION`. This assumes `origin` corresponds to "https://github.com/coreos/etcd".
## Build Release Binaries and Images
- Ensure `actool` is available, or installing it through `go get github.com/appc/spec/actool`.
- Ensure `docker` is available.
Run release script in root directory:
```
./scripts/release.sh ${VERSION}
```
It generates all release binaries and images under directory ./release.
## Sign Binaries and Images
etcd project key must be used to sign the generated binaries and images.`$SUBKEYID` is the key ID of etcd project Yubikey. Connect the key and run `gpg2 --card-status` to get the ID.
The following commands are used for public release sign:
```
cd release
for i in etcd-*{.zip,.tar.gz}; do gpg2 --default-key $SUBKEYID --output ${i}.asc --detach-sign ${i}; done
for i in etcd-*{.zip,.tar.gz}; do gpg2 --verify ${i}.asc ${i}; done
```
## Publish Release Page in GitHub
- Set release title as the version name.
- Follow the format of previous release pages.
- Attach the generated binaries, aci image and signatures.
- Select whether it is a pre-release.
- Publish the release!
## Publish Docker Image in Quay.io
- Push docker image:
```
docker login quay.io
docker push quay.io/coreos/etcd:${VERSION}
```
- Add `latest` tag to the new image on [quay.io](https://quay.io/repository/coreos/etcd?tag=latest&tab=tags) if this is a stable release.
## Announce to etcd-dev Googlegroup
- Follow the format of [previous release emails](https://groups.google.com/forum/#!forum/etcd-dev).
- Make sure to include a list of authors that contributed since the previous release - something like the following might be handy:
```
git log ...${PREV_VERSION} --pretty=format:"%an" | sort | uniq | tr '\n' ',' | sed -e 's#,#, #g' -e 's#, $##'
```
- Send email to etcd-dev@googlegroups.com
## Post Release
- Create new stable branch through `git push origin ${VERSION_MAJOR}.${VERSION_MINOR}` if this is a major stable release. This assumes `origin` corresponds to "https://github.com/coreos/etcd".
- Bump [hardcoded Version in the repository](https://github.com/coreos/etcd/blob/master/version/version.go#L30) to the version `${VERSION}+git`.

View File

@ -0,0 +1,114 @@
# Discovery Service Protocol
Discovery service protocol helps new etcd member to discover all other members in cluster bootstrap phase using a shared discovery URL.
Discovery service protocol is _only_ used in cluster bootstrap phase, and cannot be used for runtime reconfiguration or cluster monitoring.
The protocol uses a new discovery token to bootstrap one _unique_ etcd cluster. Remember that one discovery token can represent only one etcd cluster. As long as discovery protocol on this token starts, even if it fails halfway, it must not be used to bootstrap another etcd cluster.
The rest of this article will walk through the discovery process with examples that correspond to a self-hosted discovery cluster. The public discovery service, discovery.etcd.io, functions the same way, but with a layer of polish to abstract away ugly URLs, generate UUIDs automatically, and provide some protections against excessive requests. At its core, the public discovery service still uses an etcd cluster as the data store as described in this document.
## The Protocol Workflow
The idea of discovery protocol is to use an internal etcd cluster to coordinate bootstrap of a new cluster. First, all new members interact with discovery service and help to generate the expected member list. Then each new member bootstraps its server using this list, which performs the same functionality as -initial-cluster flag.
In the following example workflow, we will list each step of protocol in curl format for ease of understanding.
By convention the etcd discovery protocol uses the key prefix `_etcd/registry`. If `http://example.com` hosts an etcd cluster for discovery service, a full URL to discovery keyspace will be `http://example.com/v2/keys/_etcd/registry`. We will use this as the URL prefix in the example.
### Creating a New Discovery Token
Generate a unique token that will identify the new cluster. This will be used as a unique prefix in discovery keyspace in the following steps. An easy way to do this is to use `uuidgen`:
```
UUID=$(uuidgen)
```
### Specifying the Expected Cluster Size
You need to specify the expected cluster size for this discovery token. The size is used by the discovery service to know when it has found all members that will initially form the cluster.
```
curl -X PUT http://example.com/v2/keys/_etcd/registry/${UUID}/_config/size -d value=${cluster_size}
```
Usually the cluster size is 3, 5 or 7. Check [optimal cluster size][cluster-size] for more details.
### Bringing up etcd Processes
Now that you have your discovery URL, you can use it as `-discovery` flag and bring up etcd processes. Every etcd process will follow this next few steps internally if given a `-discovery` flag.
### Registering itself
The first thing for etcd process is to register itself into the discovery URL as a member. This is done by creating member ID as a key in the discovery URL.
```
curl -X PUT http://example.com/v2/keys/_etcd/registry/${UUID}/${member_id}?prevExist=false -d value="${member_name}=${member_peer_url_1}&${member_name}=${member_peer_url_2}"
```
### Checking the Status
It checks the expected cluster size and registration status in discovery URL, and decides what the next action is.
```
curl -X GET http://example.com/v2/keys/_etcd/registry/${UUID}/_config/size
curl -X GET http://example.com/v2/keys/_etcd/registry/${UUID}
```
If registered members are still not enough, it will wait for left members to appear.
If the number of registered members is bigger than the expected size N, it treats the first N registered members as the member list for the cluster. If the member itself is in the member list, the discovery procedure succeeds and it fetches all peers through the member list. If it is not in the member list, the discovery procedure finishes with the failure that the cluster has been full.
In etcd implementation, the member may check the cluster status even before registering itself. So it could fail quickly if the cluster has been full.
### Waiting for All Members
The wait process is described in detail in the [etcd API documentation][api].
```
curl -X GET http://example.com/v2/keys/_etcd/registry/${UUID}?wait=true&waitIndex=${current_etcd_index}
```
It keeps waiting until finding all members.
## Public Discovery Service
CoreOS Inc. hosts a public discovery service at https://discovery.etcd.io/ , which provides some nice features for ease of use.
### Mask Key Prefix
Public discovery service will redirect `https://discovery.etcd.io/${UUID}` to etcd cluster behind for the key at `/v2/keys/_etcd/registry`. It masks register key prefix for short and readable discovery url.
### Get new token
```
GET /new
Sent query:
size=${cluster_size}
Possible status codes:
200 OK
400 Bad Request
200 Body:
generated discovery url
```
The generation process in the service follows the steps from [Creating a New Discovery Token][new-discovery-token] to [Specifying the Expected Cluster Size][expected-cluster-size].
### Check Discovery Status
```
GET /${UUID}
```
You can check the status for this discovery token, including the machines that have been registered, by requesting the value of the UUID.
### Open-source repository
The repository is located at https://github.com/coreos/discovery.etcd.io. You could use it to build your own public discovery service.
[api]: api.md#waiting-for-a-change
[cluster-size]: admin_guide.md#optimal-cluster-size
[expected-cluster-size]: #specifying-the-expected-cluster-size
[new-discovery-token]: #creating-a-new-discovery-token

View File

@ -12,9 +12,11 @@ export HostIP="192.168.12.50"
The following `docker run` command will expose the etcd client API over ports 4001 and 2379, and expose the peer port over 2380.
This will run the latest release version of etcd. You can specify version if needed (e.g. `quay.io/coreos/etcd:v2.2.0`).
```
docker run -d -v /usr/share/ca-certificates/:/etc/ssl/certs -p 4001:4001 -p 2380:2380 -p 2379:2379 \
--name etcd quay.io/coreos/etcd:v2.0.8 \
--name etcd quay.io/coreos/etcd \
-name etcd0 \
-advertise-client-urls http://${HostIP}:2379,http://${HostIP}:4001 \
-listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 \
@ -44,7 +46,7 @@ The main difference being the value used for the `-initial-cluster` flag, which
```
docker run -d -v /usr/share/ca-certificates/:/etc/ssl/certs -p 4001:4001 -p 2380:2380 -p 2379:2379 \
--name etcd quay.io/coreos/etcd:v2.0.8 \
--name etcd quay.io/coreos/etcd \
-name etcd0 \
-advertise-client-urls http://192.168.12.50:2379,http://192.168.12.50:4001 \
-listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 \
@ -59,7 +61,7 @@ docker run -d -v /usr/share/ca-certificates/:/etc/ssl/certs -p 4001:4001 -p 2380
```
docker run -d -v /usr/share/ca-certificates/:/etc/ssl/certs -p 4001:4001 -p 2380:2380 -p 2379:2379 \
--name etcd quay.io/coreos/etcd:v2.0.8 \
--name etcd quay.io/coreos/etcd \
-name etcd1 \
-advertise-client-urls http://192.168.12.51:2379,http://192.168.12.51:4001 \
-listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 \
@ -74,7 +76,7 @@ docker run -d -v /usr/share/ca-certificates/:/etc/ssl/certs -p 4001:4001 -p 2380
```
docker run -d -v /usr/share/ca-certificates/:/etc/ssl/certs -p 4001:4001 -p 2380:2380 -p 2379:2379 \
--name etcd quay.io/coreos/etcd:v2.0.8 \
--name etcd quay.io/coreos/etcd \
-name etcd2 \
-advertise-client-urls http://192.168.12.52:2379,http://192.168.12.52:4001 \
-listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 \

View File

@ -1,4 +1,4 @@
Error Code
# Error Code
======
This document describes the error code used in key space '/v2/keys'. Feel free to import 'github.com/coreos/etcd/error' to use.

83
Documentation/faq.md Normal file
View File

@ -0,0 +1,83 @@
# FAQ
## 1) How come I can read an old version of the data when a majority of the members are down?
In situations where a client connects to a minority, etcd
favors by default availability over consistency. This means that even though
data might be “out of date”, it is still better to return something versus
nothing.
In order to confirm that a read is up to date with a majority of the cluster,
the client can use the `quorum=true` parameter on reads of keys. This means
that a majority of the cluster is checked on reads before returning the data,
otherwise the read will timeout and fail.
## 2) With quorum=false, doesnt this mean that if my client switched the member it was connected to, that it could experience a logical ordering where the cluster goes backwards in time?
Yes, but this could be handled at the etcd client implementation via
remembering the last seen index. The “index” is the cluster's single
irrevocable sequence of the entire modification history. The client could
remember the last seen index, and determine via comparing the index returned on
the GET whether or not the state of the key-value pair is before or after its
last seen state.
## 3) What happens if a watch is registered on a minority member?
The watch will stay untriggered, even as modifications are occurring in the
majority quorum. This is an open issue, and is being addressed in v3. There are
multiple ways to work around the watch trigger not firing.
1) build a signaling mechanism independent of etcd. This could be as simple as
a “pulse” to the client to reissue a GET with quorum=true for the most recent
version of the data.
2) poll on the `/v2/keys` endpoint and check that the raft-index is increasing every
timeout.
## 4) What is a proxy used for?
A proxy is a redirection server to the etcd cluster. The proxy handles the
redirection of a client to the current configuration of the etcd cluster. A
typical use case is to start a proxy on a machine, and on first boot up of the
proxy specify both the `--proxy` flag and the `--initial-cluster` flag.
From there, any etcdctl client that starts up automatically speaks to the local
proxy and the proxy redirects operations to the current configuration of the
cluster it was originally paired with.
In the v2 spec of etcd, proxies cannot be promoted to members of the cluster.
They also cannot be promoted to followers or at any point become part of the
replication of the etcd cluster itself.
## 5) How is cluster membership and health handled in etcd v2?
The design goal of etcd is that reconfiguration is simply an API, and health
monitoring and addition/removal of members is up to the individual application
and their integration with the reconfiguration API.
Thus, a member that is down, even infinitely, will never be automatically
removed from the etcd cluster member list.
This makes sense because it's usually an application level / administrative
action to determine whether a reconfiguration should happen based on health.
For more information, refer to the [runtime reconfiguration design document][runtime-reconf-design].
## 6) how does --endpoint work with etcdctl?
The `--endpoint` flag can specify any number of etcd cluster members in a comma
separated list. This list might be a subset, equal to, or more than the actual
etcd cluster member list itself.
If only one peer is specified via the `--endpoint` flag, the etcdctl discovers the
rest of the cluster via the member list of that one peer, and then it randomly
chooses a member to use. Again, the client can use the `quorum=true` flag on
reads, which will always fail when using a member in the minority.
If peers from multiple clusters are specified via the `--endpoint` flag, etcdctl
will randomly choose a peer, and the request will simply get routed to one of
the clusters. This is probably not what you want.
Note: --peers flag is now deprecated and --endpoint should be used instead,
as it might confuse users to give etcdctl a peerURL.
[runtime-reconf-design]: runtime-reconf-design.md

View File

@ -1,35 +1,35 @@
## Glossary
# Glossary
This document defines the various terms used in etcd documentation, command line and source code.
### Node
## Node
Node is an instance of raft state machine.
It has a unique identification, and records other nodes' progress internally when it is the leader.
### Member
## Member
Member is an instance of etcd. It hosts a node, and provides service to clients.
### Cluster
## Cluster
Cluster consists of several members.
The node in each member follows raft consensus protocol to replicate logs. Cluster receives proposals from members, commits them and apply to local store.
### Peer
## Peer
Peer is another member of the same cluster.
### Proposal
## Proposal
A proposal is a request (for example a write request, a configuration change request) that needs to go through raft protocol.
### Client
## Client
Client is a caller of the cluster's HTTP API.
### Machine (deprecated)
## Machine (deprecated)
The alternative of Member in etcd before 2.0

View File

@ -46,7 +46,7 @@ ExecStart=/usr/bin/etcd
There are several error cases:
0) Init has already ran and the data directory is already configured
0) Init has already run and the data directory is already configured
1) Discovery fails because of network timeout, etc
2) Discovery fails because the cluster is already full and etcd needs to fall back to proxy
3) Static cluster configuration fails because of conflict, misconfiguration or timeout

View File

@ -1,19 +1,24 @@
## Libraries and Tools
# Libraries and Tools
**Tools**
- [etcdctl](https://github.com/coreos/etcdctl) - A command line client for etcd
- [etcdctl](https://github.com/coreos/etcd/tree/master/etcdctl) - A command line client for etcd
- [etcd-backup](https://github.com/fanhattan/etcd-backup) - A powerful command line utility for dumping/restoring etcd - Supports v2
- [etcd-dump](https://npmjs.org/package/etcd-dump) - Command line utility for dumping/restoring etcd.
- [etcd-fs](https://github.com/xetorthio/etcd-fs) - FUSE filesystem for etcd
- [etcddir](https://github.com/rekby/etcddir) - Realtime sync etcd and local directory. Work with windows and linux.
- [etcd-browser](https://github.com/henszey/etcd-browser) - A web-based key/value editor for etcd using AngularJS
- [etcd-lock](https://github.com/datawisesystems/etcd-lock) - Master election & distributed r/w lock implementation using etcd - Supports v2
- [etcd-console](https://github.com/matishsiao/etcd-console) - A web-base key/value editor for etcd using PHP
- [etcd-viewer](https://github.com/nikfoundas/etcd-viewer) - An etcd key-value store editor/viewer written in Java
- [etcdtool](https://github.com/mickep76/etcdtool) - Export/Import/Edit etcd directory as JSON/YAML/TOML and Validate directory using JSON schema
- [etcd-rest](https://github.com/mickep76/etcd-rest) - Create generic REST API in Go using etcd as a backend with validation using JSON schema
- [etcdsh](https://github.com/kamilhark/etcdsh) - A command line client with support of command history and tab completion. Supports v2
**Go libraries**
- [go-etcd](https://github.com/coreos/go-etcd) - Supports v2
- [etcd/client](https://github.com/coreos/etcd/blob/master/client) - the officially maintained Go client
- [go-etcd](https://github.com/coreos/go-etcd) - the deprecated official client. May be useful for older (<2.0.0) versions of etcd.
**Java libraries**
@ -45,9 +50,11 @@
**C libraries**
- [jdarcy/etcd-api](https://github.com/jdarcy/etcd-api) - Supports v2
- [shafreeck/cetcd](https://github.com/shafreeck/cetcd) - Supports v2
**C++ libraries**
- [edwardcapriolo/etcdcpp](https://github.com/edwardcapriolo/etcdcpp) - Supports v2
- [suryanathan/etcdcpp](https://github.com/suryanathan/etcdcpp) - Supports v2 (with waits)
**Clojure libraries**
@ -61,6 +68,7 @@
**.Net Libraries**
- [wangjia184/etcdnet](https://github.com/wangjia184/etcdnet) - Supports v2
- [drusellers/etcetera](https://github.com/drusellers/etcetera)
**PHP Libraries**
@ -79,10 +87,6 @@
- [efrecon/etcd-tcl](https://github.com/efrecon/etcd-tcl) - Supports v2, except wait.
A detailed recap of client functionalities can be found in the [clients compatibility matrix][clients-matrix.md].
[clients-matrix.md]: https://github.com/coreos/etcd/blob/master/Documentation/clients-matrix.md
**Chef Integration**
- [coderanger/etcd-chef](https://github.com/coderanger/etcd-chef)
@ -110,7 +114,7 @@ A detailed recap of client functionalities can be found in the [clients compatib
- [configdb](https://git.autistici.org/ai/configdb/tree/master) - A REST relational abstraction on top of arbitrary database backends, aimed at storing configs and inventories.
- [scrz](https://github.com/scrz/scrz) - Container manager, stores configuration in etcd.
- [fleet](https://github.com/coreos/fleet) - Distributed init system
- [GoogleCloudPlatform/kubernetes](https://github.com/GoogleCloudPlatform/kubernetes) - Container cluster manager.
- [kubernetes/kubernetes](https://github.com/kubernetes/kubernetes) - Container cluster manager introduced by Google.
- [mailgun/vulcand](https://github.com/mailgun/vulcand) - HTTP proxy that uses etcd as a configuration backend.
- [duedil-ltd/discodns](https://github.com/duedil-ltd/discodns) - Simple DNS nameserver using etcd as a database for names and records.
- [skynetservices/skydns](https://github.com/skynetservices/skydns) - RFC compliant DNS server

View File

@ -0,0 +1,120 @@
# Members API
* [List members](#list-members)
* [Add a member](#add-a-member)
* [Delete a member](#delete-a-member)
* [Change the peer urls of a member](#change-the-peer-urls-of-a-member)
## List members
Return an HTTP 200 OK response code and a representation of all members in the etcd cluster.
### Request
```
GET /v2/members HTTP/1.1
```
### Example
```sh
curl http://10.0.0.10:2379/v2/members
```
```json
{
"members": [
{
"id": "272e204152",
"name": "infra1",
"peerURLs": [
"http://10.0.0.10:2380"
],
"clientURLs": [
"http://10.0.0.10:2379"
]
},
{
"id": "2225373f43",
"name": "infra2",
"peerURLs": [
"http://10.0.0.11:2380"
],
"clientURLs": [
"http://10.0.0.11:2379"
]
},
]
}
```
## Add a member
Returns an HTTP 201 response code and the representation of added member with a newly generated a memberID when successful. Returns a string describing the failure condition when unsuccessful.
If the POST body is malformed an HTTP 400 will be returned. If the member exists in the cluster or existed in the cluster at some point in the past an HTTP 409 will be returned. If any of the given peerURLs exists in the cluster an HTTP 409 will be returned. If the cluster fails to process the request within timeout an HTTP 500 will be returned, though the request may be processed later.
### Request
```
POST /v2/members HTTP/1.1
{"peerURLs": ["http://10.0.0.10:2380"]}
```
### Example
```sh
curl http://10.0.0.10:2379/v2/members -XPOST \
-H "Content-Type: application/json" -d '{"peerURLs":["http://10.0.0.10:2380"]}'
```
```json
{
"id": "3777296169",
"peerURLs": [
"http://10.0.0.10:2380"
]
}
```
## Delete a member
Remove a member from the cluster. The member ID must be a hex-encoded uint64.
Returns 204 with empty content when successful. Returns a string describing the failure condition when unsuccessful.
If the member does not exist in the cluster an HTTP 500(TODO: fix this) will be returned. If the cluster fails to process the request within timeout an HTTP 500 will be returned, though the request may be processed later.
### Request
```
DELETE /v2/members/<id> HTTP/1.1
```
### Example
```sh
curl http://10.0.0.10:2379/v2/members/272e204152 -XDELETE
```
## Change the peer urls of a member
Change the peer urls of a given member. The member ID must be a hex-encoded uint64. Returns 204 with empty content when successful. Returns a string describing the failure condition when unsuccessful.
If the POST body is malformed an HTTP 400 will be returned. If the member does not exist in the cluster an HTTP 404 will be returned. If any of the given peerURLs exists in the cluster an HTTP 409 will be returned. If the cluster fails to process the request within timeout an HTTP 500 will be returned, though the request may be processed later.
### Request
```
PUT /v2/members/<id> HTTP/1.1
{"peerURLs": ["http://10.0.0.10:2380"]}
```
### Example
```sh
curl http://10.0.0.10:2379/v2/members/272e204152 -XPUT \
-H "Content-Type: application/json" -d '{"peerURLs":["http://10.0.0.10:2380"]}'
```

View File

@ -1,101 +1,94 @@
## Metrics
# Metrics
**NOTE: The metrics feature is considered as an experimental. We might add/change/remove metrics without warning in the future releases.**
**NOTE: The metrics feature is considered experimental. We may add/change/remove metrics without warning in future releases.**
etcd uses [Prometheus](http://prometheus.io/) for metrics reporting in the server. The metrics can be used for real-time monitoring and debugging.
etcd uses [Prometheus][prometheus] for metrics reporting in the server. The metrics can be used for real-time monitoring and debugging.
etcd only stores these data in memory. If a member restarts, metrics will reset.
The simplest way to see the available metrics is to cURL the metrics endpoint `/metrics` of etcd. The format is described [here](http://prometheus.io/docs/instrumenting/exposition_formats/).
Follow the [Prometheus getting started doc][prometheus-getting-started] to spin up a Prometheus server to collect etcd metrics.
You can also follow the doc [here](http://prometheus.io/docs/introduction/getting_started/) to start a Promethus server and monitor etcd metrics.
The naming of metrics follows the suggested [best practice of Promethus](http://prometheus.io/docs/practices/naming/). A metric name has an `etcd` prefix as its namespace and a subsystem prefix (for example `wal` and `etcdserver`).
The naming of metrics follows the suggested [best practice of Prometheus][prometheus-naming]. A metric name has an `etcd` prefix as its namespace and a subsystem prefix (for example `wal` and `etcdserver`).
etcd now exposes the following metrics:
### etcdserver
## etcdserver
| Name | Description | Type |
|-----------------------------------------|--------------------------------------------------|---------|
| file_descriptors_used_total | The total number of file descriptors used | Gauge |
| proposal_durations_milliseconds | The latency distributions of committing proposal | Summary |
| pending_proposal_total | The total number of pending proposals | Gauge |
| proposal_failed_total | The total number of failed proposals | Counter |
| Name | Description | Type |
|-----------------------------------------|--------------------------------------------------|-----------|
| file_descriptors_used_total | The total number of file descriptors used | Gauge |
| proposal_durations_seconds | The latency distributions of committing proposal | Histogram |
| pending_proposal_total | The total number of pending proposals | Gauge |
| proposal_failed_total | The total number of failed proposals | Counter |
High file descriptors (`file_descriptors_used_total`) usage (near the file descriptors limitation of the process) indicates a potential out of file descriptors issue. That might cause etcd fails to create new WAL files and panics.
[Proposal](glossary.md#proposal) durations (`proposal_durations_milliseconds`) give you an summary about the proposal commit latency. Latency can be introduced into this process by network and disk IO.
[Proposal][glossary-proposal] durations (`proposal_durations_seconds`) provides a histogram about the proposal commit latency. Latency can be introduced into this process by network and disk IO.
Pending proposal (`pending_proposal_total`) gives you an idea about how many proposal are in the queue and waiting for commit. An increasing pending number indicates a high client load or an unstable cluster.
Failed proposals (`proposal_failed_total`) are normally related to two issues: temporary failures related to a leader election or longer duration downtime caused by a loss of quorum in the cluster.
## wal
### store
| Name | Description | Type |
|------------------------------------|--------------------------------------------------|-----------|
| fsync_durations_seconds | The latency distributions of fsync called by wal | Histogram |
| last_index_saved | The index of the last entry saved by wal | Gauge |
These metrics describe the accesses into the data store of etcd members that exist in the cluster. They
are useful to count what kind of actions are taken by users. It is also useful to see and whether all etcd members
"see" the same set of data mutations, and whether reads and watches (which are local) are equally distributed.
Abnormally high fsync duration (`fsync_durations_seconds`) indicates disk issues and might cause the cluster to be unstable.
All these metrics are prefixed with `etcd_store_`.
| Name | Description | Type |
|---------------------------|------------------------------------------------------------------------------------------|--------------------|
| reads_total | Total number of reads from store, should differ among etcd members (local reads). | Counter(action) |
| writes_total | Total number of writes to store, should be same among all etcd members. | Counter(action) |
| reads_failed_total | Number of failed reads from store (e.g. key missing) on local reads. | Counter(action) |
| writes_failed_total | Number of failed writes to store (e.g. failed compare and swap). | Counter(action) |
| expires_total | Total number of expired keys (due to TTL).   | Counter |
| watch_requests_totals | Total number of incoming watch requests to this etcd member (local watches). | Counter |
| watchers | Current count of active watchers on this etcd member. | Gauge |
## http requests
These metrics describe the serving of requests (non-watch events) served by etcd members in non-proxy mode: total
incoming requests, request failures and processing latency (inc. raft rounds for storage). They are useful for tracking
user-generated traffic hitting the etcd cluster .
All these metrics are prefixed with `etcd_http_`
| Name | Description | Type |
|--------------------------------|-----------------------------------------------------------------------------------------|--------------------|
| received_total | Total number of events after parsing and auth. | Counter(method) |
| failed_total | Total number of failed events.   | Counter(method,error) |
| successful_duration_second | Bucketed handling times of the requests, including raft rounds for writes. | Histogram(method) |
Both `reads_total` and `writes_total` count both successful and failed requests. `reads_failed_total` and
`writes_failed_total` count failed requests. A lot of failed writes indicate possible contentions on keys (e.g. when
doing `compareAndSet`), and read failures indicate that some clients try to access keys that don't exist.
Example Prometheus queries that may be useful from these metrics (across all etcd members):
* `sum(rate(etcd_store_reads_total{job="etcd"}[1m])) by (action)`
`max(rate(etcd_store_writes_total{job="etcd"}[1m])) by (action)`
* `sum(rate(etcd_http_failed_total{job="etcd"}[1m]) by (method) / sum(rate(etcd_http_events_received_total{job="etcd"})[1m]) by (method)`
Rate of reads and writes by action, across all servers across a time window of `1m`. The reason why `max` is used
for writes as opposed to `sum` for reads is because all of etcd nodes in the cluster apply all writes to their stores.
Shows the rate of successfull readonly/write queries across all servers, across a time window of `1m`.
* `sum(rate(etcd_store_watch_requests_total{job="etcd"}[1m]))`
Shows the fraction of events that failed by HTTP method across all members, across a time window of `1m`.
* `sum(rate(etcd_http_received_total{job="etcd",method="GET})[1m]) by (method)`
`sum(rate(etcd_http_received_total{job="etcd",method~="GET})[1m]) by (method)`
Shows rate of new watch requests per second. Likely driven by how often watched keys change.
* `sum(etcd_store_watchers{job="etcd"})`
Shows the rate of successful readonly/write queries across all servers, across a time window of `1m`.
Number of active watchers across all etcd servers.
* `histogram_quantile(0.9, sum(increase(etcd_http_successful_processing_seconds{job="etcd",method="GET"}[5m]) ) by (le))`
`histogram_quantile(0.9, sum(increase(etcd_http_successful_processing_seconds{job="etcd",method!="GET"}[5m]) ) by (le))`
Show the 0.90-tile latency (in seconds) of read/write (respectively) event handling across all members, with a window of `5m`.
## snapshot
| Name | Description | Type |
|--------------------------------------------|------------------------------------------------------------|-----------|
| snapshot_save_total_durations_seconds | The total latency distributions of save called by snapshot | Histogram |
Abnormally high snapshot duration (`snapshot_save_total_durations_seconds`) indicates disk issues and might cause the cluster to be unstable.
### wal
## rafthttp
| Name | Description | Type |
|------------------------------------|--------------------------------------------------|---------|
| fsync_durations_microseconds | The latency distributions of fsync called by wal | Summary |
| last_index_saved | The index of the last entry saved by wal | Gauge |
Abnormally high fsync duration (`fsync_durations_microseconds`) indicates disk issues and might cause the cluster to be unstable.
### snapshot
| Name | Description | Type |
|--------------------------------------------|------------------------------------------------------------|---------|
| snapshot_save_total_durations_microseconds | The total latency distributions of save called by snapshot | Summary |
Abnormally high snapshot duration (`snapshot_save_total_durations_microseconds`) indicates disk issues and might cause the cluster to be unstable.
| Name | Description | Type | Labels |
|-----------------------------------|--------------------------------------------|--------------|--------------------------------|
| message_sent_latency_seconds | The latency distributions of messages sent | HistogramVec | sendingType, msgType, remoteID |
| message_sent_failed_total | The total number of failed messages sent | Summary | sendingType, msgType, remoteID |
### rafthttp
| Name | Description | Type | Labels |
|-----------------------------------|--------------------------------------------|---------|--------------------------------|
| message_sent_latency_microseconds | The latency distributions of messages sent | Summary | sendingType, msgType, remoteID |
| message_sent_failed_total | The total number of failed messages sent | Summary | sendingType, msgType, remoteID |
Abnormally high message duration (`message_sent_latency_microseconds`) indicates network issues and might cause the cluster to be unstable.
Abnormally high message duration (`message_sent_latency_seconds`) indicates network issues and might cause the cluster to be unstable.
An increase in message failures (`message_sent_failed_total`) indicates more severe network issues and might cause the cluster to be unstable.
@ -106,7 +99,7 @@ Label `msgType` is the type of raft message. `MsgApp` is log replication message
Label `remoteID` is the member ID of the message destination.
### proxy
## proxy
etcd members operating in proxy mode do not do store operations. They forward all requests
to cluster instances.
@ -130,8 +123,12 @@ Example Prometheus queries that may be useful from these metrics (across all etc
* `histogram_quantile(0.9, sum(increase(etcd_proxy_events_handling_time_seconds_bucket{job="etcd",method="GET"}[5m])) by (le))`
`histogram_quantile(0.9, sum(increase(etcd_proxy_events_handling_time_seconds_bucket{job="etcd",method!="GET"}[5m])) by (le))`
Show the 0.90-tile latency (in seconds) of handling of user requestsacross all proxy machines, with a window of `5m`.
Show the 0.90-tile latency (in seconds) of handling of user requests across all proxy machines, with a window of `5m`.
* `sum(rate(etcd_proxy_dropped_total{job="etcd"}[1m])) by (proxying_error)`
Number of failed request on the proxy. This should be 0, spikes here indicate connectivity issues to etcd cluster.
[glossary-proposal]: glossary.md#proposal
[prometheus]: http://prometheus.io/
[prometheus-getting-started](http://prometheus.io/docs/introduction/getting_started/)
[prometheus-naming]: http://prometheus.io/docs/practices/naming/

View File

@ -1,119 +1,28 @@
## Members API
# Miscellaneous APIs
* [List members](#list-members)
* [Add a member](#add-a-member)
* [Delete a member](#delete-a-member)
* [Change the peer urls of a member](#change-the-peer-urls-of-a-member)
* [Getting the etcd version](#getting-the-etcd-version)
* [Checking health of an etcd member node](#checking-health-of-an-etcd-member-node)
## List members
## Getting the etcd version
Return an HTTP 200 OK response code and a representation of all members in the etcd cluster.
### Request
```
GET /v2/members HTTP/1.1
```
### Example
The etcd version of a specific instance can be obtained from the `/version` endpoint.
```sh
curl http://10.0.0.10:2379/v2/members
curl -L http://127.0.0.1:2379/version
```
```
etcd 2.0.12
```
## Checking health of an etcd member node
etcd provides a `/health` endpoint to verify the health of a particular member.
```sh
curl http://10.0.0.10:2379/health
```
```json
{
"members": [
{
"id": "272e204152",
"name": "infra1",
"peerURLs": [
"http://10.0.0.10:2380"
],
"clientURLs": [
"http://10.0.0.10:2379"
]
},
{
"id": "2225373f43",
"name": "infra2",
"peerURLs": [
"http://10.0.0.11:2380"
],
"clientURLs": [
"http://10.0.0.11:2379"
]
},
]
}
```
## Add a member
Returns an HTTP 201 response code and the representation of added member with a newly generated a memberID when successful. Returns a string describing the failure condition when unsuccessful.
If the POST body is malformed an HTTP 400 will be returned. If the member exists in the cluster or existed in the cluster at some point in the past an HTTP 409 will be returned. If any of the given peerURLs exists in the cluster an HTTP 409 will be returned. If the cluster fails to process the request within timeout an HTTP 500 will be returned, though the request may be processed later.
### Request
```
POST /v2/members HTTP/1.1
{"peerURLs": ["http://10.0.0.10:2380"]}
```
### Example
```sh
curl http://10.0.0.10:2379/v2/members -XPOST \
-H "Content-Type: application/json" -d '{"peerURLs":["http://10.0.0.10:2380"]}'
```
```json
{
"id": "3777296169",
"peerURLs": [
"http://10.0.0.10:2380"
]
}
```
## Delete a member
Remove a member from the cluster. The member ID must be a hex-encoded uint64.
Returns 204 with empty content when successful. Returns a string describing the failure condition when unsuccessful.
If the member does not exist in the cluster an HTTP 500(TODO: fix this) will be returned. If the cluster fails to process the request within timeout an HTTP 500 will be returned, though the request may be processed later.
### Request
```
DELETE /v2/members/<id> HTTP/1.1
```
### Example
```sh
curl http://10.0.0.10:2379/v2/members/272e204152 -XDELETE
```
## Change the peer urls of a member
Change the peer urls of a given member. The member ID must be a hex-encoded uint64. Returns 204 with empty content when successful. Returns a string describing the failure condition when unsuccessful.
If the POST body is malformed an HTTP 400 will be returned. If the member does not exist in the cluster an HTTP 404 will be returned. If any of the given peerURLs exists in the cluster an HTTP 409 will be returned. If the cluster fails to process the request within timeout an HTTP 500 will be returned, though the request may be processed later.
#### Request
```
PUT /v2/members/<id> HTTP/1.1
{"peerURLs": ["http://10.0.0.10:2380"]}
```
#### Example
```sh
curl http://10.0.0.10:2379/v2/members/272e204152 -XPUT \
-H "Content-Type: application/json" -d '{"peerURLs":["http://10.0.0.10:2380"]}'
{"health": "true"}
```

View File

@ -15,7 +15,7 @@ when asked
2. Update your repository data with `pkg update`
3. Install etcd with `pkg install coreos­etcd coreos­etcdctl`
3. Install etcd with `pkg install coreos-etcd coreos-etcdctl`
4. Verify successful installation with `pkg info | grep etcd` and you should get:

View File

@ -1,4 +0,0 @@
etcd is being used successfully by many companies in production. It is,
however, under active development and systems like etcd are difficult to get
correct. If you are comfortable with bleeding-edge software please use etcd and
provide us with the feedback and testing young software needs.

View File

@ -0,0 +1,51 @@
# Production Users
This document tracks people and use cases for etcd in production. By creating a list of production use cases we hope to build a community of advisors that we can reach out to with experience using various etcd applications, operation environments, and cluster sizes. The etcd development team may reach out periodically to check-in on your experience and update this list.
## discovery.etcd.io
- *Application*: https://github.com/coreos/discovery.etcd.io
- *Launched*: Feb. 2014
- *Cluster Size*: 5 members, 5 discovery proxies
- *Order of Data Size*: 100s of Megabytes
- *Operator*: CoreOS, brandon.philips@coreos.com
- *Environment*: AWS
- *Backups*: Periodic async to S3
discovery.etcd.io is the longest continuously running etcd backed service that we know about. It is the basis of automatic cluster bootstrap and was launched in Feb. 2014: https://coreos.com/blog/etcd-0.3.0-released/.
## OpenTable
- *Application*: OpenTable internal service discovery and cluster configuration management
- *Launched*: May 2014
- *Cluster Size*: 3 members each in 6 independent clusters; approximately 50 nodes reading / writing
- *Order of Data Size*: 10s of MB
- *Operator*: OpenTable, Inc; sschlansker@opentable.com
- *Environment*: AWS, VMWare
- *Backups*: None, all data can be re-created if necessary.
## cycoresys.com
- *Application*: multiple
- *Launched*: Jul. 2014
- *Cluster Size*: 3 members, _n_ proxies
- *Order of Data Size*: 100s of kilobytes
- *Operator*: CyCore Systems, Inc, sys@cycoresys.com
- *Environment*: Baremetal
- *Backups*: Periodic sync to Ceph RadosGW and DigitalOcean VM
CyCore Systems provides architecture and engineering for computing systems. This cluster provides microservices, virtual machines, databases, storage clusters to a number of clients. It is built on CoreOS machines, with each machine in the cluster running etcd as a peer or proxy.
## Radius Intelligence
- *Application*: multiple internal tools, Kubernetes clusters, bootstrappable system configs
- *Launched*: June 2015
- *Cluster Size*: 2 clusters of 5 and 3 members; approximately a dozen nodes read/write
- *Order of Data Size*: 100s of kilobytes
- *Operator*: Radius Intelligence; jcderr@radius.com
- *Environment*: AWS, CoreOS, Kubernetes
- *Backups*: None, all data can be recreated if necessary.
Radius Intelligence uses Kubernetes running CoreOS to containerize and scale internal toolsets. Examples include running [JetBrains TeamCity][teamcity] and internal AWS security and cost reporting tools. etcd clusters back these clusters as well as provide some basic environment bootstrapping configuration keys.
[teamcity]: https://www.jetbrains.com/teamcity/

View File

@ -1,36 +1,153 @@
## Proxy
# Proxy
etcd can now run as a transparent proxy. Running etcd as a proxy allows for easily discovery of etcd within your infrastructure, since it can run on each machine as a local service. In this mode, etcd acts as a reverse proxy and forwards client requests to an active etcd cluster. The etcd proxy does not participant in the consensus replication of the etcd cluster, thus it neither increases the resilience nor decreases the write performance of the etcd cluster.
etcd can run as a transparent proxy. Doing so allows for easy discovery of etcd within your infrastructure, since it can run on each machine as a local service. In this mode, etcd acts as a reverse proxy and forwards client requests to an active etcd cluster. The etcd proxy does not participate in the consensus replication of the etcd cluster, thus it neither increases the resilience nor decreases the write performance of the etcd cluster.
etcd currently supports two proxy modes: `readwrite` and `readonly`. The default mode is `readwrite`, which forwards both read and write requests to the etcd cluster. A `readonly` etcd proxy only forwards read requests to the etcd cluster, and returns `HTTP 501` to all write requests.
etcd currently supports two proxy modes: `readwrite` and `readonly`. The default mode is `readwrite`, which forwards both read and write requests to the etcd cluster. A `readonly` etcd proxy only forwards read requests to the etcd cluster, and returns `HTTP 501` to all write requests.
The proxy will shuffle the list of cluster members periodically to avoid sending all connections to a single member.
The member list used by proxy consists of all client URLs advertised within the cluster, as specified in each members' `-advertise-client-urls` flag. If this flag is set incorrectly, requests sent to the proxy are forwarded to wrong addresses and then fail. The fix for this problem is to restart etcd member with correct `-advertise-client-urls` flag. After client URLs list in proxy is recalculated, which happens every 30 seconds, requests will be forwarded correctly.
The member list used by an etcd proxy consists of all client URLs advertised in the cluster. These client URLs are specified in each etcd cluster member's `advertise-client-urls` option.
### Using an etcd proxy
To start etcd in proxy mode, you need to provide three flags: `proxy`, `listen-client-urls`, and `initial-cluster` (or `discovery`).
An etcd proxy examines several command-line options to discover its peer URLs. In order of precedence, these options are `discovery`, `discovery-srv`, and `initial-cluster`. The `initial-cluster` option is set to a comma-separated list of one or more etcd peer URLs used temporarily in order to discover the permanent cluster.
After establishing a list of peer URLs in this manner, the proxy retrieves the list of client URLs from the first reachable peer. These client URLs are specified by the `advertise-client-urls` option to etcd peers. The proxy then continues to connect to the first reachable etcd cluster member every thirty seconds to refresh the list of client URLs.
While etcd proxies therefore do not need to be given the `advertise-client-urls` option, as they retrieve this configuration from the cluster, this implies that `initial-cluster` must be set correctly for every proxy, and the `advertise-client-urls` option must be set correctly for every non-proxy, first-order cluster peer. Otherwise, requests to any etcd proxy would be forwarded improperly. Take special care not to set the `advertise-client-urls` option to URLs that point to the proxy itself, as such a configuration will cause the proxy to enter a loop, forwarding requests to itself until resources are exhausted. To correct either case, stop etcd and restart it with the correct URLs.
[This example Procfile][procfile] illustrates the difference in the etcd peer and proxy command lines used to configure and start a cluster with one proxy under the [goreman process management utility][goreman].
To summarize etcd proxy startup and peer discovery:
1. etcd proxies execute the following steps in order until the cluster *peer-urls* are known:
1. If `discovery` is set for the proxy, ask the given discovery service for
the *peer-urls*. The *peer-urls* will be the combined
`initial-advertise-peer-urls` of all first-order, non-proxy cluster
members.
2. If `discovery-srv` is set for the proxy, the *peer-urls* are discovered
from DNS.
3. If `initial-cluster` is set for the proxy, that will become the value of
*peer-urls*.
4. Otherwise use the default value of
`http://localhost:2380,http://localhost:7001`.
2. These *peer-urls* are used to contact the (non-proxy) members of the cluster
to find their *client-urls*. The *client-urls* will thus be the combined
`advertise-client-urls` of all cluster members (i.e. non-proxies).
3. Request of clients of the proxy will be forwarded (proxied) to these
*client-urls*.
Always start the first-order etcd cluster members first, then any proxies. A proxy must be able to reach the cluster members to retrieve its configuration, and will attempt connections somewhat aggressively in the absence of such a channel. Starting the members before any proxy ensures the proxy can discover the client URLs when it later starts.
## Using an etcd proxy
To start etcd in proxy mode, you need to provide three flags: `proxy`, `listen-client-urls`, and `initial-cluster` (or `discovery`).
To start a readwrite proxy, set `-proxy on`; To start a readonly proxy, set `-proxy readonly`.
The proxy will be listening on `listen-client-urls` and forward requests to the etcd cluster discovered from in `initial-cluster` or `discovery` url.
The proxy will be listening on `listen-client-urls` and forward requests to the etcd cluster discovered from in `initial-cluster` or `discovery` url.
#### Start an etcd proxy with a static configuration
### Start an etcd proxy with a static configuration
To start a proxy that will connect to a statically defined etcd cluster, specify the `initial-cluster` flag:
```
etcd -proxy on -listen-client-urls http://127.0.0.1:8080 -initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380
```
#### Start an etcd proxy with the discovery service
If you bootstrap an etcd cluster using the [discovery service][discovery-service], you can also start the proxy with the same `discovery`.
To start a proxy using the discovery service, specify the `discovery` flag. The proxy will wait until the etcd cluster defined at the `discovery` url finishes bootstrapping, and then start to forward the requests.
```
etcd -proxy on -listen-client-urls http://127.0.0.1:8080 -discovery https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de
etcd --proxy on \
--listen-client-urls http://127.0.0.1:8080 \
--initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380
```
#### Fallback to proxy mode with discovery service
If you bootstrap a etcd cluster using [discovery service][discovery-service] with more than the expected number of etcd members, the extra etcd processes will fall back to being `readwrite` proxies by default. They will forward the requests to the cluster as described above. For example, if you create a discovery url with `size=5`, and start ten etcd processes using that same discovery url, the result will be a cluster with five etcd members and five proxies. Note that this behaviour can be disabled with the `proxy-fallback` flag.
### Start an etcd proxy with the discovery service
If you bootstrap an etcd cluster using the [discovery service][discovery-service], you can also start the proxy with the same `discovery`.
To start a proxy using the discovery service, specify the `discovery` flag. The proxy will wait until the etcd cluster defined at the `discovery` url finishes bootstrapping, and then start to forward the requests.
```
etcd --proxy on \
--listen-client-urls http://127.0.0.1:8080 \
--discovery https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de \
```
## Fallback to proxy mode with discovery service
If you bootstrap an etcd cluster using [discovery service][discovery-service] with more than the expected number of etcd members, the extra etcd processes will fall back to being `readwrite` proxies by default. They will forward the requests to the cluster as described above. For example, if you create a discovery url with `size=5`, and start ten etcd processes using that same discovery url, the result will be a cluster with five etcd members and five proxies. Note that this behaviour can be disabled with the `discovery-fallback='exit'` flag.
## Promote a proxy to a member of etcd cluster
A Proxy is in the part of etcd cluster that does not participate in consensus. A proxy will not promote itself to an etcd member that participates in consensus automatically in any case.
If you want to promote a proxy to an etcd member, there are four steps you need to follow:
- use etcdctl to add the proxy node as an etcd member into the existing cluster
- stop the etcd proxy process or service
- remove the existing proxy data directory
- restart the etcd process with new member configuration
## Example
We assume you have a one member etcd cluster with one proxy. The cluster information is listed below:
|Name|Address|
|------|---------|
|infra0|10.0.1.10|
|proxy0|10.0.1.11|
This example walks you through a case that you promote one proxy to an etcd member. The cluster will become a two member cluster after finishing the four steps.
### Add a new member into the existing cluster
First, use etcdctl to add the member to the cluster, which will output the environment variables need to correctly configure the new member:
``` bash
$ etcdctl -endpoint http://10.0.1.10:2379 member add infra1 http://10.0.1.11:2380
added member 9bf1b35fc7761a23 to cluster
ETCD_NAME="infra1"
ETCD_INITIAL_CLUSTER="infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380"
ETCD_INITIAL_CLUSTER_STATE=existing
```
### Stop the proxy process
Stop the existing proxy so we can wipe it's state on disk and reload it with the new configuration:
``` bash
px aux | grep etcd
kill %etcd_proxy_pid%
```
or (if you are running etcd proxy as etcd service under systemd)
``` bash
sudo systemctl stop etcd
```
### Remove the existing proxy data dir
``` bash
rm -rf %data_dir%/proxy
```
### Start etcd as a new member
Finally, start the reconfigured member and make sure it joins the cluster correctly:
``` bash
$ export ETCD_NAME="infra1"
$ export ETCD_INITIAL_CLUSTER="infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380"
$ export ETCD_INITIAL_CLUSTER_STATE=existing
$ etcd --listen-client-urls http://10.0.1.11:2379 \
--advertise-client-urls http://10.0.1.11:2379 \
--listen-peer-urls http://10.0.1.11:2380 \
--initial-advertise-peer-urls http://10.0.1.11:2380 \
--data-dir %data_dir%
```
If you are running etcd under systemd, you should modify the service file with correct configuration and restart the service:
``` bash
sudo systemd restart etcd
```
If an error occurs, check the [add member troubleshooting doc][runtime-configuration].
[discovery-service]: clustering.md#discovery
[goreman]: https://github.com/mattn/goreman
[procfile]: /Procfile
[runtime-configuration]: runtime-configuration.md#error-cases-when-adding-members

View File

@ -0,0 +1,45 @@
# Reporting Bugs
If you find bugs or documentation mistakes in the etcd project, please let us know by [opening an issue][issue]. We treat bugs and mistakes very seriously and believe no issue is too small. Before creating a bug report, please check that an issue reporting the same problem does not already exist.
To make your bug report accurate and easy to understand, please try to create bug reports that are:
- Specific. Include as much details as possible: which version, what environment, what configuration, etc. You can also attach etcd log (the starting log with etcd configuration is especially important).
- Reproducible. Include the steps to reproduce the problem. We understand some issues might be hard to reproduce, please includes the steps that might lead to the problem. You can also attach the affected etcd data dir and stack strace to the bug report.
- Isolated. Please try to isolate and reproduce the bug with minimum dependencies. It would significantly slow down the speed to fix a bug if too many dependencies are involved in a bug report. Debugging external systems that rely on etcd is out of scope, but we are happy to point you in the right direction or help you interact with etcd in the correct manner.
- Unique. Do not duplicate existing bug report.
- Scoped. One bug per report. Do not follow up with another bug inside one report.
You might also want to read [Elika Etemads article on filing good bug reports][filing-good-bugs] before creating a bug report.
We might ask you for further information to locate a bug. A duplicated bug report will be closed.
## Frequently Asked Questions
### How to get a stack trace
``` bash
$ kill -QUIT $PID
```
### How to get etcd version
``` bash
$ etcd --version
```
### How to get etcd configuration and log when it runs as systemd service etcd2.service
``` bash
$ sudo systemctl cat etcd2
$ sudo journalctl -u etcd2
```
Due to an upstream systemd bug, journald may miss the last few log lines when its process exit. If journalctl tells you that etcd stops without fatal or panic message, you could try `sudo journalctl -f -t etcd2` to get full log.
[etcd-issue]: https://github.com/coreos/etcd/issues/new
[filing-good-bugs]: http://fantasai.inkedblade.net/style/talks/filing-good-bugs/

View File

@ -1,4 +1,10 @@
## Design
# Overview
The etcd v3 API is designed to give users a more efficient and cleaner abstraction compared to etcd v2. There are a number of semantic and protocol changes in this new API. For an overview [see Xiang Li's video](https://youtu.be/J5AioGtEPeQ?t=211).
To prove out the design of the v3 API the team has also built [a number of example recipes](https://github.com/coreos/etcd/tree/master/contrib/recipes), there is a [video discussing these recipes too](https://www.youtube.com/watch?v=fj-2RY-3yVU&feature=youtu.be&t=590).
# Design
1. Flatten binary key-value space
@ -14,27 +20,38 @@
- more efficient/ low cost keep alive
- a logical group of TTL keys
5. Replace CAS/CAD with multi-object Tnx
5. Replace CAS/CAD with multi-object Txn
- MUCH MORE powerful and flexible
6. Support efficient watching with multiple ranges
7. RPC API supports the completed set of APIs.
- more efficient than JSON/HTTP
- additional tnx/lease support
- additional txn/lease support
8. HTTP API supports a subset of APIs.
- easy for people to try out etcd
- easy for people to write simple etcd application
## Notes
### Request Size Limitation
The max request size is around 1MB. Since etcd replicates requests in a streaming fashion, a very large
request might block other requests for a long time. The use case for etcd is to store small configuration
values, so we prevent user from submitting large requests. This also applies to Txn requests. We might loosen
the size in the future a little bit or make it configurable.
## Protobuf Defined API
[protobuf](./v3api.proto)
[api protobuf][api-protobuf]
### Examples
[kv protobuf][kv-protobuf]
#### Put a key (foo=bar)
## Examples
### Put a key (foo=bar)
```
// A put is always successful
Put( PutRequest { key = foo, value = bar } )
@ -42,68 +59,68 @@ Put( PutRequest { key = foo, value = bar } )
PutResponse {
cluster_id = 0x1000,
member_id = 0x1,
index = 1,
revision = 1,
raft_term = 0x1,
}
```
#### Get a key (assume we have foo=bar)
### Get a key (assume we have foo=bar)
```
Get ( RangeRequest { key = foo } )
RangeResponse {
cluster_id = 0x1000,
member_id = 0x1,
index = 1,
revision = 1,
raft_term = 0x1,
kvs = {
{
key = foo,
value = bar,
create_index = 1,
mod_index = 1,
create_revision = 1,
mod_revision = 1,
version = 1;
},
},
}
```
#### Range over a key space (assume we have foo0=bar0… foo100=bar100)
### Range over a key space (assume we have foo0=bar0… foo100=bar100)
```
Range ( RangeRequest { key = foo, end_key = foo80, limit = 30 } )
RangeResponse {
cluster_id = 0x1000,
member_id = 0x1,
index = 100,
revision = 100,
raft_term = 0x1,
kvs = {
{
key = foo0,
value = bar0,
create_index = 1,
mod_index = 1,
create_revision = 1,
mod_revision = 1,
version = 1;
},
...,
{
key = foo30,
value = bar30,
create_index = 30,
mod_index = 30,
create_revision = 30,
mod_revision = 30,
version = 1;
},
},
}
```
#### Finish a tnx (assume we have foo0=bar0, foo1=bar1)
### Finish a txn (assume we have foo0=bar0, foo1=bar1)
```
Tnx(TnxRequest {
// mod_index of foo0 is equal to 1, mod_index of foo1 is greater than 1
Txn(TxnRequest {
// mod_revision of foo0 is equal to 1, mod_revision of foo1 is greater than 1
compare = {
{compareType = equal, key = foo0, mod_index = 1},
{compareType = greater, key = foo1, mod_index = 1}}
{compareType = equal, key = foo0, mod_revision = 1},
{compareType = greater, key = foo1, mod_revision = 1}}
},
// if the comparison succeeds, put foo2 = bar2
success = {PutRequest { key = foo2, value = success }},
@ -111,10 +128,10 @@ Tnx(TnxRequest {
failure = {PutRequest { key = foo2, value = failure }},
)
TnxResponse {
TxnResponse {
cluster_id = 0x1000,
member_id = 0x1,
index = 3,
revision = 3,
raft_term = 0x1,
succeeded = true,
responses = {
@ -122,21 +139,21 @@ TnxResponse {
{
cluster_id = 0x1000,
member_id = 0x1,
index = 3,
revision = 3,
raft_term = 0x1,
}
}
}
```
#### Watch on a key/range
### Watch on a key/range
```
Watch( WatchRequest{
key = foo,
end_key = fop, // prefix foo
start_index = 20,
end_index = 10000,
start_revision = 20,
end_revision = 10000,
// server decided notification frequency
progress_notification = true,
}
@ -147,14 +164,14 @@ Watch( WatchRequest{
WatchResponse {
cluster_id = 0x1000,
member_id = 0x1,
index = 3,
revision = 3,
raft_term = 0x1,
event_type = put,
kv = {
key = foo0,
value = bar0,
create_index = 1,
mod_index = 1,
create_revision = 1,
mod_revision = 1,
version = 1;
},
}
@ -164,7 +181,7 @@ WatchResponse {
WatchResponse {
cluster_id = 0x1000,
member_id = 0x1,
index = 2000,
revision = 2000,
raft_term = 0x1,
// nil event as notification
}
@ -175,17 +192,20 @@ WatchResponse {
WatchResponse {
cluster_id = 0x1000,
member_id = 0x1,
index = 3000,
revision = 3000,
raft_term = 0x1,
event_type = put,
kv = {
key = foo0,
value = bar3000,
create_index = 1,
mod_index = 3000,
create_revision = 1,
mod_revision = 3000,
version = 2;
},
}
```
[api-protobuf]: https://github.com/coreos/etcd/blob/master/etcdserver/etcdserverpb/rpc.proto
[kv-protobuf]: https://github.com/coreos/etcd/blob/master/storage/storagepb/kv.proto

View File

@ -1,272 +0,0 @@
syntax = "proto3";
// Interface exported by the server.
service etcd {
// Range gets the keys in the range from the store.
rpc Range(RangeRequest) returns (RangeResponse) {}
// Put puts the given key into the store.
// A put request increases the index of the store,
// and generates one event in the event history.
rpc Put(PutRequest) returns (PutResponse) {}
// Delete deletes the given range from the store.
// A delete request increase the index of the store,
// and generates one event in the event history.
rpc DeleteRange(DeleteRangeRequest) returns (DeleteRangeResponse) {}
// Tnx processes all the requests in one transaction.
// A tnx request increases the index of the store,
// and generates events with the same index in the event history.
rpc Tnx(TnxRequest) returns (TnxResponse) {}
// Watch watches the events happening or happened in etcd. Both input and output
// are stream. One watch rpc can watch for multiple ranges and get a stream of
// events. The whole events history can be watched unless compacted.
rpc WatchRange(stream WatchRangeRequest) returns (stream WatchRangeResponse) {}
// Compact compacts the event history in etcd. User should compact the
// event history periodically, or it will grow infinitely.
rpc Compact(CompactionRequest) returns (CompactionResponse) {}
// LeaseCreate creates a lease. A lease has a TTL. The lease will expire if the
// server does not receive a keepAlive within TTL from the lease holder.
// All keys attached to the lease will be expired and deleted if the lease expires.
// The key expiration generates an event in event history.
rpc LeaseCreate(LeaseCreateRequest) returns (LeaseCreateResponse) {}
// LeaseRevoke revokes a lease. All the key attached to the lease will be expired and deleted.
rpc LeaseRevoke(LeaseRevokeRequest) returns (LeaseRevokeResponse) {}
// LeaseAttach attaches keys with a lease.
rpc LeaseAttach(LeaseAttachRequest) returns (LeaseAttachResponse) {}
// LeaseTnx likes Tnx. It has two addition success and failure LeaseAttachRequest list.
// If the Tnx is successful, then the success list will be executed. Or the failure list
// will be executed.
rpc LeaseTnx(LeaseTnxRequest) returns (LeaseTnxResponse) {}
// KeepAlive keeps the lease alive.
rpc LeaseKeepAlive(stream LeaseKeepAliveRequest) returns (stream LeaseKeepAliveResponse) {}
}
message ResponseHeader {
// an error type message?
optional string error = 1;
optional uint64 cluster_id = 2;
optional uint64 member_id = 3;
// index of the store when the request was applied.
optional int64 index = 4;
// term of raft when the request was applied.
optional uint64 raft_term = 5;
}
message RangeRequest {
// if the range_end is not given, the request returns the key.
optional bytes key = 1;
// if the range_end is given, it gets the keys in range [key, range_end).
optional bytes range_end = 2;
// limit the number of keys returned.
optional int64 limit = 3;
// the response will be consistent with previous request with same token if the token is
// given and is vaild.
optional bytes consistent_token = 4;
}
message RangeResponse {
optional ResponseHeader header = 1;
repeated KeyValue kvs = 2;
optional bytes consistent_token = 3;
}
message PutRequest {
optional bytes key = 1;
optional bytes value = 2;
}
message PutResponse {
optional ResponseHeader header = 1;
}
message DeleteRangeRequest {
// if the range_end is not given, the request deletes the key.
optional bytes key = 1;
// if the range_end is given, it deletes the keys in range [key, range_end).
optional bytes range_end = 2;
}
message DeleteRangeResponse {
optional ResponseHeader header = 1;
}
message RequestUnion {
oneof request {
RangeRequest request_range = 1;
PutRequest request_put = 2;
DeleteRangeRequest request_delete_range = 3;
}
}
message ResponseUnion {
oneof response {
RangeResponse reponse_range = 1;
PutResponse response_put = 2;
DeleteRangeResponse response_delete_range = 3;
}
}
message Compare {
enum CompareType {
EQUAL = 0;
GREATER = 1;
LESS = 2;
}
optional CompareType type = 1;
// key path
optional bytes key = 2;
oneof target {
// version of the given key
int64 version = 3;
// create index of the given key
int64 create_index = 4;
// last modified index of the given key
int64 mod_index = 5;
// value of the given key
bytes value = 6;
}
}
// First all the compare requests are processed.
// If all the compare succeed, all the success
// requests will be processed.
// Or all the failure requests will be processed and
// all the errors in the comparison will be returned.
// From google paxosdb paper:
// Our implementation hinges around a powerful primitive which we call MultiOp. All other database
// operations except for iteration are implemented as a single call to MultiOp. A MultiOp is applied atomically
// and consists of three components:
// 1. A list of tests called guard. Each test in guard checks a single entry in the database. It may check
// for the absence or presence of a value, or compare with a given value. Two different tests in the guard
// may apply to the same or different entries in the database. All tests in the guard are applied and
// MultiOp returns the results. If all tests are true, MultiOp executes t op (see item 2 below), otherwise
// it executes f op (see item 3 below).
// 2. A list of database operations called t op. Each operation in the list is either an insert, delete, or
// lookup operation, and applies to a single database entry. Two different operations in the list may apply
// to the same or different entries in the database. These operations are executed
// if guard evaluates to
// true.
// 3. A list of database operations called f op. Like t op, but executed if guard evaluates to false.
message TnxRequest {
repeated Compare compare = 1;
repeated RequestUnion success = 2;
repeated RequestUnion failure = 3;
}
message TnxResponse {
optional ResponseHeader header = 1;
optional bool succeeded = 2;
repeated ResponseUnion responses = 3;
}
message KeyValue {
optional bytes key = 1;
// mod_index is the last modified index of the key.
optional int64 create_index = 2;
optional int64 mod_index = 3;
// version is the version of the key. A deletion resets
// the version to zero and any modification of the key
// increases its version.
optional int64 version = 4;
optional bytes value = 5;
}
message WatchRangeRequest {
// if the range_end is not given, the request returns the key.
optional bytes key = 1;
// if the range_end is given, it gets the keys in range [key, range_end).
optional bytes range_end = 2;
// start_index is an optional index (including) to watch from. No start_index is "now".
optional int64 start_index = 3;
// end_index is an optional index (excluding) to end watch. No end_index is "forever".
optional int64 end_index = 4;
optional bool progress_notification = 5;
}
message WatchRangeResponse {
optional ResponseHeader header = 1;
repeated Event events = 2;
}
message Event {
enum EventType {
PUT = 0;
DELETE = 1;
EXPIRE = 2;
}
optional EventType event_type = 1;
// a put event contains the current key-value
// a delete/expire event contains the previous
// key-value
optional KeyValue kv = 2;
}
message CompactionRequest {
optional int64 index = 1;
}
message CompactionResponse {
optional ResponseHeader header = 1;
}
message LeaseCreateRequest {
// advisory ttl in seconds
optional int64 ttl = 1;
}
message LeaseCreateResponse {
optional ResponseHeader header = 1;
optional int64 lease_id = 2;
// server decided ttl in second
optional int64 ttl = 3;
optional string error = 4;
}
message LeaseRevokeRequest {
optional int64 lease_id = 1;
}
message LeaseRevokeResponse {
optional ResponseHeader header = 1;
}
message LeaseTnxRequest {
optional TnxRequest request = 1;
repeated LeaseAttachRequest success = 2;
repeated LeaseAttachRequest failure = 3;
}
message LeaseTnxResponse {
optional ResponseHeader header = 1;
optional TnxResponse response = 2;
repeated LeaseAttachResponse attach_responses = 3;
}
message LeaseAttachRequest {
optional int64 lease_id = 1;
optional bytes key = 2;
}
message LeaseAttachResponse {
optional ResponseHeader header = 1;
}
message LeaseKeepAliveRequest {
optional int64 lease_id = 1;
}
message LeaseKeepAliveResponse {
optional ResponseHeader header = 1;
optional int64 lease_id = 2;
optional int64 ttl = 3;
}

View File

@ -1,14 +1,14 @@
## Runtime Reconfiguration
# Runtime Reconfiguration
etcd comes with support for incremental runtime reconfiguration, which allows users to update the membership of the cluster at run time.
Reconfiguration requests can only be processed when the the majority of the cluster members are functioning. It is **highly recommended** to always have a cluster size greater than two in production. It is unsafe to remove a member from a two member cluster. The majority of a two member cluster is also two. If there is a failure during the removal process, the cluster might not able to make progress and need to [restart from majority failure][majority failure].
Reconfiguration requests can only be processed when the majority of the cluster members are functioning. It is **highly recommended** to always have a cluster size greater than two in production. It is unsafe to remove a member from a two member cluster. The majority of a two member cluster is also two. If there is a failure during the removal process, the cluster might not able to make progress and need to [restart from majority failure][majority failure].
[majority failure]: #restart-cluster-from-majority-failure
To better understand the design behind runtime reconfiguration, we suggest you read [the runtime reconfiguration document][runtime-reconf].
## Reconfiguration Use Cases
Let us walk through some common reasons for reconfiguring a cluster. Most of these just involve combinations of adding or removing a member, which are explained below under [Cluster Reconfiguration Operations](#cluster-reconfiguration-operations).
Let's walk through some common reasons for reconfiguring a cluster. Most of these just involve combinations of adding or removing a member, which are explained below under [Cluster Reconfiguration Operations][cluster-reconf].
### Cycle or Upgrade Multiple Machines
@ -16,33 +16,23 @@ If you need to move multiple members of your cluster due to planned maintenance
It is safe to remove the leader, however there is a brief period of downtime while the election process takes place. If your cluster holds more than 50MB, it is recommended to [migrate the member's data directory][member migration].
[member migration]: admin_guide.md#member-migration
### Change the Cluster Size
Increasing the cluster size can enhance [failure tolerance][fault tolerance table] and provide better read performance. Since clients can read from any member, increasing the number of members increases the overall read throughput.
Decreasing the cluster size can improve the write performance of a cluster, with a trade-off of decreased resilience. Writes into the cluster are replicated to a majority of members of the cluster before considered committed. Decreasing the cluster size lowers the majority, and each write is committed more quickly.
[fault tolerance table]: admin_guide.md#fault-tolerance-table
### Replace A Failed Machine
If a machine fails due to hardware failure, data directory corruption, or some other fatal situation, it should be replaced as soon as possible. Machines that have failed but haven't been removed adversely affect your quorum and reduce the tolerance for an additional failure.
To replace the machine, follow the instructions for [removing the member][remove member] from the cluster, and then [add a new member][add member] in its place. If your cluster holds more than 50MB, it is recommended to [migrate the failed member's data directory][member migration] if you can still access it.
[remove member]: #remove-a-member
[add member]: #add-a-new-member
### Restart Cluster from Majority Failure
If the majority of your cluster is lost, then you need to take manual action in order to recover safely.
If the majority of your cluster is lost or all of your nodes have changed IP addresses, then you need to take manual action in order to recover safely.
The basic steps in the recovery process include [creating a new cluster using the old data][disaster recovery], forcing a single member to act as the leader, and finally using runtime configuration to [add new members][add member] to this new cluster one at a time.
[add member]: #add-a-new-member
[disaster recovery]: admin_guide.md#disaster-recovery
## Cluster Reconfiguration Operations
Now that we have the use cases in mind, let us lay out the operations involved in each.
@ -52,28 +42,51 @@ This is essentially the same requirement as for any other write to etcd.
All changes to the cluster are done one at a time:
To replace a single member you will make an add then a remove operation
To increase from 3 to 5 members you will make two add operations
To decrease from 5 to 3 you will make two remove operations
* To update a single member peerURLs you will make an update operation
* To replace a single member you will make an add then a remove operation
* To increase from 3 to 5 members you will make two add operations
* To decrease from 5 to 3 you will make two remove operations
All of these examples will use the `etcdctl` command line tool that ships with etcd.
If you want to use the member API directly you can find the documentation [here](other_apis.md).
If you want to use the members API directly you can find the documentation [here][member-api].
### Update a Member
#### Update advertise client URLs
If you would like to update the advertise client URLs of a member, you can simply restart
that member with updated client urls flag (`--advertise-client-urls`) or environment variable
(`ETCD_ADVERTISE_CLIENT_URLS`). The restarted member will self publish the updated URLs.
A wrongly updated client URL will not affect the health of the etcd cluster.
#### Update advertise peer URLs
If you would like to update the advertise peer URLs of a member, you have to first update
it explicitly via member command and then restart the member. The additional action is required
since updating peer URLs changes the cluster wide configuration and can affect the health of the etcd cluster.
To update the peer URLs, first, we need to find the target member's ID. You can list all members with `etcdctl`:
```sh
$ etcdctl member list
6e3bd23ae5f1eae0: name=node2 peerURLs=http://localhost:23802 clientURLs=http://127.0.0.1:23792
924e2e83e93f2560: name=node3 peerURLs=http://localhost:23803 clientURLs=http://127.0.0.1:23793
a8266ecf031671f3: name=node1 peerURLs=http://localhost:23801 clientURLs=http://127.0.0.1:23791
```
In this example let's `update` a8266ecf031671f3 member ID and change its peerURLs value to http://10.0.1.10:2380
```sh
$ etcdctl member update a8266ecf031671f3 http://10.0.1.10:2380
Updated member with ID a8266ecf031671f3 in cluster
```
### Remove a Member
First, we need to find the target member's ID. You can list all members with `etcdctl`:
```
$ etcdctl member list
6e3bd23ae5f1eae0: name=node2 peerURLs=http://localhost:7002 clientURLs=http://127.0.0.1:4002
924e2e83e93f2560: name=node3 peerURLs=http://localhost:7003 clientURLs=http://127.0.0.1:4003
a8266ecf031671f3: name=node1 peerURLs=http://localhost:7001 clientURLs=http://127.0.0.1:4001
```
Let us say the member ID we want to remove is a8266ecf031671f3.
We then use the `remove` command to perform the removal:
```
```sh
$ etcdctl member remove a8266ecf031671f3
Removed member a8266ecf031671f3 from cluster
```
@ -90,12 +103,12 @@ It is safe to remove the leader, however the cluster will be inactive while a ne
Adding a member is a two step process:
* Add the new member to the cluster via the [members API](other_apis.md#post-v2members) or the `etcdctl member add` command.
* Add the new member to the cluster via the [members API][member-api] or the `etcdctl member add` command.
* Start the new member with the new cluster configuration, including a list of the updated members (existing members + the new member).
Using `etcdctl` let's add the new member to the cluster by specifying its [name](configuration.md#-name) and [advertised peer URLs](configuration.md#-initial-advertise-peer-urls):
Using `etcdctl` let's add the new member to the cluster by specifying its [name][conf-name] and [advertised peer URLs][conf-adv-peer]:
```
```sh
$ etcdctl member add infra3 http://10.0.1.13:2380
added member 9bf1b35fc7761a23 to cluster
@ -107,11 +120,11 @@ ETCD_INITIAL_CLUSTER_STATE=existing
`etcdctl` has informed the cluster about the new member and printed out the environment variables needed to successfully start it.
Now start the new etcd process with the relevant flags for the new member:
```
```sh
$ export ETCD_NAME="infra3"
$ export ETCD_INITIAL_CLUSTER="infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380,infra3=http://10.0.1.13:2380"
$ export ETCD_INITIAL_CLUSTER_STATE=existing
$ etcd -listen-client-urls http://10.0.1.13:2379 -advertise-client-urls http://10.0.1.13:2379 -listen-peer-urls http://10.0.1.13:2380 -initial-advertise-peer-urls http://10.0.1.13:2380
$ etcd -listen-client-urls http://10.0.1.13:2379 -advertise-client-urls http://10.0.1.13:2379 -listen-peer-urls http://10.0.1.13:2380 -initial-advertise-peer-urls http://10.0.1.13:2380 -data-dir %data_dir%
```
The new member will run as a part of the cluster and immediately begin catching up with the rest of the cluster.
@ -119,12 +132,12 @@ The new member will run as a part of the cluster and immediately begin catching
If you are adding multiple members the best practice is to configure a single member at a time and verify it starts correctly before adding more new members.
If you add a new member to a 1-node cluster, the cluster cannot make progress before the new member starts because it needs two members as majority to agree on the consensus. You will only see this behavior between the time `etcdctl member add` informs the cluster about the new member and the new member successfully establishing a connection to the existing one.
#### Error Cases
#### Error Cases When Adding Members
In the following case we have not included our new host in the list of enumerated nodes.
If this is a new cluster, the node must be added to the list of initial cluster members.
```
```sh
$ etcd -name infra3 \
-initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
-initial-cluster-state existing
@ -134,7 +147,7 @@ exit 1
In this case we give a different address (10.0.1.14:2380) to the one that we used to join the cluster (10.0.1.13:2380).
```
```sh
$ etcd -name infra4 \
-initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380,infra4=http://10.0.1.14:2380 \
-initial-cluster-state existing
@ -142,10 +155,30 @@ etcdserver: assign ids error: unmatched member while checking PeerURLs
exit 1
```
When we start etcd using the data directory of a removed member, etcd will exit automatically if it connects to any alive member in the cluster:
When we start etcd using the data directory of a removed member, etcd will exit automatically if it connects to any active member in the cluster:
```
```sh
$ etcd
etcd: this member has been permanently removed from the cluster. Exiting.
exit 1
```
### Strict Reconfiguration Check Mode (`-strict-reconfig-check`)
As described in the above, the best practice of adding new members is to configure a single member at a time and verify it starts correctly before adding more new members. This step by step approach is very important because if newly added members is not configured correctly (for example the peer URLs are incorrect), the cluster can lose quorum. The quorum loss happens since the newly added member are counted in the quorum even if that member is not reachable from other existing members. Also quorum loss might happen if there is a connectivity issue or there are operational issues.
For avoiding this problem, etcd provides an option `-strict-reconfig-check`. If this option is passed to etcd, etcd rejects reconfiguration requests if the number of started members will be less than a quorum of the reconfigured cluster.
It is recommended to enable this option. However, it is disabled by default because of keeping compatibility.
[add member]: #add-a-new-member
[cluster-reconf]: #cluster-reconfiguration-operations
[conf-adv-peer]: configuration.md#-initial-advertise-peer-urls
[conf-name]: configuration.md#-name
[disaster recovery]: admin_guide.md#disaster-recovery
[fault tolerance table]: admin_guide.md#fault-tolerance-table
[majority failure]: #restart-cluster-from-majority-failure
[member-api]: members_api.md
[member migration]: admin_guide.md#member-migration
[remove member]: #remove-a-member
[runtime-reconf]: runtime-reconf-design.md

View File

@ -0,0 +1,50 @@
# Design of Runtime Reconfiguration
Runtime reconfiguration is one of the hardest and most error prone features in a distributed system, especially in a consensus based system like etcd.
Read on to learn about the design of etcd's runtime reconfiguration commands and how we tackled these problems.
## Two Phase Config Changes Keep you Safe
In etcd, every runtime reconfiguration has to go through [two phases][add-member] for safety reasons. For example, to add a member you need to first inform cluster of new configuration and then start the new member.
Phase 1 - Inform cluster of new configuration
To add a member into etcd cluster, you need to make an API call to request a new member to be added to the cluster. And this is only way that you can add a new member into an existing cluster. The API call returns when the cluster agrees on the configuration change.
Phase 2 - Start new member
To join the etcd member into the existing cluster, you need to specify the correct `initial-cluster` and set `initial-cluster-state` to `existing`. When the member starts, it will contact the existing cluster first and verify the current cluster configuration matches the expected one specified in `initial-cluster`. When the new member successfully starts, you know your cluster reached the expected configuration.
By splitting the process into two discrete phases users are forced to be explicit regarding cluster membership changes. This actually gives users more flexibility and makes things easier to reason about. For example, if there is an attempt to add a new member with the same ID as an existing member in an etcd cluster, the action will fail immediately during phase one without impacting the running cluster. Similar protection is provided to prevent adding new members by mistake. If a new etcd member attempts to join the cluster before the cluster has accepted the configuration change,, it will not be accepted by the cluster.
Without the explicit workflow around cluster membership etcd would be vulnerable to unexpected cluster membership changes. For example, if etcd is running under an init system such as systemd, etcd would be restarted after being removed via the membership API, and attempt to rejoin the cluster on startup. This cycle would continue every time a member is removed via the API and systemd is set to restart etcd after failing, which is unexpected.
We think runtime reconfiguration should be a low frequent operation. We made the decision to keep it explicit and user-driven to ensure configuration safety and keep your cluster always running smoothly under your control.
## Permanent Loss of Quorum Requires New Cluster
If a cluster permanently loses a majority of its members, a new cluster will need to be started from an old data directory to recover the previous state.
It is entirely possible to force removing the failed members from the existing cluster to recover. However, we decided not to support this method since it bypasses the normal consensus committing phase, which is unsafe. If the member to remove is not actually dead or you force to remove different members through different members in the same cluster, you will end up with diverged cluster with same clusterID. This is very dangerous and hard to debug/fix afterwards.
If you have a correct deployment, the possibility of permanent majority lose is very low. But it is a severe enough problem that worth special care. We strongly suggest you to read the [disaster recovery documentation][disaster-recovery] and prepare for permanent majority lose before you put etcd into production.
## Do Not Use Public Discovery Service For Runtime Reconfiguration
The public discovery service should only be used for bootstrapping a cluster. To join member into an existing cluster, you should use runtime reconfiguration API.
Discovery service is designed for bootstrapping an etcd cluster in the cloud environment, when you do not know the IP addresses of all the members beforehand. After you successfully bootstrap a cluster, the IP addresses of all the members are known. Technically, you should not need the discovery service any more.
It seems that using public discovery service is a convenient way to do runtime reconfiguration, after all discovery service already has all the cluster configuration information. However relying on public discovery service brings troubles:
1. it introduces external dependencies for the entire life-cycle of your cluster, not just bootstrap time. If there is a network issue between your cluster and public discovery service, your cluster will suffer from it.
2. public discovery service must reflect correct runtime configuration of your cluster during it life-cycle. It has to provide security mechanism to avoid bad actions, and it is hard.
3. public discovery service has to keep tens of thousands of cluster configurations. Our public discovery service backend is not ready for that workload.
If you want to have a discovery service that supports runtime reconfiguration, the best choice is to build your private one.
[add-member]: runtime-configuration.md#add-a-new-member
[disaster-recovery]: admin_guide.md#disaster-recovery

View File

@ -1,12 +1,10 @@
# security model
# Security Model
etcd supports SSL/TLS as well as authentication through client certificates, both for clients to server as well as peer (server to server / cluster) communication.
To get up and running you first need to have a CA certificate and a signed key pair for one member. It is recommended to create and sign a new key pair for every member in a cluster.
For convenience the [etcd-ca](https://github.com/coreos/etcd-ca) tool provides an easy interface to certificate generation, alternatively this site provides a good reference on how to generate self-signed key pairs:
http://www.g-loaded.eu/2005/11/10/be-your-own-ca/
For convenience, the [cfssl] tool provides an easy interface to certificate generation, and we provide an example using the tool [here][tls-setup]. You can also examine this [alternative guide to generating self-signed key pairs][tls-guide].
## Basic setup
@ -97,7 +95,7 @@ $ curl --cacert /path/to/ca.crt --cert /path/to/client.crt --key /path/to/client
-L https://127.0.0.1:2379/v2/keys/foo -XPUT -d value=bar -v
```
You should able to see:
You should be able to see:
```
...
@ -138,13 +136,21 @@ $ etcd -name infra1 -data-dir infra1 \
# member2
$ etcd -name infra2 -data-dir infra2 \
-peer-client-cert-atuh -peer-trusted-ca-file=/path/to/ca.crt -peer-cert-file=/path/to/member2.crt -peer-key-file=/path/to/member2.key \
-peer-client-cert-auth -peer-trusted-ca-file=/path/to/ca.crt -peer-cert-file=/path/to/member2.crt -peer-key-file=/path/to/member2.key \
-initial-advertise-peer-urls=https://10.0.1.11:2380 -listen-peer-urls=https://10.0.1.11:2380 \
-discovery ${DISCOVERY_URL}
```
The etcd members will form a cluster and all communication between members in the cluster will be encrypted and authenticated using the client certificates. You will see in the output of etcd that the addresses it connects to use HTTPS.
## Notes For etcd Proxy
etcd proxy terminates the TLS from its client if the connection is secure, and uses proxy's own key/cert specified in `--peer-key-file` and `--peer-cert-file` to communicate with etcd members.
The proxy communicates with etcd members through both the `--advertise-client-urls` and `--advertise-peer-urls` of a given member. It forwards client requests to etcd members advertised client urls, and it syncs the initial cluster configuration through etcd members advertised peer urls.
When client authentication is enabled for an etcd member, the administrator must ensure that the peer certificate specified in the proxy's `--peer-cert-file` option is valid for that authentication. The proxy's peer certificate must also be valid for peer authentication if peer authentication is enabled.
## Frequently Asked Questions
### My cluster is not working with peer tls configuration?
@ -152,7 +158,7 @@ The etcd members will form a cluster and all communication between members in th
The internal protocol of etcd v2.0.x uses a lot of short-lived HTTP connections.
So, when enabling TLS you may need to increase the heartbeat interval and election timeouts to reduce internal cluster connection churn.
A reasonable place to start are these values: ` --heartbeat-interval 500 --election-timeout 2500`.
This issues is resolved in the etcd v2.1.x series of releases which uses fewer connections.
These issues are resolved in the etcd v2.1.x series of releases which uses fewer connections.
### I'm seeing a SSLv3 alert handshake failure when using SSL client authentication?
@ -179,4 +185,9 @@ $ openssl ca -config openssl.cnf -policy policy_anything -extensions ssl_client
### With peer certificate authentication I receive "certificate is valid for 127.0.0.1, not $MY_IP"
Make sure that you sign your certificates with a Subject Name your member's public IP address. The `etcd-ca` tool for example provides an `--ip=` option for its `new-cert` command.
If you need your certificate to be signed for your member's FQDN in its Subject Name then you could use Subject Alternative Names (short IP SANs) to add your IP address. The `etcd-ca` tool provides `--domain=` option for its `new-cert` command, and openssl can make [it](http://wiki.cacert.org/FAQ/subjectAltName) too.
If you need your certificate to be signed for your member's FQDN in its Subject Name then you could use Subject Alternative Names (short IP SANs) to add your IP address. The `etcd-ca` tool provides `--domain=` option for its `new-cert` command, and openssl can make [it][alt-name] too.
[cfssl]: https://github.com/cloudflare/cfssl
[tls-setup]: /hack/tls-setup
[tls-guide]: https://github.com/coreos/docs/blob/master/os/generate-self-signed-certificates.md
[alt-name]: http://wiki.cacert.org/FAQ/subjectAltName

View File

@ -1,16 +1,16 @@
## Tuning
# Tuning
The default settings in etcd should work well for installations on a local network where the average network latency is low.
However, when using etcd across multiple data centers or over networks with high latency you may need to tweak the heartbeat interval and election timeout settings.
The network isn't the only source of latency. Each request and response may be impacted by slow disks on both the leader and follower. Each of these timeouts represents the total time from request to successful response from the other machine.
### Time Parameters
## Time Parameters
The underlying distributed consensus protocol relies on two separate time parameters to ensure that nodes can handoff leadership if one stalls or goes offline.
The first parameter is called the *Heartbeat Interval*.
This is the frequency with which the leader will notify followers that it is still the leader.
etcd batches commands together for higher throughput so this heartbeat interval is also a delay for how long it takes for commands to be committed.
For best practices, the parameter should be set around round-trip time between members.
By default, etcd uses a `100ms` heartbeat interval.
The second parameter is the *Election Timeout*.
@ -18,16 +18,25 @@ This timeout is how long a follower node will go without hearing a heartbeat bef
By default, etcd uses a `1000ms` election timeout.
Adjusting these values is a trade off.
Lowering the heartbeat interval will cause individual commands to be committed faster but it will lower the overall throughput of etcd.
If your etcd instances have low utilization then lowering the heartbeat interval can improve your command response time.
The value of heartbeat interval is recommended to be around the maximum of average round-trip time (RTT) between members, normally around 0.5-1.5x the round-trip time.
If heartbeat interval is too low, etcd will send unnecessary messages that increase the usage of CPU and network resources.
On the other side, a too high heartbeat interval leads to high election timeout. Higher election timeout takes longer time to detect a leader failure.
The easiest way to measure round-trip time (RTT) is to use [PING utility][ping].
The election timeout should be set based on the heartbeat interval and your network ping time between nodes.
Election timeouts should be at least 10 times your ping time so it can account for variance in your network.
For example, if the ping time between your nodes is 10ms then you should have at least a 100ms election timeout.
The election timeout should be set based on the heartbeat interval and average round-trip time between members.
Election timeouts must be at least 10 times the round-trip time so it can account for variance in your network.
For example, if the round-trip time between your members is 10ms then you should have at least a 100ms election timeout.
You should also set your election timeout to at least 5 to 10 times your heartbeat interval to account for variance in leader replication.
For a heartbeat interval of 50ms you should set your election timeout to at least 250ms - 500ms.
The upper limit of election timeout is 50000ms (50s), which should only be used when deploying a globally-distributed etcd cluster.
A reasonable round-trip time for the continental United States is 130ms, and the time between US and Japan is around 350-400ms.
If your network has uneven performance or regular packet delays/loss then it is possible that a couple of retries may be necessary to successfully send a packet. So 5s is a safe upper limit of global round-trip time.
As the election timeout should be an order of magnitude bigger than broadcast time, in the case of ~5s for a globally distributed cluster, then 50 seconds becomes a reasonable maximum.
The heartbeat interval and election timeout value should be the same for all members in one cluster. Setting different values for etcd members may disrupt cluster stability.
You can override the default values on the command line:
```sh
@ -40,7 +49,7 @@ $ ETCD_HEARTBEAT_INTERVAL=100 ETCD_ELECTION_TIMEOUT=500 etcd
The values are specified in milliseconds.
### Snapshots
## Snapshots
etcd appends all key changes to a log file.
This log grows forever and is a complete linear history of every change made to the keys.
@ -62,3 +71,5 @@ $ etcd -snapshot-count=5000
# Environment variables:
$ ETCD_SNAPSHOT_COUNT=5000 etcd
```
[ping]: https://en.wikipedia.org/wiki/Ping_(networking_utility)

View File

@ -1,4 +1,4 @@
## Upgrade etcd to 2.1
# Upgrade etcd to 2.1
In the general case, upgrading from etcd 2.0 to 2.1 can be a zero-downtime, rolling upgrade:
- one by one, stop the etcd v2.0 processes and replace them with etcd v2.1 processes
@ -6,29 +6,29 @@ In the general case, upgrading from etcd 2.0 to 2.1 can be a zero-downtime, roll
Before [starting an upgrade](#upgrade-procedure), read through the rest of this guide to prepare.
### Upgrade Checklists
## Upgrade Checklists
#### Upgrade Requirement
### Upgrade Requirements
To upgrade an existing etcd deployment to 2.1, you must be running 2.0. If youre running a version of etcd before 2.0, you must upgrade to [2.0](https://github.com/coreos/etcd/releases/tag/v2.0.13) before upgrading to 2.1.
To upgrade an existing etcd deployment to 2.1, you must be running 2.0. If youre running a version of etcd before 2.0, you must upgrade to [2.0][v2.0] before upgrading to 2.1.
Also, to ensure a smooth rolling upgrade, your running cluster must be healthy. You can check the health of the cluster by using `etcdctl cluster-health` command.
#### Preparedness
### Preparedness
Before upgrading etcd, always test the services relying on etcd in a staging environment before deploying the upgrade to the production environment.
You might also want to [backup your data directory](admin_guide.md#backing-up-the-datastore) for a potential [downgrade](#downgrade).
You might also want to [backup your data directory][backup-datastore] for a potential [downgrade](#downgrade).
etcd 2.1 introduces a new [authentication](auth_api.md) feature, which is disabled by default. If your deployment depends on these, you may want to test the auth features before enabling them in production.
etcd 2.1 introduces a new [authentication][auth] feature, which is disabled by default. If your deployment depends on these, you may want to test the auth features before enabling them in production.
#### Mixed Versions
### Mixed Versions
While upgrading, an etcd cluster supports mixed versions of etcd members. The cluster is only considered upgraded once all its members are upgraded to 2.1.
Internally, etcd members negotiate with each other to determine the overall etcd cluster version, which controls the reported cluster version and the supported features. For example, if you are mid-upgrade, any 2.1 features (such as the the authentication feature mentioned above) wont be available.
#### Limitations
### Limitations
If you encounter any issues during the upgrade, you can attempt to restart the etcd process in trouble using a newer v2.1 binary to solve the problem. One known issue is that etcd v2.0.0 and v2.0.2 may panic during rolling upgrades due to an existing bug, which has been fixed since etcd v2.0.3.
@ -36,11 +36,11 @@ It might take up to 2 minutes for the newly upgraded member to catch up with the
If you have even more data, this might take more time. If you have a data size larger than 100MB you should contact us before upgrading, so we can make sure the upgrades work smoothly.
#### Downgrade
### Downgrade
If all members have been upgraded to v2.1, the cluster will be upgraded to v2.1, and downgrade is **not possible**. If any member is still v2.0, the cluster will remain in v2.0, and you can go back to use v2.0 binary.
Please [backup your data directory](admin_guide.md#backing-up-the-datastore) of all etcd members if you want to downgrade the cluster, even if it is upgraded.
Please [backup your data directory][backup-datastore] of all etcd members if you want to downgrade the cluster, even if it is upgraded.
### Upgrade Procedure
@ -70,7 +70,7 @@ You will see similar error logging from other etcd processes in your cluster. Th
2015/06/23 15:45:11 stream: stopping the stream server...
```
You could [backup your data directory](https://github.com/coreos/etcd/blob/7f7e2cc79d9c5c342a6eb1e48c386b0223cf934e/Documentation/admin_guide.md#backing-up-the-datastore) for data safety.
You could [backup your data directory][backup-datastore] for data safety.
```
$ etcdctl backup \
@ -110,3 +110,7 @@ When all members are upgraded, you will see the cluster is upgraded to 2.1 succe
$ curl http://127.0.0.1:4001/version
{"etcdserver":"2.1.x","etcdcluster":"2.1.0"}
```
[auth]: auth_api.md
[backup-datastore]: admin_guide.md#backing-up-the-datastore
[v2.0]: https://github.com/coreos/etcd/releases/tag/v2.0.13

View File

@ -0,0 +1,132 @@
# Upgrade etcd from 2.1 to 2.2
In the general case, upgrading from etcd 2.1 to 2.2 can be a zero-downtime, rolling upgrade:
- one by one, stop the etcd v2.1 processes and replace them with etcd v2.2 processes
- after you are running all v2.2 processes, new features in v2.2 are available to the cluster
Before [starting an upgrade](#upgrade-procedure), read through the rest of this guide to prepare.
## Upgrade Checklists
### Upgrade Requirement
To upgrade an existing etcd deployment to 2.2, you must be running 2.1. If youre running a version of etcd before 2.1, you must upgrade to [2.1][v2.1] before upgrading to 2.2.
Also, to ensure a smooth rolling upgrade, your running cluster must be healthy. You can check the health of the cluster by using `etcdctl cluster-health` command.
### Preparedness
Before upgrading etcd, always test the services relying on etcd in a staging environment before deploying the upgrade to the production environment.
You might also want to [backup the data directory][backup-datastore] for a potential [downgrade].
### Mixed Versions
While upgrading, an etcd cluster supports mixed versions of etcd members. The cluster is only considered upgraded once all its members are upgraded to 2.2.
Internally, etcd members negotiate with each other to determine the overall etcd cluster version, which controls the reported cluster version and the supported features.
### Limitations
If you have a data size larger than 100MB you should contact us before upgrading, so we can make sure the upgrades work smoothly.
Every etcd 2.2 member will do health checking across the cluster periodically. etcd 2.1 member does not support health checking. During the upgrade, etcd 2.2 member will log warning about the unhealthy state of etcd 2.1 member. You can ignore the warning.
### Downgrade
If all members have been upgraded to v2.2, the cluster will be upgraded to v2.2, and downgrade is **not possible**. If any member is still v2.1, the cluster will remain in v2.1, and you can go back to use v2.1 binary.
Please [backup the data directory][backup-datastore] of all etcd members if you want to downgrade the cluster, even if it is upgraded.
### Upgrade Procedure
In the example, we upgrade a three member v2.1 cluster running on local machine.
#### 1. Check upgrade requirements.
```
$ etcdctl cluster-health
member 6e3bd23ae5f1eae0 is healthy: got healthy result from http://localhost:22379
member 924e2e83e93f2560 is healthy: got healthy result from http://localhost:32379
member a8266ecf031671f3 is healthy: got healthy result from http://localhost:12379
cluster is healthy
$ curl http://localhost:4001/version
{"etcdserver":"2.1.x","etcdcluster":"2.1.0"}
```
#### 2. Stop the existing etcd process
You will see similar error logging from other etcd processes in your cluster. This is normal, since you just shut down a member and the connection is broken.
```
2015/09/2 09:48:35 etcdserver: failed to reach the peerURL(http://localhost:12380) of member a8266ecf031671f3 (Get http://localhost:12380/version: dial tcp [::1]:12380: getsockopt: connection refused)
2015/09/2 09:48:35 etcdserver: cannot get the version of member a8266ecf031671f3 (Get http://localhost:12380/version: dial tcp [::1]:12380: getsockopt: connection refused)
2015/09/2 09:48:35 rafthttp: failed to write a8266ecf031671f3 on stream Message (write tcp 127.0.0.1:32380->127.0.0.1:64394: write: broken pipe)
2015/09/2 09:48:35 rafthttp: failed to write a8266ecf031671f3 on pipeline (dial tcp [::1]:12380: getsockopt: connection refused)
2015/09/2 09:48:40 etcdserver: failed to reach the peerURL(http://localhost:7001) of member a8266ecf031671f3 (Get http://localhost:7001/version: dial tcp [::1]:12380: getsockopt: connection refused)
2015/09/2 09:48:40 etcdserver: cannot get the version of member a8266ecf031671f3 (Get http://localhost:12380/version: dial tcp [::1]:12380: getsockopt: connection refused)
2015/09/2 09:48:40 rafthttp: failed to heartbeat a8266ecf031671f3 on stream MsgApp v2 (write tcp 127.0.0.1:32380->127.0.0.1:64393: write: broken pipe)
```
You will see logging output like this from ungraded member due to a mixed version cluster. You can ignore this while upgrading.
```
2015/09/2 09:48:45 etcdserver: the etcd version 2.1.2+git is not up-to-date
2015/09/2 09:48:45 etcdserver: member a8266ecf031671f3 has a higher version &{2.2.0-rc.0+git 2.1.0}
```
You will also see logging output like this from the newly upgraded member, since etcd 2.1 member does not support health checking. You can ignore this while upgrading.
```
2015-09-02 09:55:42.691384 W | rafthttp: the connection to peer 6e3bd23ae5f1eae0 is unhealthy
2015-09-02 09:55:42.705626 W | rafthttp: the connection to peer 924e2e83e93f2560 is unhealthy
```
[Backup your data directory][backup-datastore] for data safety.
```
$ etcdctl backup \
--data-dir /var/lib/etcd \
--backup-dir /tmp/etcd_backup
```
#### 3. Drop-in etcd v2.2 binary and start the new etcd process
Now, you can start the etcd v2.2 binary with the previous configuration.
You will see the etcd start and publish its information to the cluster.
```
2015-09-02 09:56:46.117609 I | etcdserver: published {Name:infra2 ClientURLs:[http://localhost:22380]} to cluster e9c7614f68f35fb2
```
You could verify the cluster becomes healthy.
```
$ etcdctl cluster-health
member 6e3bd23ae5f1eae0 is healthy: got healthy result from http://localhost:22379
member 924e2e83e93f2560 is healthy: got healthy result from http://localhost:32379
member a8266ecf031671f3 is healthy: got healthy result from http://localhost:12379
cluster is healthy
```
#### 4. Repeat step 2 to step 3 for all other members
#### 5. Finish
When all members are upgraded, you will see the cluster is upgraded to 2.2 successfully:
```
2015-09-02 09:56:54.896848 N | etcdserver: updated the cluster version from 2.1 to 2.2
```
```
$ curl http://127.0.0.1:4001/version
{"etcdserver":"2.2.x","etcdcluster":"2.2.0"}
```
[backup-datastore]: admin_guide.md#backing-up-the-datastore
[downgrade]: #downgrade
[v2.1]: https://github.com/coreos/etcd/releases/tag/v2.1.2

View File

@ -0,0 +1,121 @@
## Upgrade etcd from 2.2 to 2.3
In the general case, upgrading from etcd 2.2 to 2.3 can be a zero-downtime, rolling upgrade:
- one by one, stop the etcd v2.2 processes and replace them with etcd v2.3 processes
- after running all v2.3 processes, new features in v2.3 are available to the cluster
Before [starting an upgrade](#upgrade-procedure), read through the rest of this guide to prepare.
### Upgrade Checklists
#### Upgrade Requirements
To upgrade an existing etcd deployment to 2.3, the running cluster must be 2.2 or greater. If it's before 2.2, please upgrade to [2.2](https://github.com/coreos/etcd/releases/tag/v2.2.0) before upgrading to 2.3.
Also, to ensure a smooth rolling upgrade, the running cluster must be healthy. You can check the health of the cluster by using the `etcdctl cluster-health` command.
#### Preparation
Before upgrading etcd, always test the services relying on etcd in a staging environment before deploying the upgrade to the production environment.
Before beginning, [backup the etcd data directory](admin_guide.md#backing-up-the-datastore). Should something go wrong with the upgrade, it is possible to use this backup to[downgrade](#downgrade) back to existing etcd version.
#### Mixed Versions
While upgrading, an etcd cluster supports mixed versions of etcd members, and operates with the protocol of the lowest common version. The cluster is only considered upgraded once all of its members are upgraded to version 2.3. Internally, etcd members negotiate with each other to determine the overall cluster version, which controls the reported version and the supported features.
#### Limitations
It might take up to 2 minutes for the newly upgraded member to catch up with the existing cluster when the total data size is larger than 50MB. Check the size of a recent snapshot to estimate the total data size. In other words, it is safest to wait for 2 minutes between upgrading each member.
For a much larger total data size, 100MB or more , this one-time process might take even more time. Administrators of very large etcd clusters of this magnitude can feel free to contact the [etcd team][etcd-contact] before upgrading, and well be happy to provide advice on the procedure.
#### Downgrade
If all members have been upgraded to v2.3, the cluster will be upgraded to v2.3, and downgrade from this completed state is **not possible**. If any single member is still v2.2, however, the cluster and its operations remains “v2.2”, and it is possible from this mixed cluster state to return to using a v2.2 etcd binary on all members.
Please [backup the data directory](admin_guide.md#backing-up-the-datastore) of all etcd members to make downgrading the cluster possible even after it has been completely upgraded.
### Upgrade Procedure
This example details the upgrade of a three-member v2.2 ectd cluster running on a local machine.
#### 1. Check upgrade requirements.
Is the the cluster healthy and running v.2.2.x?
```
$ etcdctl cluster-health
member 6e3bd23ae5f1eae0 is healthy: got healthy result from http://localhost:22379
member 924e2e83e93f2560 is healthy: got healthy result from http://localhost:32379
member a8266ecf031671f3 is healthy: got healthy result from http://localhost:12379
cluster is healthy
$ curl http://localhost:4001/version
{"etcdserver":"2.2.x","etcdcluster":"2.2.0"}
```
#### 2. Stop the existing etcd process
When each etcd process is stopped, expected errors will be logged by other cluster members. This is normal since a cluster member connection has been (temporarily) broken:
```
2016-03-11 09:50:49.860319 E | rafthttp: failed to read 8211f1d0f64f3269 on stream Message (unexpected EOF)
2016-03-11 09:50:49.860335 I | rafthttp: the connection with 8211f1d0f64f3269 became inactive
2016-03-11 09:50:51.023804 W | etcdserver: failed to reach the peerURL(http://127.0.0.1:12380) of member 8211f1d0f64f3269 (Get http://127.0.0.1:12380/version: dial tcp 127.0.0.1:12380: getsockopt: connection refused)
2016-03-11 09:50:51.023821 W | etcdserver: cannot get the version of member 8211f1d0f64f3269 (Get http://127.0.0.1:12380/version: dial tcp 127.0.0.1:12380: getsockopt: connection refused)
```
Its a good idea at this point to [backup the etcd data directory](https://github.com/coreos/etcd/blob/7f7e2cc79d9c5c342a6eb1e48c386b0223cf934e/Documentation/admin_guide.md#backing-up-the-datastore) to provide a downgrade path should any problems occur:
```
$ etcdctl backup \
--data-dir /var/lib/etcd \
--backup-dir /tmp/etcd_backup
```
#### 3. Drop-in etcd v2.3 binary and start the new etcd process
The new v2.3 etcd will publish its information to the cluster:
```
09:58:25.938673 I | etcdserver: published {Name:infra1 ClientURLs:[http://localhost:12379]} to cluster 524400597fb1d5f6
```
Verify that each member, and then the entire cluster, becomes healthy with the new v2.3 etcd binary:
```
$ etcdctl cluster-health
member 6e3bd23ae5f1eae0 is healthy: got healthy result from http://localhost:22379
member 924e2e83e93f2560 is healthy: got healthy result from http://localhost:32379
member a8266ecf031671f3 is healthy: got healthy result from http://localhost:12379
cluster is healthy
```
Upgraded members will log warnings like the following until the entire cluster is upgraded. This is expected and will cease after all etcd cluster members are upgraded to v2.3:
```
2016-03-11 09:58:26.851837 W | etcdserver: the local etcd version 2.2.0 is not up-to-date
2016-03-11 09:58:26.851854 W | etcdserver: member c02c70ede158499f has a higher version 2.3.0
```
#### 4. Repeat step 2 to step 3 for all other members
#### 5. Finish
When all members are upgraded, the cluster will report upgrading to 2.3 successfully:
```
2016-03-11 10:03:01.583392 N | etcdserver: updated the cluster version from 2.2 to 2.3
```
```
$ curl http://127.0.0.1:4001/version
{"etcdserver":"2.3.x","etcdcluster":"2.3.0"}
```
[etcd-contact]: https://coreos.com/etcd/?

145
Godeps/Godeps.json generated
View File

@ -1,6 +1,6 @@
{
"ImportPath": "github.com/coreos/etcd",
"GoVersion": "go1.4.1",
"GoVersion": "go1.5.1",
"Packages": [
"./..."
],
@ -10,44 +10,68 @@
"Comment": "null-5",
"Rev": "75cd24fc2f2c2a2088577d12123ddee5f54e0675"
},
{
"ImportPath": "github.com/akrennmair/gopcap",
"Rev": "00e11033259acb75598ba416495bb708d864a010"
},
{
"ImportPath": "github.com/beorn7/perks/quantile",
"Rev": "b965b613227fddccbfffe13eae360ed3fa822f8d"
},
{
"ImportPath": "github.com/bgentry/speakeasy",
"Rev": "5dfe43257d1f86b96484e760f2f0c4e2559089c7"
"Rev": "36e9cfdd690967f4f690c6edcc9ffacd006014a0"
},
{
"ImportPath": "github.com/boltdb/bolt",
"Comment": "v1.0-71-g71f28ea",
"Rev": "71f28eaecbebd00604d87bb1de0dae8fcfa54bbd"
"Comment": "v1.1.0-81-g0fd4c05",
"Rev": "0fd4c0547d204c7b1cad6db6f3adad5f2cf453e5"
},
{
"ImportPath": "github.com/bradfitz/http2",
"Rev": "3e36af6d3af0e56fa3da71099f864933dea3d9fb"
"ImportPath": "github.com/cheggaaa/pb",
"Rev": "da1f27ad1d9509b16f65f52fd9d8138b0f2dc7b2"
},
{
"ImportPath": "github.com/codegangsta/cli",
"Comment": "1.2.0-26-gf7ebb76",
"Rev": "f7ebb761e83e21225d1d8954fde853bf8edd46c4"
"Comment": "1.2.0-183-gb5232bb",
"Rev": "b5232bb2934f606f9f27a1305f1eea224e8e8b88"
},
{
"ImportPath": "github.com/coreos/go-etcd/etcd",
"Comment": "v2.0.0-13-g4cceaf7",
"Rev": "4cceaf7283b76f27c4a732b20730dcdb61053bf5"
"ImportPath": "github.com/coreos/gexpect",
"Rev": "5173270e159f5aa8fbc999dc7e3dcb50f4098a69"
},
{
"ImportPath": "github.com/coreos/go-semver/semver",
"Rev": "568e959cd89871e61434c1143528d9162da89ef2"
},
{
"ImportPath": "github.com/coreos/go-systemd/daemon",
"Comment": "v3-6-gcea488b",
"Rev": "cea488b4e6855fee89b6c22a811e3c5baca861b6"
},
{
"ImportPath": "github.com/coreos/go-systemd/journal",
"Comment": "v3-6-gcea488b",
"Rev": "cea488b4e6855fee89b6c22a811e3c5baca861b6"
},
{
"ImportPath": "github.com/coreos/go-systemd/util",
"Comment": "v3-6-gcea488b",
"Rev": "cea488b4e6855fee89b6c22a811e3c5baca861b6"
},
{
"ImportPath": "github.com/coreos/pkg/capnslog",
"Rev": "99f6e6b8f8ea30b0f82769c1411691c44a66d015"
"Rev": "2c77715c4df99b5420ffcae14ead08f52104065d"
},
{
"ImportPath": "github.com/cpuguy83/go-md2man/md2man",
"Comment": "v1.0.4",
"Rev": "71acacd42f85e5e82f70a55327789582a5200a90"
},
{
"ImportPath": "github.com/gogo/protobuf/proto",
"Rev": "64f27bf06efee53589314a6e5a4af34cdd85adf6"
"Comment": "v0.1-118-ge8904f5",
"Rev": "e8904f58e872a473a5b91bc9bf3377d223555263"
},
{
"ImportPath": "github.com/golang/glog",
@ -55,43 +79,88 @@
},
{
"ImportPath": "github.com/golang/protobuf/proto",
"Rev": "5677a0e3d5e89854c9974e1256839ee23f8233ca"
"Rev": "6aaa8d47701fa6cf07e914ec01fde3d4a1fe79c3"
},
{
"ImportPath": "github.com/google/btree",
"Rev": "cc6329d4279e3f025a53a83c397d2339b5705c45"
},
{
"ImportPath": "github.com/inconshreveable/mousetrap",
"Rev": "76626ae9c91c4f2a10f34cad8ce83ea42c93bb75"
},
{
"ImportPath": "github.com/jonboulle/clockwork",
"Rev": "72f9bd7c4e0c2a40055ab3d0f09654f730cce982"
},
{
"ImportPath": "github.com/kballard/go-shellquote",
"Rev": "d8ec1a69a250a17bb0e419c386eac1f3711dc142"
},
{
"ImportPath": "github.com/kr/pty",
"Comment": "release.r56-29-gf7ee69f",
"Rev": "f7ee69f31298ecbe5d2b349c711e2547a617d398"
},
{
"ImportPath": "github.com/mattn/go-runewidth",
"Comment": "travisish-46-gd6bea18",
"Rev": "d6bea18f789704b5f83375793155289da36a3c7f"
},
{
"ImportPath": "github.com/matttproud/golang_protobuf_extensions/pbutil",
"Rev": "fc2b8d3a73c4867e51861bbdd5ae3c1f0869dd6a"
},
{
"ImportPath": "github.com/prometheus/client_golang/model",
"Comment": "0.5.0-10-ga842dc1",
"Rev": "a842dc11e0621c34a71cab634d1d0190a59802a8"
"ImportPath": "github.com/olekukonko/tablewriter",
"Rev": "cca8bbc0798408af109aaaa239cbd2634846b340"
},
{
"ImportPath": "github.com/olekukonko/ts",
"Rev": "ecf753e7c962639ab5a1fb46f7da627d4c0a04b8"
},
{
"ImportPath": "github.com/prometheus/client_golang/prometheus",
"Comment": "0.5.0-10-ga842dc1",
"Rev": "a842dc11e0621c34a71cab634d1d0190a59802a8"
},
{
"ImportPath": "github.com/prometheus/client_golang/text",
"Comment": "0.5.0-10-ga842dc1",
"Rev": "a842dc11e0621c34a71cab634d1d0190a59802a8"
"Comment": "0.7.0-52-ge51041b",
"Rev": "e51041b3fa41cece0dca035740ba6411905be473"
},
{
"ImportPath": "github.com/prometheus/client_model/go",
"Comment": "model-0.0.2-12-gfa8ad6f",
"Rev": "fa8ad6fec33561be4280a8f0514318c79d7f6cb6"
},
{
"ImportPath": "github.com/prometheus/common/expfmt",
"Rev": "ffe929a3f4c4faeaa10f2b9535c2b1be3ad15650"
},
{
"ImportPath": "github.com/prometheus/common/model",
"Rev": "ffe929a3f4c4faeaa10f2b9535c2b1be3ad15650"
},
{
"ImportPath": "github.com/prometheus/procfs",
"Rev": "ee2372b58cee877abe07cde670d04d3b3bac5ee6"
"Rev": "454a56f35412459b5e684fd5ec0f9211b94f002a"
},
{
"ImportPath": "github.com/russross/blackfriday",
"Comment": "v1.4-2-g300106c",
"Rev": "300106c228d52c8941d4b3de6054a6062a86dda3"
},
{
"ImportPath": "github.com/shurcooL/sanitized_anchor_name",
"Rev": "10ef21a441db47d8b13ebcc5fd2310f636973c77"
},
{
"ImportPath": "github.com/spacejam/loghisto",
"Rev": "323309774dec8b7430187e46cd0793974ccca04a"
},
{
"ImportPath": "github.com/spf13/cobra",
"Rev": "1c44ec8d3f1552cac48999f9306da23c4d8a288b"
},
{
"ImportPath": "github.com/spf13/pflag",
"Rev": "08b1a584251b5b62f458943640fc8ebd4d50aaa5"
},
{
"ImportPath": "github.com/stretchr/testify/assert",
@ -99,7 +168,11 @@
},
{
"ImportPath": "github.com/ugorji/go/codec",
"Rev": "821cda7e48749cacf7cad2c6ed01e96457ca7e9d"
"Rev": "f1f1a805ed361a0e078bb537e4ea78cd37dcf065"
},
{
"ImportPath": "github.com/xiang90/probing",
"Rev": "6a0cc1ae81b4cc11db5e491e030e4b98fba79c19"
},
{
"ImportPath": "golang.org/x/crypto/bcrypt",
@ -111,23 +184,27 @@
},
{
"ImportPath": "golang.org/x/net/context",
"Rev": "7dbad50ab5b31073856416cdcfeb2796d682f844"
"Rev": "6acef71eb69611914f7a30939ea9f6e194c78172"
},
{
"ImportPath": "golang.org/x/oauth2",
"Rev": "3046bc76d6dfd7d3707f6640f85e42d9c4050f50"
"ImportPath": "golang.org/x/net/http2",
"Rev": "6acef71eb69611914f7a30939ea9f6e194c78172"
},
{
"ImportPath": "google.golang.org/cloud/compute/metadata",
"Rev": "f20d6dcccb44ed49de45ae3703312cb46e627db1"
"ImportPath": "golang.org/x/net/internal/timeseries",
"Rev": "6acef71eb69611914f7a30939ea9f6e194c78172"
},
{
"ImportPath": "google.golang.org/cloud/internal",
"Rev": "f20d6dcccb44ed49de45ae3703312cb46e627db1"
"ImportPath": "golang.org/x/net/trace",
"Rev": "6acef71eb69611914f7a30939ea9f6e194c78172"
},
{
"ImportPath": "golang.org/x/sys/unix",
"Rev": "9c60d1c508f5134d1ca726b4641db998f2523357"
},
{
"ImportPath": "google.golang.org/grpc",
"Rev": "f5ebd86be717593ab029545492c93ddf8914832b"
"Rev": "b88c12e7caf74af3928de99a864aaa9916fa5aad"
}
]
}

View File

@ -0,0 +1,5 @@
#*
*~
/tools/pass/pass
/tools/pcaptest/pcaptest
/tools/tcpdump/tcpdump

View File

@ -0,0 +1,27 @@
Copyright (c) 2009-2011 Andreas Krennmair. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following disclaimer
in the documentation and/or other materials provided with the
distribution.
* Neither the name of Andreas Krennmair nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

View File

@ -0,0 +1,11 @@
# PCAP
This is a simple wrapper around libpcap for Go. Originally written by Andreas
Krennmair <ak@synflood.at> and only minorly touched up by Mark Smith <mark@qq.is>.
Please see the included pcaptest.go and tcpdump.go programs for instructions on
how to use this library.
Miek Gieben <miek@miek.nl> has created a more Go-like package and replaced functionality
with standard functions from the standard library. The package has also been renamed to
pcap.

View File

@ -0,0 +1,527 @@
package pcap
import (
"encoding/binary"
"fmt"
"net"
"reflect"
"strings"
)
const (
TYPE_IP = 0x0800
TYPE_ARP = 0x0806
TYPE_IP6 = 0x86DD
TYPE_VLAN = 0x8100
IP_ICMP = 1
IP_INIP = 4
IP_TCP = 6
IP_UDP = 17
)
const (
ERRBUF_SIZE = 256
// According to pcap-linktype(7).
LINKTYPE_NULL = 0
LINKTYPE_ETHERNET = 1
LINKTYPE_TOKEN_RING = 6
LINKTYPE_ARCNET = 7
LINKTYPE_SLIP = 8
LINKTYPE_PPP = 9
LINKTYPE_FDDI = 10
LINKTYPE_ATM_RFC1483 = 100
LINKTYPE_RAW = 101
LINKTYPE_PPP_HDLC = 50
LINKTYPE_PPP_ETHER = 51
LINKTYPE_C_HDLC = 104
LINKTYPE_IEEE802_11 = 105
LINKTYPE_FRELAY = 107
LINKTYPE_LOOP = 108
LINKTYPE_LINUX_SLL = 113
LINKTYPE_LTALK = 104
LINKTYPE_PFLOG = 117
LINKTYPE_PRISM_HEADER = 119
LINKTYPE_IP_OVER_FC = 122
LINKTYPE_SUNATM = 123
LINKTYPE_IEEE802_11_RADIO = 127
LINKTYPE_ARCNET_LINUX = 129
LINKTYPE_LINUX_IRDA = 144
LINKTYPE_LINUX_LAPD = 177
)
type addrHdr interface {
SrcAddr() string
DestAddr() string
Len() int
}
type addrStringer interface {
String(addr addrHdr) string
}
func decodemac(pkt []byte) uint64 {
mac := uint64(0)
for i := uint(0); i < 6; i++ {
mac = (mac << 8) + uint64(pkt[i])
}
return mac
}
// Decode decodes the headers of a Packet.
func (p *Packet) Decode() {
if len(p.Data) <= 14 {
return
}
p.Type = int(binary.BigEndian.Uint16(p.Data[12:14]))
p.DestMac = decodemac(p.Data[0:6])
p.SrcMac = decodemac(p.Data[6:12])
if len(p.Data) >= 15 {
p.Payload = p.Data[14:]
}
switch p.Type {
case TYPE_IP:
p.decodeIp()
case TYPE_IP6:
p.decodeIp6()
case TYPE_ARP:
p.decodeArp()
case TYPE_VLAN:
p.decodeVlan()
}
}
func (p *Packet) headerString(headers []interface{}) string {
// If there's just one header, return that.
if len(headers) == 1 {
if hdr, ok := headers[0].(fmt.Stringer); ok {
return hdr.String()
}
}
// If there are two headers (IPv4/IPv6 -> TCP/UDP/IP..)
if len(headers) == 2 {
// Commonly the first header is an address.
if addr, ok := p.Headers[0].(addrHdr); ok {
if hdr, ok := p.Headers[1].(addrStringer); ok {
return fmt.Sprintf("%s %s", p.Time, hdr.String(addr))
}
}
}
// For IP in IP, we do a recursive call.
if len(headers) >= 2 {
if addr, ok := headers[0].(addrHdr); ok {
if _, ok := headers[1].(addrHdr); ok {
return fmt.Sprintf("%s > %s IP in IP: ",
addr.SrcAddr(), addr.DestAddr(), p.headerString(headers[1:]))
}
}
}
var typeNames []string
for _, hdr := range headers {
typeNames = append(typeNames, reflect.TypeOf(hdr).String())
}
return fmt.Sprintf("unknown [%s]", strings.Join(typeNames, ","))
}
// String prints a one-line representation of the packet header.
// The output is suitable for use in a tcpdump program.
func (p *Packet) String() string {
// If there are no headers, print "unsupported protocol".
if len(p.Headers) == 0 {
return fmt.Sprintf("%s unsupported protocol %d", p.Time, int(p.Type))
}
return fmt.Sprintf("%s %s", p.Time, p.headerString(p.Headers))
}
// Arphdr is a ARP packet header.
type Arphdr struct {
Addrtype uint16
Protocol uint16
HwAddressSize uint8
ProtAddressSize uint8
Operation uint16
SourceHwAddress []byte
SourceProtAddress []byte
DestHwAddress []byte
DestProtAddress []byte
}
func (arp *Arphdr) String() (s string) {
switch arp.Operation {
case 1:
s = "ARP request"
case 2:
s = "ARP Reply"
}
if arp.Addrtype == LINKTYPE_ETHERNET && arp.Protocol == TYPE_IP {
s = fmt.Sprintf("%012x (%s) > %012x (%s)",
decodemac(arp.SourceHwAddress), arp.SourceProtAddress,
decodemac(arp.DestHwAddress), arp.DestProtAddress)
} else {
s = fmt.Sprintf("addrtype = %d protocol = %d", arp.Addrtype, arp.Protocol)
}
return
}
func (p *Packet) decodeArp() {
if len(p.Payload) < 8 {
return
}
pkt := p.Payload
arp := new(Arphdr)
arp.Addrtype = binary.BigEndian.Uint16(pkt[0:2])
arp.Protocol = binary.BigEndian.Uint16(pkt[2:4])
arp.HwAddressSize = pkt[4]
arp.ProtAddressSize = pkt[5]
arp.Operation = binary.BigEndian.Uint16(pkt[6:8])
if len(pkt) < int(8+2*arp.HwAddressSize+2*arp.ProtAddressSize) {
return
}
arp.SourceHwAddress = pkt[8 : 8+arp.HwAddressSize]
arp.SourceProtAddress = pkt[8+arp.HwAddressSize : 8+arp.HwAddressSize+arp.ProtAddressSize]
arp.DestHwAddress = pkt[8+arp.HwAddressSize+arp.ProtAddressSize : 8+2*arp.HwAddressSize+arp.ProtAddressSize]
arp.DestProtAddress = pkt[8+2*arp.HwAddressSize+arp.ProtAddressSize : 8+2*arp.HwAddressSize+2*arp.ProtAddressSize]
p.Headers = append(p.Headers, arp)
if len(pkt) >= int(8+2*arp.HwAddressSize+2*arp.ProtAddressSize) {
p.Payload = p.Payload[8+2*arp.HwAddressSize+2*arp.ProtAddressSize:]
}
}
// IPadr is the header of an IP packet.
type Iphdr struct {
Version uint8
Ihl uint8
Tos uint8
Length uint16
Id uint16
Flags uint8
FragOffset uint16
Ttl uint8
Protocol uint8
Checksum uint16
SrcIp []byte
DestIp []byte
}
func (p *Packet) decodeIp() {
if len(p.Payload) < 20 {
return
}
pkt := p.Payload
ip := new(Iphdr)
ip.Version = uint8(pkt[0]) >> 4
ip.Ihl = uint8(pkt[0]) & 0x0F
ip.Tos = pkt[1]
ip.Length = binary.BigEndian.Uint16(pkt[2:4])
ip.Id = binary.BigEndian.Uint16(pkt[4:6])
flagsfrags := binary.BigEndian.Uint16(pkt[6:8])
ip.Flags = uint8(flagsfrags >> 13)
ip.FragOffset = flagsfrags & 0x1FFF
ip.Ttl = pkt[8]
ip.Protocol = pkt[9]
ip.Checksum = binary.BigEndian.Uint16(pkt[10:12])
ip.SrcIp = pkt[12:16]
ip.DestIp = pkt[16:20]
pEnd := int(ip.Length)
if pEnd > len(pkt) {
pEnd = len(pkt)
}
if len(pkt) >= pEnd && int(ip.Ihl*4) < pEnd {
p.Payload = pkt[ip.Ihl*4 : pEnd]
} else {
p.Payload = []byte{}
}
p.Headers = append(p.Headers, ip)
p.IP = ip
switch ip.Protocol {
case IP_TCP:
p.decodeTcp()
case IP_UDP:
p.decodeUdp()
case IP_ICMP:
p.decodeIcmp()
case IP_INIP:
p.decodeIp()
}
}
func (ip *Iphdr) SrcAddr() string { return net.IP(ip.SrcIp).String() }
func (ip *Iphdr) DestAddr() string { return net.IP(ip.DestIp).String() }
func (ip *Iphdr) Len() int { return int(ip.Length) }
type Vlanhdr struct {
Priority byte
DropEligible bool
VlanIdentifier int
Type int // Not actually part of the vlan header, but the type of the actual packet
}
func (v *Vlanhdr) String() {
fmt.Sprintf("VLAN Priority:%d Drop:%v Tag:%d", v.Priority, v.DropEligible, v.VlanIdentifier)
}
func (p *Packet) decodeVlan() {
pkt := p.Payload
vlan := new(Vlanhdr)
if len(pkt) < 4 {
return
}
vlan.Priority = (pkt[2] & 0xE0) >> 13
vlan.DropEligible = pkt[2]&0x10 != 0
vlan.VlanIdentifier = int(binary.BigEndian.Uint16(pkt[:2])) & 0x0FFF
vlan.Type = int(binary.BigEndian.Uint16(p.Payload[2:4]))
p.Headers = append(p.Headers, vlan)
if len(pkt) >= 5 {
p.Payload = p.Payload[4:]
}
switch vlan.Type {
case TYPE_IP:
p.decodeIp()
case TYPE_IP6:
p.decodeIp6()
case TYPE_ARP:
p.decodeArp()
}
}
type Tcphdr struct {
SrcPort uint16
DestPort uint16
Seq uint32
Ack uint32
DataOffset uint8
Flags uint16
Window uint16
Checksum uint16
Urgent uint16
Data []byte
}
const (
TCP_FIN = 1 << iota
TCP_SYN
TCP_RST
TCP_PSH
TCP_ACK
TCP_URG
TCP_ECE
TCP_CWR
TCP_NS
)
func (p *Packet) decodeTcp() {
if len(p.Payload) < 20 {
return
}
pkt := p.Payload
tcp := new(Tcphdr)
tcp.SrcPort = binary.BigEndian.Uint16(pkt[0:2])
tcp.DestPort = binary.BigEndian.Uint16(pkt[2:4])
tcp.Seq = binary.BigEndian.Uint32(pkt[4:8])
tcp.Ack = binary.BigEndian.Uint32(pkt[8:12])
tcp.DataOffset = (pkt[12] & 0xF0) >> 4
tcp.Flags = binary.BigEndian.Uint16(pkt[12:14]) & 0x1FF
tcp.Window = binary.BigEndian.Uint16(pkt[14:16])
tcp.Checksum = binary.BigEndian.Uint16(pkt[16:18])
tcp.Urgent = binary.BigEndian.Uint16(pkt[18:20])
if len(pkt) >= int(tcp.DataOffset*4) {
p.Payload = pkt[tcp.DataOffset*4:]
}
p.Headers = append(p.Headers, tcp)
p.TCP = tcp
}
func (tcp *Tcphdr) String(hdr addrHdr) string {
return fmt.Sprintf("TCP %s:%d > %s:%d %s SEQ=%d ACK=%d LEN=%d",
hdr.SrcAddr(), int(tcp.SrcPort), hdr.DestAddr(), int(tcp.DestPort),
tcp.FlagsString(), int64(tcp.Seq), int64(tcp.Ack), hdr.Len())
}
func (tcp *Tcphdr) FlagsString() string {
var sflags []string
if 0 != (tcp.Flags & TCP_SYN) {
sflags = append(sflags, "syn")
}
if 0 != (tcp.Flags & TCP_FIN) {
sflags = append(sflags, "fin")
}
if 0 != (tcp.Flags & TCP_ACK) {
sflags = append(sflags, "ack")
}
if 0 != (tcp.Flags & TCP_PSH) {
sflags = append(sflags, "psh")
}
if 0 != (tcp.Flags & TCP_RST) {
sflags = append(sflags, "rst")
}
if 0 != (tcp.Flags & TCP_URG) {
sflags = append(sflags, "urg")
}
if 0 != (tcp.Flags & TCP_NS) {
sflags = append(sflags, "ns")
}
if 0 != (tcp.Flags & TCP_CWR) {
sflags = append(sflags, "cwr")
}
if 0 != (tcp.Flags & TCP_ECE) {
sflags = append(sflags, "ece")
}
return fmt.Sprintf("[%s]", strings.Join(sflags, " "))
}
type Udphdr struct {
SrcPort uint16
DestPort uint16
Length uint16
Checksum uint16
}
func (p *Packet) decodeUdp() {
if len(p.Payload) < 8 {
return
}
pkt := p.Payload
udp := new(Udphdr)
udp.SrcPort = binary.BigEndian.Uint16(pkt[0:2])
udp.DestPort = binary.BigEndian.Uint16(pkt[2:4])
udp.Length = binary.BigEndian.Uint16(pkt[4:6])
udp.Checksum = binary.BigEndian.Uint16(pkt[6:8])
p.Headers = append(p.Headers, udp)
p.UDP = udp
if len(p.Payload) >= 8 {
p.Payload = pkt[8:]
}
}
func (udp *Udphdr) String(hdr addrHdr) string {
return fmt.Sprintf("UDP %s:%d > %s:%d LEN=%d CHKSUM=%d",
hdr.SrcAddr(), int(udp.SrcPort), hdr.DestAddr(), int(udp.DestPort),
int(udp.Length), int(udp.Checksum))
}
type Icmphdr struct {
Type uint8
Code uint8
Checksum uint16
Id uint16
Seq uint16
Data []byte
}
func (p *Packet) decodeIcmp() *Icmphdr {
if len(p.Payload) < 8 {
return nil
}
pkt := p.Payload
icmp := new(Icmphdr)
icmp.Type = pkt[0]
icmp.Code = pkt[1]
icmp.Checksum = binary.BigEndian.Uint16(pkt[2:4])
icmp.Id = binary.BigEndian.Uint16(pkt[4:6])
icmp.Seq = binary.BigEndian.Uint16(pkt[6:8])
p.Payload = pkt[8:]
p.Headers = append(p.Headers, icmp)
return icmp
}
func (icmp *Icmphdr) String(hdr addrHdr) string {
return fmt.Sprintf("ICMP %s > %s Type = %d Code = %d ",
hdr.SrcAddr(), hdr.DestAddr(), icmp.Type, icmp.Code)
}
func (icmp *Icmphdr) TypeString() (result string) {
switch icmp.Type {
case 0:
result = fmt.Sprintf("Echo reply seq=%d", icmp.Seq)
case 3:
switch icmp.Code {
case 0:
result = "Network unreachable"
case 1:
result = "Host unreachable"
case 2:
result = "Protocol unreachable"
case 3:
result = "Port unreachable"
default:
result = "Destination unreachable"
}
case 8:
result = fmt.Sprintf("Echo request seq=%d", icmp.Seq)
case 30:
result = "Traceroute"
}
return
}
type Ip6hdr struct {
// http://www.networksorcery.com/enp/protocol/ipv6.htm
Version uint8 // 4 bits
TrafficClass uint8 // 8 bits
FlowLabel uint32 // 20 bits
Length uint16 // 16 bits
NextHeader uint8 // 8 bits, same as Protocol in Iphdr
HopLimit uint8 // 8 bits
SrcIp []byte // 16 bytes
DestIp []byte // 16 bytes
}
func (p *Packet) decodeIp6() {
if len(p.Payload) < 40 {
return
}
pkt := p.Payload
ip6 := new(Ip6hdr)
ip6.Version = uint8(pkt[0]) >> 4
ip6.TrafficClass = uint8((binary.BigEndian.Uint16(pkt[0:2]) >> 4) & 0x00FF)
ip6.FlowLabel = binary.BigEndian.Uint32(pkt[0:4]) & 0x000FFFFF
ip6.Length = binary.BigEndian.Uint16(pkt[4:6])
ip6.NextHeader = pkt[6]
ip6.HopLimit = pkt[7]
ip6.SrcIp = pkt[8:24]
ip6.DestIp = pkt[24:40]
if len(p.Payload) >= 40 {
p.Payload = pkt[40:]
}
p.Headers = append(p.Headers, ip6)
switch ip6.NextHeader {
case IP_TCP:
p.decodeTcp()
case IP_UDP:
p.decodeUdp()
case IP_ICMP:
p.decodeIcmp()
case IP_INIP:
p.decodeIp()
}
}
func (ip6 *Ip6hdr) SrcAddr() string { return net.IP(ip6.SrcIp).String() }
func (ip6 *Ip6hdr) DestAddr() string { return net.IP(ip6.DestIp).String() }
func (ip6 *Ip6hdr) Len() int { return int(ip6.Length) }

View File

@ -0,0 +1,247 @@
package pcap
import (
"bytes"
"testing"
"time"
)
var testSimpleTcpPacket *Packet = &Packet{
Data: []byte{
0x00, 0x00, 0x0c, 0x9f, 0xf0, 0x20, 0xbc, 0x30, 0x5b, 0xe8, 0xd3, 0x49,
0x08, 0x00, 0x45, 0x00, 0x01, 0xa4, 0x39, 0xdf, 0x40, 0x00, 0x40, 0x06,
0x55, 0x5a, 0xac, 0x11, 0x51, 0x49, 0xad, 0xde, 0xfe, 0xe1, 0xc5, 0xf7,
0x00, 0x50, 0xc5, 0x7e, 0x0e, 0x48, 0x49, 0x07, 0x42, 0x32, 0x80, 0x18,
0x00, 0x73, 0xab, 0xb1, 0x00, 0x00, 0x01, 0x01, 0x08, 0x0a, 0x03, 0x77,
0x37, 0x9c, 0x42, 0x77, 0x5e, 0x3a, 0x47, 0x45, 0x54, 0x20, 0x2f, 0x20,
0x48, 0x54, 0x54, 0x50, 0x2f, 0x31, 0x2e, 0x31, 0x0d, 0x0a, 0x48, 0x6f,
0x73, 0x74, 0x3a, 0x20, 0x77, 0x77, 0x77, 0x2e, 0x66, 0x69, 0x73, 0x68,
0x2e, 0x63, 0x6f, 0x6d, 0x0d, 0x0a, 0x43, 0x6f, 0x6e, 0x6e, 0x65, 0x63,
0x74, 0x69, 0x6f, 0x6e, 0x3a, 0x20, 0x6b, 0x65, 0x65, 0x70, 0x2d, 0x61,
0x6c, 0x69, 0x76, 0x65, 0x0d, 0x0a, 0x55, 0x73, 0x65, 0x72, 0x2d, 0x41,
0x67, 0x65, 0x6e, 0x74, 0x3a, 0x20, 0x4d, 0x6f, 0x7a, 0x69, 0x6c, 0x6c,
0x61, 0x2f, 0x35, 0x2e, 0x30, 0x20, 0x28, 0x58, 0x31, 0x31, 0x3b, 0x20,
0x4c, 0x69, 0x6e, 0x75, 0x78, 0x20, 0x78, 0x38, 0x36, 0x5f, 0x36, 0x34,
0x29, 0x20, 0x41, 0x70, 0x70, 0x6c, 0x65, 0x57, 0x65, 0x62, 0x4b, 0x69,
0x74, 0x2f, 0x35, 0x33, 0x35, 0x2e, 0x32, 0x20, 0x28, 0x4b, 0x48, 0x54,
0x4d, 0x4c, 0x2c, 0x20, 0x6c, 0x69, 0x6b, 0x65, 0x20, 0x47, 0x65, 0x63,
0x6b, 0x6f, 0x29, 0x20, 0x43, 0x68, 0x72, 0x6f, 0x6d, 0x65, 0x2f, 0x31,
0x35, 0x2e, 0x30, 0x2e, 0x38, 0x37, 0x34, 0x2e, 0x31, 0x32, 0x31, 0x20,
0x53, 0x61, 0x66, 0x61, 0x72, 0x69, 0x2f, 0x35, 0x33, 0x35, 0x2e, 0x32,
0x0d, 0x0a, 0x41, 0x63, 0x63, 0x65, 0x70, 0x74, 0x3a, 0x20, 0x74, 0x65,
0x78, 0x74, 0x2f, 0x68, 0x74, 0x6d, 0x6c, 0x2c, 0x61, 0x70, 0x70, 0x6c,
0x69, 0x63, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x2f, 0x78, 0x68, 0x74, 0x6d,
0x6c, 0x2b, 0x78, 0x6d, 0x6c, 0x2c, 0x61, 0x70, 0x70, 0x6c, 0x69, 0x63,
0x61, 0x74, 0x69, 0x6f, 0x6e, 0x2f, 0x78, 0x6d, 0x6c, 0x3b, 0x71, 0x3d,
0x30, 0x2e, 0x39, 0x2c, 0x2a, 0x2f, 0x2a, 0x3b, 0x71, 0x3d, 0x30, 0x2e,
0x38, 0x0d, 0x0a, 0x41, 0x63, 0x63, 0x65, 0x70, 0x74, 0x2d, 0x45, 0x6e,
0x63, 0x6f, 0x64, 0x69, 0x6e, 0x67, 0x3a, 0x20, 0x67, 0x7a, 0x69, 0x70,
0x2c, 0x64, 0x65, 0x66, 0x6c, 0x61, 0x74, 0x65, 0x2c, 0x73, 0x64, 0x63,
0x68, 0x0d, 0x0a, 0x41, 0x63, 0x63, 0x65, 0x70, 0x74, 0x2d, 0x4c, 0x61,
0x6e, 0x67, 0x75, 0x61, 0x67, 0x65, 0x3a, 0x20, 0x65, 0x6e, 0x2d, 0x55,
0x53, 0x2c, 0x65, 0x6e, 0x3b, 0x71, 0x3d, 0x30, 0x2e, 0x38, 0x0d, 0x0a,
0x41, 0x63, 0x63, 0x65, 0x70, 0x74, 0x2d, 0x43, 0x68, 0x61, 0x72, 0x73,
0x65, 0x74, 0x3a, 0x20, 0x49, 0x53, 0x4f, 0x2d, 0x38, 0x38, 0x35, 0x39,
0x2d, 0x31, 0x2c, 0x75, 0x74, 0x66, 0x2d, 0x38, 0x3b, 0x71, 0x3d, 0x30,
0x2e, 0x37, 0x2c, 0x2a, 0x3b, 0x71, 0x3d, 0x30, 0x2e, 0x33, 0x0d, 0x0a,
0x0d, 0x0a,
}}
func BenchmarkDecodeSimpleTcpPacket(b *testing.B) {
for i := 0; i < b.N; i++ {
testSimpleTcpPacket.Decode()
}
}
func TestDecodeSimpleTcpPacket(t *testing.T) {
p := testSimpleTcpPacket
p.Decode()
if p.DestMac != 0x00000c9ff020 {
t.Error("Dest mac", p.DestMac)
}
if p.SrcMac != 0xbc305be8d349 {
t.Error("Src mac", p.SrcMac)
}
if len(p.Headers) != 2 {
t.Error("Incorrect number of headers", len(p.Headers))
return
}
if ip, ipOk := p.Headers[0].(*Iphdr); ipOk {
if ip.Version != 4 {
t.Error("ip Version", ip.Version)
}
if ip.Ihl != 5 {
t.Error("ip header length", ip.Ihl)
}
if ip.Tos != 0 {
t.Error("ip TOS", ip.Tos)
}
if ip.Length != 420 {
t.Error("ip Length", ip.Length)
}
if ip.Id != 14815 {
t.Error("ip ID", ip.Id)
}
if ip.Flags != 0x02 {
t.Error("ip Flags", ip.Flags)
}
if ip.FragOffset != 0 {
t.Error("ip Fragoffset", ip.FragOffset)
}
if ip.Ttl != 64 {
t.Error("ip TTL", ip.Ttl)
}
if ip.Protocol != 6 {
t.Error("ip Protocol", ip.Protocol)
}
if ip.Checksum != 0x555A {
t.Error("ip Checksum", ip.Checksum)
}
if !bytes.Equal(ip.SrcIp, []byte{172, 17, 81, 73}) {
t.Error("ip Src", ip.SrcIp)
}
if !bytes.Equal(ip.DestIp, []byte{173, 222, 254, 225}) {
t.Error("ip Dest", ip.DestIp)
}
if tcp, tcpOk := p.Headers[1].(*Tcphdr); tcpOk {
if tcp.SrcPort != 50679 {
t.Error("tcp srcport", tcp.SrcPort)
}
if tcp.DestPort != 80 {
t.Error("tcp destport", tcp.DestPort)
}
if tcp.Seq != 0xc57e0e48 {
t.Error("tcp seq", tcp.Seq)
}
if tcp.Ack != 0x49074232 {
t.Error("tcp ack", tcp.Ack)
}
if tcp.DataOffset != 8 {
t.Error("tcp dataoffset", tcp.DataOffset)
}
if tcp.Flags != 0x18 {
t.Error("tcp flags", tcp.Flags)
}
if tcp.Window != 0x73 {
t.Error("tcp window", tcp.Window)
}
if tcp.Checksum != 0xabb1 {
t.Error("tcp checksum", tcp.Checksum)
}
if tcp.Urgent != 0 {
t.Error("tcp urgent", tcp.Urgent)
}
} else {
t.Error("Second header is not TCP header")
}
} else {
t.Error("First header is not IP header")
}
if string(p.Payload) != "GET / HTTP/1.1\r\nHost: www.fish.com\r\nConnection: keep-alive\r\nUser-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nAccept-Encoding: gzip,deflate,sdch\r\nAccept-Language: en-US,en;q=0.8\r\nAccept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3\r\n\r\n" {
t.Error("--- PAYLOAD STRING ---\n", string(p.Payload), "\n--- PAYLOAD BYTES ---\n", p.Payload)
}
}
// Makes sure packet payload doesn't display the 6 trailing null of this packet
// as part of the payload. They're actually the ethernet trailer.
func TestDecodeSmallTcpPacketHasEmptyPayload(t *testing.T) {
p := &Packet{
// This packet is only 54 bits (an empty TCP RST), thus 6 trailing null
// bytes are added by the ethernet layer to make it the minimum packet size.
Data: []byte{
0xbc, 0x30, 0x5b, 0xe8, 0xd3, 0x49, 0xb8, 0xac, 0x6f, 0x92, 0xd5, 0xbf,
0x08, 0x00, 0x45, 0x00, 0x00, 0x28, 0x00, 0x00, 0x40, 0x00, 0x40, 0x06,
0x3f, 0x9f, 0xac, 0x11, 0x51, 0xc5, 0xac, 0x11, 0x51, 0x49, 0x00, 0x63,
0x9a, 0xef, 0x00, 0x00, 0x00, 0x00, 0x2e, 0xc1, 0x27, 0x83, 0x50, 0x14,
0x00, 0x00, 0xc3, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
}}
p.Decode()
if p.Payload == nil {
t.Error("Nil payload")
}
if len(p.Payload) != 0 {
t.Error("Non-empty payload:", p.Payload)
}
}
func TestDecodeVlanPacket(t *testing.T) {
p := &Packet{
Data: []byte{
0x00, 0x10, 0xdb, 0xff, 0x10, 0x00, 0x00, 0x15, 0x2c, 0x9d, 0xcc, 0x00, 0x81, 0x00, 0x01, 0xf7,
0x08, 0x00, 0x45, 0x00, 0x00, 0x28, 0x29, 0x8d, 0x40, 0x00, 0x7d, 0x06, 0x83, 0xa0, 0xac, 0x1b,
0xca, 0x8e, 0x45, 0x16, 0x94, 0xe2, 0xd4, 0x0a, 0x00, 0x50, 0xdf, 0xab, 0x9c, 0xc6, 0xcd, 0x1e,
0xe5, 0xd1, 0x50, 0x10, 0x01, 0x00, 0x5a, 0x74, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
}}
p.Decode()
if p.Type != TYPE_VLAN {
t.Error("Didn't detect vlan")
}
if len(p.Headers) != 3 {
t.Error("Incorrect number of headers:", len(p.Headers))
for i, h := range p.Headers {
t.Errorf("Header %d: %#v", i, h)
}
t.FailNow()
}
if _, ok := p.Headers[0].(*Vlanhdr); !ok {
t.Errorf("First header isn't vlan: %q", p.Headers[0])
}
if _, ok := p.Headers[1].(*Iphdr); !ok {
t.Errorf("Second header isn't IP: %q", p.Headers[1])
}
if _, ok := p.Headers[2].(*Tcphdr); !ok {
t.Errorf("Third header isn't TCP: %q", p.Headers[2])
}
}
func TestDecodeFuzzFallout(t *testing.T) {
testData := []struct {
Data []byte
}{
{[]byte("000000000000\x81\x000")},
{[]byte("000000000000\x81\x00000")},
{[]byte("000000000000\x86\xdd0")},
{[]byte("000000000000\b\x000")},
{[]byte("000000000000\b\x060")},
{[]byte{}},
{[]byte("000000000000\b\x0600000000")},
{[]byte("000000000000\x86\xdd000000\x01000000000000000000000000000000000")},
{[]byte("000000000000\x81\x0000\b\x0600000000")},
{[]byte("000000000000\b\x00n0000000000000000000")},
{[]byte("000000000000\x86\xdd000000\x0100000000000000000000000000000000000")},
{[]byte("000000000000\x81\x0000\b\x00g0000000000000000000")},
//{[]byte()},
{[]byte("000000000000\b\x00400000000\x110000000000")},
{[]byte("0nMء\xfe\x13\x13\x81\x00gr\b\x00&x\xc9\xe5b'\x1e0\x00\x04\x00\x0020596224")},
{[]byte("000000000000\x81\x0000\b\x00400000000\x110000000000")},
{[]byte("000000000000\b\x00000000000\x0600\xff0000000")},
{[]byte("000000000000\x86\xdd000000\x06000000000000000000000000000000000")},
{[]byte("000000000000\x81\x0000\b\x00000000000\x0600b0000000")},
{[]byte("000000000000\x81\x0000\b\x00400000000\x060000000000")},
{[]byte("000000000000\x86\xdd000000\x11000000000000000000000000000000000")},
{[]byte("000000000000\x86\xdd000000\x0600000000000000000000000000000000000000000000M")},
{[]byte("000000000000\b\x00500000000\x0600000000000")},
{[]byte("0nM\xd80\xfe\x13\x13\x81\x00gr\b\x00&x\xc9\xe5b'\x1e0\x00\x04\x00\x0020596224")},
}
for _, entry := range testData {
pkt := &Packet{
Time: time.Now(),
Caplen: uint32(len(entry.Data)),
Len: uint32(len(entry.Data)),
Data: entry.Data,
}
pkt.Decode()
/*
func() {
defer func() {
if err := recover(); err != nil {
t.Fatalf("%d. %q failed: %v", idx, string(entry.Data), err)
}
}()
pkt.Decode()
}()
*/
}
}

View File

@ -0,0 +1,206 @@
package pcap
import (
"encoding/binary"
"fmt"
"io"
"time"
)
// FileHeader is the parsed header of a pcap file.
// http://wiki.wireshark.org/Development/LibpcapFileFormat
type FileHeader struct {
MagicNumber uint32
VersionMajor uint16
VersionMinor uint16
TimeZone int32
SigFigs uint32
SnapLen uint32
Network uint32
}
type PacketTime struct {
Sec int32
Usec int32
}
// Convert the PacketTime to a go Time struct.
func (p *PacketTime) Time() time.Time {
return time.Unix(int64(p.Sec), int64(p.Usec)*1000)
}
// Packet is a single packet parsed from a pcap file.
//
// Convenient access to IP, TCP, and UDP headers is provided after Decode()
// is called if the packet is of the appropriate type.
type Packet struct {
Time time.Time // packet send/receive time
Caplen uint32 // bytes stored in the file (caplen <= len)
Len uint32 // bytes sent/received
Data []byte // packet data
Type int // protocol type, see LINKTYPE_*
DestMac uint64
SrcMac uint64
Headers []interface{} // decoded headers, in order
Payload []byte // remaining non-header bytes
IP *Iphdr // IP header (for IP packets, after decoding)
TCP *Tcphdr // TCP header (for TCP packets, after decoding)
UDP *Udphdr // UDP header (for UDP packets after decoding)
}
// Reader parses pcap files.
type Reader struct {
flip bool
buf io.Reader
err error
fourBytes []byte
twoBytes []byte
sixteenBytes []byte
Header FileHeader
}
// NewReader reads pcap data from an io.Reader.
func NewReader(reader io.Reader) (*Reader, error) {
r := &Reader{
buf: reader,
fourBytes: make([]byte, 4),
twoBytes: make([]byte, 2),
sixteenBytes: make([]byte, 16),
}
switch magic := r.readUint32(); magic {
case 0xa1b2c3d4:
r.flip = false
case 0xd4c3b2a1:
r.flip = true
default:
return nil, fmt.Errorf("pcap: bad magic number: %0x", magic)
}
r.Header = FileHeader{
MagicNumber: 0xa1b2c3d4,
VersionMajor: r.readUint16(),
VersionMinor: r.readUint16(),
TimeZone: r.readInt32(),
SigFigs: r.readUint32(),
SnapLen: r.readUint32(),
Network: r.readUint32(),
}
return r, nil
}
// Next returns the next packet or nil if no more packets can be read.
func (r *Reader) Next() *Packet {
d := r.sixteenBytes
r.err = r.read(d)
if r.err != nil {
return nil
}
timeSec := asUint32(d[0:4], r.flip)
timeUsec := asUint32(d[4:8], r.flip)
capLen := asUint32(d[8:12], r.flip)
origLen := asUint32(d[12:16], r.flip)
data := make([]byte, capLen)
if r.err = r.read(data); r.err != nil {
return nil
}
return &Packet{
Time: time.Unix(int64(timeSec), int64(timeUsec)),
Caplen: capLen,
Len: origLen,
Data: data,
}
}
func (r *Reader) read(data []byte) error {
var err error
n, err := r.buf.Read(data)
for err == nil && n != len(data) {
var chunk int
chunk, err = r.buf.Read(data[n:])
n += chunk
}
if len(data) == n {
return nil
}
return err
}
func (r *Reader) readUint32() uint32 {
data := r.fourBytes
if r.err = r.read(data); r.err != nil {
return 0
}
return asUint32(data, r.flip)
}
func (r *Reader) readInt32() int32 {
data := r.fourBytes
if r.err = r.read(data); r.err != nil {
return 0
}
return int32(asUint32(data, r.flip))
}
func (r *Reader) readUint16() uint16 {
data := r.twoBytes
if r.err = r.read(data); r.err != nil {
return 0
}
return asUint16(data, r.flip)
}
// Writer writes a pcap file.
type Writer struct {
writer io.Writer
buf []byte
}
// NewWriter creates a Writer that stores output in an io.Writer.
// The FileHeader is written immediately.
func NewWriter(writer io.Writer, header *FileHeader) (*Writer, error) {
w := &Writer{
writer: writer,
buf: make([]byte, 24),
}
binary.LittleEndian.PutUint32(w.buf, header.MagicNumber)
binary.LittleEndian.PutUint16(w.buf[4:], header.VersionMajor)
binary.LittleEndian.PutUint16(w.buf[6:], header.VersionMinor)
binary.LittleEndian.PutUint32(w.buf[8:], uint32(header.TimeZone))
binary.LittleEndian.PutUint32(w.buf[12:], header.SigFigs)
binary.LittleEndian.PutUint32(w.buf[16:], header.SnapLen)
binary.LittleEndian.PutUint32(w.buf[20:], header.Network)
if _, err := writer.Write(w.buf); err != nil {
return nil, err
}
return w, nil
}
// Writer writes a packet to the underlying writer.
func (w *Writer) Write(pkt *Packet) error {
binary.LittleEndian.PutUint32(w.buf, uint32(pkt.Time.Unix()))
binary.LittleEndian.PutUint32(w.buf[4:], uint32(pkt.Time.Nanosecond()))
binary.LittleEndian.PutUint32(w.buf[8:], uint32(pkt.Time.Unix()))
binary.LittleEndian.PutUint32(w.buf[12:], pkt.Len)
if _, err := w.writer.Write(w.buf[:16]); err != nil {
return err
}
_, err := w.writer.Write(pkt.Data)
return err
}
func asUint32(data []byte, flip bool) uint32 {
if flip {
return binary.BigEndian.Uint32(data)
}
return binary.LittleEndian.Uint32(data)
}
func asUint16(data []byte, flip bool) uint16 {
if flip {
return binary.BigEndian.Uint16(data)
}
return binary.LittleEndian.Uint16(data)
}

View File

@ -0,0 +1,266 @@
// Interface to both live and offline pcap parsing.
package pcap
/*
#cgo linux LDFLAGS: -lpcap
#cgo freebsd LDFLAGS: -lpcap
#cgo darwin LDFLAGS: -lpcap
#cgo windows CFLAGS: -I C:/WpdPack/Include
#cgo windows,386 LDFLAGS: -L C:/WpdPack/Lib -lwpcap
#cgo windows,amd64 LDFLAGS: -L C:/WpdPack/Lib/x64 -lwpcap
#include <stdlib.h>
#include <pcap.h>
// Workaround for not knowing how to cast to const u_char**
int hack_pcap_next_ex(pcap_t *p, struct pcap_pkthdr **pkt_header,
u_char **pkt_data) {
return pcap_next_ex(p, pkt_header, (const u_char **)pkt_data);
}
*/
import "C"
import (
"errors"
"net"
"syscall"
"time"
"unsafe"
)
type Pcap struct {
cptr *C.pcap_t
}
type Stat struct {
PacketsReceived uint32
PacketsDropped uint32
PacketsIfDropped uint32
}
type Interface struct {
Name string
Description string
Addresses []IFAddress
// TODO: add more elements
}
type IFAddress struct {
IP net.IP
Netmask net.IPMask
// TODO: add broadcast + PtP dst ?
}
func (p *Pcap) Next() (pkt *Packet) {
rv, _ := p.NextEx()
return rv
}
// Openlive opens a device and returns a *Pcap handler
func Openlive(device string, snaplen int32, promisc bool, timeout_ms int32) (handle *Pcap, err error) {
var buf *C.char
buf = (*C.char)(C.calloc(ERRBUF_SIZE, 1))
h := new(Pcap)
var pro int32
if promisc {
pro = 1
}
dev := C.CString(device)
defer C.free(unsafe.Pointer(dev))
h.cptr = C.pcap_open_live(dev, C.int(snaplen), C.int(pro), C.int(timeout_ms), buf)
if nil == h.cptr {
handle = nil
err = errors.New(C.GoString(buf))
} else {
handle = h
}
C.free(unsafe.Pointer(buf))
return
}
func Openoffline(file string) (handle *Pcap, err error) {
var buf *C.char
buf = (*C.char)(C.calloc(ERRBUF_SIZE, 1))
h := new(Pcap)
cf := C.CString(file)
defer C.free(unsafe.Pointer(cf))
h.cptr = C.pcap_open_offline(cf, buf)
if nil == h.cptr {
handle = nil
err = errors.New(C.GoString(buf))
} else {
handle = h
}
C.free(unsafe.Pointer(buf))
return
}
func (p *Pcap) NextEx() (pkt *Packet, result int32) {
var pkthdr *C.struct_pcap_pkthdr
var buf_ptr *C.u_char
var buf unsafe.Pointer
result = int32(C.hack_pcap_next_ex(p.cptr, &pkthdr, &buf_ptr))
buf = unsafe.Pointer(buf_ptr)
if nil == buf {
return
}
pkt = new(Packet)
pkt.Time = time.Unix(int64(pkthdr.ts.tv_sec), int64(pkthdr.ts.tv_usec)*1000)
pkt.Caplen = uint32(pkthdr.caplen)
pkt.Len = uint32(pkthdr.len)
pkt.Data = C.GoBytes(buf, C.int(pkthdr.caplen))
return
}
func (p *Pcap) Close() {
C.pcap_close(p.cptr)
}
func (p *Pcap) Geterror() error {
return errors.New(C.GoString(C.pcap_geterr(p.cptr)))
}
func (p *Pcap) Getstats() (stat *Stat, err error) {
var cstats _Ctype_struct_pcap_stat
if -1 == C.pcap_stats(p.cptr, &cstats) {
return nil, p.Geterror()
}
stats := new(Stat)
stats.PacketsReceived = uint32(cstats.ps_recv)
stats.PacketsDropped = uint32(cstats.ps_drop)
stats.PacketsIfDropped = uint32(cstats.ps_ifdrop)
return stats, nil
}
func (p *Pcap) Setfilter(expr string) (err error) {
var bpf _Ctype_struct_bpf_program
cexpr := C.CString(expr)
defer C.free(unsafe.Pointer(cexpr))
if -1 == C.pcap_compile(p.cptr, &bpf, cexpr, 1, 0) {
return p.Geterror()
}
if -1 == C.pcap_setfilter(p.cptr, &bpf) {
C.pcap_freecode(&bpf)
return p.Geterror()
}
C.pcap_freecode(&bpf)
return nil
}
func Version() string {
return C.GoString(C.pcap_lib_version())
}
func (p *Pcap) Datalink() int {
return int(C.pcap_datalink(p.cptr))
}
func (p *Pcap) Setdatalink(dlt int) error {
if -1 == C.pcap_set_datalink(p.cptr, C.int(dlt)) {
return p.Geterror()
}
return nil
}
func DatalinkValueToName(dlt int) string {
if name := C.pcap_datalink_val_to_name(C.int(dlt)); name != nil {
return C.GoString(name)
}
return ""
}
func DatalinkValueToDescription(dlt int) string {
if desc := C.pcap_datalink_val_to_description(C.int(dlt)); desc != nil {
return C.GoString(desc)
}
return ""
}
func Findalldevs() (ifs []Interface, err error) {
var buf *C.char
buf = (*C.char)(C.calloc(ERRBUF_SIZE, 1))
defer C.free(unsafe.Pointer(buf))
var alldevsp *C.pcap_if_t
if -1 == C.pcap_findalldevs((**C.pcap_if_t)(&alldevsp), buf) {
return nil, errors.New(C.GoString(buf))
}
defer C.pcap_freealldevs((*C.pcap_if_t)(alldevsp))
dev := alldevsp
var i uint32
for i = 0; dev != nil; dev = (*C.pcap_if_t)(dev.next) {
i++
}
ifs = make([]Interface, i)
dev = alldevsp
for j := uint32(0); dev != nil; dev = (*C.pcap_if_t)(dev.next) {
var iface Interface
iface.Name = C.GoString(dev.name)
iface.Description = C.GoString(dev.description)
iface.Addresses = findalladdresses(dev.addresses)
// TODO: add more elements
ifs[j] = iface
j++
}
return
}
func findalladdresses(addresses *_Ctype_struct_pcap_addr) (retval []IFAddress) {
// TODO - make it support more than IPv4 and IPv6?
retval = make([]IFAddress, 0, 1)
for curaddr := addresses; curaddr != nil; curaddr = (*_Ctype_struct_pcap_addr)(curaddr.next) {
var a IFAddress
var err error
if a.IP, err = sockaddr_to_IP((*syscall.RawSockaddr)(unsafe.Pointer(curaddr.addr))); err != nil {
continue
}
if a.Netmask, err = sockaddr_to_IP((*syscall.RawSockaddr)(unsafe.Pointer(curaddr.addr))); err != nil {
continue
}
retval = append(retval, a)
}
return
}
func sockaddr_to_IP(rsa *syscall.RawSockaddr) (IP []byte, err error) {
switch rsa.Family {
case syscall.AF_INET:
pp := (*syscall.RawSockaddrInet4)(unsafe.Pointer(rsa))
IP = make([]byte, 4)
for i := 0; i < len(IP); i++ {
IP[i] = pp.Addr[i]
}
return
case syscall.AF_INET6:
pp := (*syscall.RawSockaddrInet6)(unsafe.Pointer(rsa))
IP = make([]byte, 16)
for i := 0; i < len(IP); i++ {
IP[i] = pp.Addr[i]
}
return
}
err = errors.New("Unsupported address type")
return
}
func (p *Pcap) Inject(data []byte) (err error) {
buf := (*C.char)(C.malloc((C.size_t)(len(data))))
for i := 0; i < len(data); i++ {
*(*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(buf)) + uintptr(i))) = data[i]
}
if -1 == C.pcap_sendpacket(p.cptr, (*C.u_char)(unsafe.Pointer(buf)), (C.int)(len(data))) {
err = p.Geterror()
}
C.free(unsafe.Pointer(buf))
return
}

View File

@ -0,0 +1,49 @@
package main
import (
"flag"
"fmt"
"os"
"runtime/pprof"
"time"
"github.com/coreos/etcd/Godeps/_workspace/src/github.com/akrennmair/gopcap"
)
func main() {
var filename *string = flag.String("file", "", "filename")
var decode *bool = flag.Bool("d", false, "If true, decode each packet")
var cpuprofile *string = flag.String("cpuprofile", "", "filename")
flag.Parse()
h, err := pcap.Openoffline(*filename)
if err != nil {
fmt.Printf("Couldn't create pcap reader: %v", err)
}
if *cpuprofile != "" {
if out, err := os.Create(*cpuprofile); err == nil {
pprof.StartCPUProfile(out)
defer func() {
pprof.StopCPUProfile()
out.Close()
}()
} else {
panic(err)
}
}
i, nilPackets := 0, 0
start := time.Now()
for pkt, code := h.NextEx(); code != -2; pkt, code = h.NextEx() {
if pkt == nil {
nilPackets++
} else if *decode {
pkt.Decode()
}
i++
}
duration := time.Since(start)
fmt.Printf("Took %v to process %v packets, %v per packet, %d nil packets\n", duration, i, duration/time.Duration(i), nilPackets)
}

View File

@ -0,0 +1,96 @@
package main
// Parses a pcap file, writes it back to disk, then verifies the files
// are the same.
import (
"bufio"
"flag"
"fmt"
"io"
"os"
"github.com/coreos/etcd/Godeps/_workspace/src/github.com/akrennmair/gopcap"
)
var input *string = flag.String("input", "", "input file")
var output *string = flag.String("output", "", "output file")
var decode *bool = flag.Bool("decode", false, "print decoded packets")
func copyPcap(dest, src string) {
f, err := os.Open(src)
if err != nil {
fmt.Printf("couldn't open %q: %v\n", src, err)
return
}
defer f.Close()
reader, err := pcap.NewReader(bufio.NewReader(f))
if err != nil {
fmt.Printf("couldn't create reader: %v\n", err)
return
}
w, err := os.Create(dest)
if err != nil {
fmt.Printf("couldn't open %q: %v\n", dest, err)
return
}
defer w.Close()
buf := bufio.NewWriter(w)
writer, err := pcap.NewWriter(buf, &reader.Header)
if err != nil {
fmt.Printf("couldn't create writer: %v\n", err)
return
}
for {
pkt := reader.Next()
if pkt == nil {
break
}
if *decode {
pkt.Decode()
fmt.Println(pkt.String())
}
writer.Write(pkt)
}
buf.Flush()
}
func check(dest, src string) {
f, err := os.Open(src)
if err != nil {
fmt.Printf("couldn't open %q: %v\n", src, err)
return
}
defer f.Close()
freader := bufio.NewReader(f)
g, err := os.Open(dest)
if err != nil {
fmt.Printf("couldn't open %q: %v\n", src, err)
return
}
defer g.Close()
greader := bufio.NewReader(g)
for {
fb, ferr := freader.ReadByte()
gb, gerr := greader.ReadByte()
if ferr == io.EOF && gerr == io.EOF {
break
}
if fb == gb {
continue
}
fmt.Println("FAIL")
return
}
fmt.Println("PASS")
}
func main() {
flag.Parse()
copyPcap(*output, *input)
check(*output, *input)
}

View File

@ -0,0 +1,82 @@
package main
import (
"flag"
"fmt"
"time"
"github.com/coreos/etcd/Godeps/_workspace/src/github.com/akrennmair/gopcap"
)
func min(x uint32, y uint32) uint32 {
if x < y {
return x
}
return y
}
func main() {
var device *string = flag.String("d", "", "device")
var file *string = flag.String("r", "", "file")
var expr *string = flag.String("e", "", "filter expression")
flag.Parse()
var h *pcap.Pcap
var err error
ifs, err := pcap.Findalldevs()
if len(ifs) == 0 {
fmt.Printf("Warning: no devices found : %s\n", err)
} else {
for i := 0; i < len(ifs); i++ {
fmt.Printf("dev %d: %s (%s)\n", i+1, ifs[i].Name, ifs[i].Description)
}
}
if *device != "" {
h, err = pcap.Openlive(*device, 65535, true, 0)
if h == nil {
fmt.Printf("Openlive(%s) failed: %s\n", *device, err)
return
}
} else if *file != "" {
h, err = pcap.Openoffline(*file)
if h == nil {
fmt.Printf("Openoffline(%s) failed: %s\n", *file, err)
return
}
} else {
fmt.Printf("usage: pcaptest [-d <device> | -r <file>]\n")
return
}
defer h.Close()
fmt.Printf("pcap version: %s\n", pcap.Version())
if *expr != "" {
fmt.Printf("Setting filter: %s\n", *expr)
err := h.Setfilter(*expr)
if err != nil {
fmt.Printf("Warning: setting filter failed: %s\n", err)
}
}
for pkt := h.Next(); pkt != nil; pkt = h.Next() {
fmt.Printf("time: %d.%06d (%s) caplen: %d len: %d\nData:",
int64(pkt.Time.Second()), int64(pkt.Time.Nanosecond()),
time.Unix(int64(pkt.Time.Second()), 0).String(), int64(pkt.Caplen), int64(pkt.Len))
for i := uint32(0); i < pkt.Caplen; i++ {
if i%32 == 0 {
fmt.Printf("\n")
}
if 32 <= pkt.Data[i] && pkt.Data[i] <= 126 {
fmt.Printf("%c", pkt.Data[i])
} else {
fmt.Printf(".")
}
}
fmt.Printf("\n\n")
}
}

View File

@ -0,0 +1,121 @@
package main
import (
"bufio"
"flag"
"fmt"
"os"
"github.com/coreos/etcd/Godeps/_workspace/src/github.com/akrennmair/gopcap"
)
const (
TYPE_IP = 0x0800
TYPE_ARP = 0x0806
TYPE_IP6 = 0x86DD
IP_ICMP = 1
IP_INIP = 4
IP_TCP = 6
IP_UDP = 17
)
var out *bufio.Writer
var errout *bufio.Writer
func main() {
var device *string = flag.String("i", "", "interface")
var snaplen *int = flag.Int("s", 65535, "snaplen")
var hexdump *bool = flag.Bool("X", false, "hexdump")
expr := ""
out = bufio.NewWriter(os.Stdout)
errout = bufio.NewWriter(os.Stderr)
flag.Usage = func() {
fmt.Fprintf(errout, "usage: %s [ -i interface ] [ -s snaplen ] [ -X ] [ expression ]\n", os.Args[0])
errout.Flush()
os.Exit(1)
}
flag.Parse()
if len(flag.Args()) > 0 {
expr = flag.Arg(0)
}
if *device == "" {
devs, err := pcap.Findalldevs()
if err != nil {
fmt.Fprintf(errout, "tcpdump: couldn't find any devices: %s\n", err)
}
if 0 == len(devs) {
flag.Usage()
}
*device = devs[0].Name
}
h, err := pcap.Openlive(*device, int32(*snaplen), true, 0)
if h == nil {
fmt.Fprintf(errout, "tcpdump: %s\n", err)
errout.Flush()
return
}
defer h.Close()
if expr != "" {
ferr := h.Setfilter(expr)
if ferr != nil {
fmt.Fprintf(out, "tcpdump: %s\n", ferr)
out.Flush()
}
}
for pkt := h.Next(); pkt != nil; pkt = h.Next() {
pkt.Decode()
fmt.Fprintf(out, "%s\n", pkt.String())
if *hexdump {
Hexdump(pkt)
}
out.Flush()
}
}
func min(a, b int) int {
if a < b {
return a
}
return b
}
func Hexdump(pkt *pcap.Packet) {
for i := 0; i < len(pkt.Data); i += 16 {
Dumpline(uint32(i), pkt.Data[i:min(i+16, len(pkt.Data))])
}
}
func Dumpline(addr uint32, line []byte) {
fmt.Fprintf(out, "\t0x%04x: ", int32(addr))
var i uint16
for i = 0; i < 16 && i < uint16(len(line)); i++ {
if i%2 == 0 {
out.WriteString(" ")
}
fmt.Fprintf(out, "%02x", line[i])
}
for j := i; j <= 16; j++ {
if j%2 == 0 {
out.WriteString(" ")
}
out.WriteString(" ")
}
out.WriteString(" ")
for i = 0; i < 16 && i < uint16(len(line)); i++ {
if line[i] >= 32 && line[i] <= 126 {
fmt.Fprintf(out, "%c", line[i])
} else {
out.WriteString(".")
}
}
out.WriteString("\n")
}

View File

@ -4,7 +4,7 @@
// Original code is based on code by RogerV in the golang-nuts thread:
// https://groups.google.com/group/golang-nuts/browse_thread/thread/40cc41e9d9fc9247
// +build darwin freebsd linux netbsd openbsd
// +build darwin freebsd linux netbsd openbsd solaris
package speakeasy
@ -19,9 +19,8 @@ import (
const sttyArg0 = "/bin/stty"
var (
sttyArgvEOff []string = []string{"stty", "-echo"}
sttyArgvEOn []string = []string{"stty", "echo"}
ws syscall.WaitStatus = 0
sttyArgvEOff = []string{"stty", "-echo"}
sttyArgvEOn = []string{"stty", "echo"}
)
// getPassword gets input hidden from the terminal from a user. This is
@ -47,10 +46,11 @@ func getPassword() (password string, err error) {
}
// Turn on the terminal echo and stop listening for signals.
defer signal.Stop(sig)
defer close(brk)
defer echoOn(fd)
syscall.Wait4(pid, &ws, 0, nil)
syscall.Wait4(pid, nil, 0, nil)
line, err := readline()
if err == nil {
@ -76,7 +76,7 @@ func echoOn(fd []uintptr) {
// Turn on the terminal echo.
pid, e := syscall.ForkExec(sttyArg0, sttyArgvEOn, &syscall.ProcAttr{Dir: "", Files: fd})
if e == nil {
syscall.Wait4(pid, &ws, 0, nil)
syscall.Wait4(pid, nil, 0, nil)
}
}

View File

@ -1,3 +1,4 @@
*.prof
*.test
*.swp
/bin/

View File

@ -1,54 +1,18 @@
TEST=.
BENCH=.
COVERPROFILE=/tmp/c.out
BRANCH=`git rev-parse --abbrev-ref HEAD`
COMMIT=`git rev-parse --short HEAD`
GOLDFLAGS="-X main.branch $(BRANCH) -X main.commit $(COMMIT)"
default: build
bench:
go test -v -test.run=NOTHINCONTAINSTHIS -test.bench=$(BENCH)
# http://cloc.sourceforge.net/
cloc:
@cloc --not-match-f='Makefile|_test.go' .
cover: fmt
go test -coverprofile=$(COVERPROFILE) -test.run=$(TEST) $(COVERFLAG) .
go tool cover -html=$(COVERPROFILE)
rm $(COVERPROFILE)
cpuprofile: fmt
@go test -c
@./bolt.test -test.v -test.run=$(TEST) -test.cpuprofile cpu.prof
race:
@go test -v -race -test.run="TestSimulate_(100op|1000op)"
# go get github.com/kisielk/errcheck
errcheck:
@echo "=== errcheck ==="
@errcheck github.com/boltdb/bolt
@errcheck -ignorepkg=bytes -ignore=os:Remove github.com/boltdb/bolt
fmt:
@go fmt ./...
test:
@go test -v -cover .
@go test -v ./cmd/bolt
get:
@go get -d ./...
build: get
@mkdir -p bin
@go build -ldflags=$(GOLDFLAGS) -a -o bin/bolt ./cmd/bolt
test: fmt
@go get github.com/stretchr/testify/assert
@echo "=== TESTS ==="
@go test -v -cover -test.run=$(TEST)
@echo ""
@echo ""
@echo "=== CLI ==="
@go test -v -test.run=$(TEST) ./cmd/bolt
@echo ""
@echo ""
@echo "=== RACE DETECTOR ==="
@go test -v -race -test.run="TestSimulate_(100op|1000op)"
.PHONY: bench cloc cover cpuprofile fmt memprofile test
.PHONY: fmt test

View File

@ -1,8 +1,8 @@
Bolt [![Build Status](https://drone.io/github.com/boltdb/bolt/status.png)](https://drone.io/github.com/boltdb/bolt/latest) [![Coverage Status](https://coveralls.io/repos/boltdb/bolt/badge.png?branch=master)](https://coveralls.io/r/boltdb/bolt?branch=master) [![GoDoc](https://godoc.org/github.com/boltdb/bolt?status.png)](https://godoc.org/github.com/boltdb/bolt) ![Version](http://img.shields.io/badge/version-1.0-green.png)
Bolt [![Build Status](https://drone.io/github.com/boltdb/bolt/status.png)](https://drone.io/github.com/boltdb/bolt/latest) [![Coverage Status](https://coveralls.io/repos/boltdb/bolt/badge.svg?branch=master)](https://coveralls.io/r/boltdb/bolt?branch=master) [![GoDoc](https://godoc.org/github.com/boltdb/bolt?status.svg)](https://godoc.org/github.com/boltdb/bolt) ![Version](https://img.shields.io/badge/version-1.0-green.svg)
====
Bolt is a pure Go key/value store inspired by [Howard Chu's][hyc_symas] and
the [LMDB project][lmdb]. The goal of the project is to provide a simple,
Bolt is a pure Go key/value store inspired by [Howard Chu's][hyc_symas]
[LMDB project][lmdb]. The goal of the project is to provide a simple,
fast, and reliable database for projects that don't require a full database
server such as Postgres or MySQL.
@ -13,7 +13,6 @@ and setting values. That's it.
[hyc_symas]: https://twitter.com/hyc_symas
[lmdb]: http://symas.com/mdb/
## Project Status
Bolt is stable and the API is fixed. Full unit test coverage and randomized
@ -22,6 +21,36 @@ Bolt is currently in high-load production environments serving databases as
large as 1TB. Many companies such as Shopify and Heroku use Bolt-backed
services every day.
## Table of Contents
- [Getting Started](#getting-started)
- [Installing](#installing)
- [Opening a database](#opening-a-database)
- [Transactions](#transactions)
- [Read-write transactions](#read-write-transactions)
- [Read-only transactions](#read-only-transactions)
- [Batch read-write transactions](#batch-read-write-transactions)
- [Managing transactions manually](#managing-transactions-manually)
- [Using buckets](#using-buckets)
- [Using key/value pairs](#using-keyvalue-pairs)
- [Autoincrementing integer for the bucket](#autoincrementing-integer-for-the-bucket)
- [Iterating over keys](#iterating-over-keys)
- [Prefix scans](#prefix-scans)
- [Range scans](#range-scans)
- [ForEach()](#foreach)
- [Nested buckets](#nested-buckets)
- [Database backups](#database-backups)
- [Statistics](#statistics)
- [Read-Only Mode](#read-only-mode)
- [Mobile Use (iOS/Android)](#mobile-use-iosandroid)
- [Resources](#resources)
- [Comparison with other databases](#comparison-with-other-databases)
- [Postgres, MySQL, & other relational databases](#postgres-mysql--other-relational-databases)
- [LevelDB, RocksDB](#leveldb-rocksdb)
- [LMDB](#lmdb)
- [Caveats & Limitations](#caveats--limitations)
- [Reading the Source](#reading-the-source)
- [Other Projects Using Bolt](#other-projects-using-bolt)
## Getting Started
@ -87,6 +116,11 @@ are not thread safe. To work with data in multiple goroutines you must start
a transaction for each one or use locking to ensure only one goroutine accesses
a transaction at a time. Creating transaction from the `DB` is thread safe.
Read-only transactions and read-write transactions should not depend on one
another and generally shouldn't be opened simultaneously in the same goroutine.
This can cause a deadlock as the read-write transaction needs to periodically
re-map the data file but it cannot do so while a read-only transaction is open.
#### Read-write transactions
@ -175,8 +209,8 @@ and then safely close your transaction if an error is returned. This is the
recommended way to use Bolt transactions.
However, sometimes you may want to manually start and end your transactions.
You can use the `Tx.Begin()` function directly but _please_ be sure to close the
transaction.
You can use the `Tx.Begin()` function directly but **please** be sure to close
the transaction.
```go
// Start a writable transaction.
@ -251,7 +285,7 @@ db.View(func(tx *bolt.Tx) error {
```
The `Get()` function does not return an error because its operation is
guarenteed to work (unless there is some kind of system failure). If the key
guaranteed to work (unless there is some kind of system failure). If the key
exists then it will return its byte slice value. If it doesn't exist then it
will return `nil`. It's important to note that you can have a zero-length value
set to a key which is different than the key not existing.
@ -263,6 +297,49 @@ transaction is open. If you need to use a value outside of the transaction
then you must use `copy()` to copy it to another byte slice.
### Autoincrementing integer for the bucket
By using the `NextSequence()` function, you can let Bolt determine a sequence
which can be used as the unique identifier for your key/value pairs. See the
example below.
```go
// CreateUser saves u to the store. The new user ID is set on u once the data is persisted.
func (s *Store) CreateUser(u *User) error {
return s.db.Update(func(tx *bolt.Tx) error {
// Retrieve the users bucket.
// This should be created when the DB is first opened.
b := tx.Bucket([]byte("users"))
// Generate ID for the user.
// This returns an error only if the Tx is closed or not writeable.
// That can't happen in an Update() call so I ignore the error check.
id, _ = b.NextSequence()
u.ID = int(id)
// Marshal user data into bytes.
buf, err := json.Marshal(u)
if err != nil {
return err
}
// Persist bytes to users bucket.
return b.Put(itob(u.ID), buf)
})
}
// itob returns an 8-byte big endian representation of v.
func itob(v int) []byte {
b := make([]byte, 8)
binary.BigEndian.PutUint64(b, uint64(v))
return b
}
type User struct {
ID int
...
}
```
### Iterating over keys
Bolt stores its keys in byte-sorted order within a bucket. This makes sequential
@ -271,7 +348,9 @@ iteration over these keys extremely fast. To iterate over keys we'll use a
```go
db.View(func(tx *bolt.Tx) error {
// Assume bucket exists and has keys
b := tx.Bucket([]byte("MyBucket"))
c := b.Cursor()
for k, v := c.First(); k != nil; k, v = c.Next() {
@ -295,10 +374,15 @@ Next() Move to the next key.
Prev() Move to the previous key.
```
When you have iterated to the end of the cursor then `Next()` will return `nil`.
You must seek to a position using `First()`, `Last()`, or `Seek()` before
calling `Next()` or `Prev()`. If you do not seek to a position then these
functions will return `nil`.
Each of those functions has a return signature of `(key []byte, value []byte)`.
When you have iterated to the end of the cursor then `Next()` will return a
`nil` key. You must seek to a position using `First()`, `Last()`, or `Seek()`
before calling `Next()` or `Prev()`. If you do not seek to a position then
these functions will return a `nil` key.
During iteration, if the key is non-`nil` but the value is `nil`, that means
the key refers to a bucket rather than a value. Use `Bucket.Bucket()` to
access the sub-bucket.
#### Prefix scans
@ -307,6 +391,7 @@ To iterate over a key prefix, you can combine `Seek()` and `bytes.HasPrefix()`:
```go
db.View(func(tx *bolt.Tx) error {
// Assume bucket exists and has keys
c := tx.Bucket([]byte("MyBucket")).Cursor()
prefix := []byte("1234")
@ -326,7 +411,7 @@ date range like this:
```go
db.View(func(tx *bolt.Tx) error {
// Assume our events bucket has RFC3339 encoded time keys.
// Assume our events bucket exists and has RFC3339 encoded time keys.
c := tx.Bucket([]byte("Events")).Cursor()
// Our time range spans the 90's decade.
@ -350,7 +435,9 @@ all the keys in a bucket:
```go
db.View(func(tx *bolt.Tx) error {
// Assume bucket exists and has keys
b := tx.Bucket([]byte("MyBucket"))
b.ForEach(func(k, v []byte) error {
fmt.Printf("key=%s, value=%s\n", k, v)
return nil
@ -377,8 +464,11 @@ func (*Bucket) DeleteBucket(key []byte) error
Bolt is a single file so it's easy to backup. You can use the `Tx.WriteTo()`
function to write a consistent view of the database to a writer. If you call
this from a read-only transaction, it will perform a hot backup and not block
your other database reads and writes. It will also use `O_DIRECT` when available
to prevent page cache trashing.
your other database reads and writes.
By default, it will use a regular file handle which will utilize the operating
system's page cache. See the [`Tx`](https://godoc.org/github.com/boltdb/bolt#Tx)
documentation for information about optimizing for larger-than-RAM datasets.
One common use case is to backup over HTTP so you can use tools like `cURL` to
do database backups:
@ -446,6 +536,99 @@ It's also useful to pipe these stats to a service such as statsd for monitoring
or to provide an HTTP endpoint that will perform a fixed-length sample.
### Read-Only Mode
Sometimes it is useful to create a shared, read-only Bolt database. To this,
set the `Options.ReadOnly` flag when opening your database. Read-only mode
uses a shared lock to allow multiple processes to read from the database but
it will block any processes from opening the database in read-write mode.
```go
db, err := bolt.Open("my.db", 0666, &bolt.Options{ReadOnly: true})
if err != nil {
log.Fatal(err)
}
```
### Mobile Use (iOS/Android)
Bolt is able to run on mobile devices by leveraging the binding feature of the
[gomobile](https://github.com/golang/mobile) tool. Create a struct that will
contain your database logic and a reference to a `*bolt.DB` with a initializing
contstructor that takes in a filepath where the database file will be stored.
Neither Android nor iOS require extra permissions or cleanup from using this method.
```go
func NewBoltDB(filepath string) *BoltDB {
db, err := bolt.Open(filepath+"/demo.db", 0600, nil)
if err != nil {
log.Fatal(err)
}
return &BoltDB{db}
}
type BoltDB struct {
db *bolt.DB
...
}
func (b *BoltDB) Path() string {
return b.db.Path()
}
func (b *BoltDB) Close() {
b.db.Close()
}
```
Database logic should be defined as methods on this wrapper struct.
To initialize this struct from the native language (both platforms now sync
their local storage to the cloud. These snippets disable that functionality for the
database file):
#### Android
```java
String path;
if (android.os.Build.VERSION.SDK_INT >=android.os.Build.VERSION_CODES.LOLLIPOP){
path = getNoBackupFilesDir().getAbsolutePath();
} else{
path = getFilesDir().getAbsolutePath();
}
Boltmobiledemo.BoltDB boltDB = Boltmobiledemo.NewBoltDB(path)
```
#### iOS
```objc
- (void)demo {
NSString* path = [NSSearchPathForDirectoriesInDomains(NSLibraryDirectory,
NSUserDomainMask,
YES) objectAtIndex:0];
GoBoltmobiledemoBoltDB * demo = GoBoltmobiledemoNewBoltDB(path);
[self addSkipBackupAttributeToItemAtPath:demo.path];
//Some DB Logic would go here
[demo close];
}
- (BOOL)addSkipBackupAttributeToItemAtPath:(NSString *) filePathString
{
NSURL* URL= [NSURL fileURLWithPath: filePathString];
assert([[NSFileManager defaultManager] fileExistsAtPath: [URL path]]);
NSError *error = nil;
BOOL success = [URL setResourceValue: [NSNumber numberWithBool: YES]
forKey: NSURLIsExcludedFromBackupKey error: &error];
if(!success){
NSLog(@"Error excluding %@ from backup %@", [URL lastPathComponent], error);
}
return success;
}
```
## Resources
For more information on getting started with Bolt, check out the following articles:
@ -480,7 +663,7 @@ they are libraries bundled into the application, however, their underlying
structure is a log-structured merge-tree (LSM tree). An LSM tree optimizes
random writes by using a write ahead log and multi-tiered, sorted files called
SSTables. Bolt uses a B+tree internally and only a single file. Both approaches
have trade offs.
have trade-offs.
If you require a high random write throughput (>10,000 w/sec) or you need to use
spinning disks then LevelDB could be a good choice. If your application is
@ -516,9 +699,8 @@ It's important to pick the right tool for the job and Bolt is no exception.
Here are a few things to note when evaluating and using Bolt:
* Bolt is good for read intensive workloads. Sequential write performance is
also fast but random writes can be slow. You can add a write-ahead log or
[transaction coalescer](https://github.com/boltdb/coalescer) in front of Bolt
to mitigate this issue.
also fast but random writes can be slow. You can use `DB.Batch()` or add a
write-ahead log to help mitigate this issue.
* Bolt uses a B+tree internally so there can be a lot of random page access.
SSDs provide a significant performance boost over spinning disks.
@ -548,7 +730,14 @@ Here are a few things to note when evaluating and using Bolt:
can in memory and will release memory as needed to other processes. This means
that Bolt can show very high memory usage when working with large databases.
However, this is expected and the OS will release memory as needed. Bolt can
handle databases much larger than the available physical RAM.
handle databases much larger than the available physical RAM, provided its
memory-map fits in the process virtual address space. It may be problematic
on 32-bits systems.
* The data structures in the Bolt database are memory mapped so the data file
will be endian specific. This means that you cannot copy a Bolt file from a
little endian machine to a big endian machine and have it work. For most
users this is not a concern since most modern CPUs are little endian.
* Because of the way pages are laid out on disk, Bolt cannot truncate data files
and return free pages back to the disk. Instead, Bolt maintains a free list
@ -562,30 +751,93 @@ Here are a few things to note when evaluating and using Bolt:
[page-allocation]: https://github.com/boltdb/bolt/issues/308#issuecomment-74811638
## Reading the Source
Bolt is a relatively small code base (<3KLOC) for an embedded, serializable,
transactional key/value database so it can be a good starting point for people
interested in how databases work.
The best places to start are the main entry points into Bolt:
- `Open()` - Initializes the reference to the database. It's responsible for
creating the database if it doesn't exist, obtaining an exclusive lock on the
file, reading the meta pages, & memory-mapping the file.
- `DB.Begin()` - Starts a read-only or read-write transaction depending on the
value of the `writable` argument. This requires briefly obtaining the "meta"
lock to keep track of open transactions. Only one read-write transaction can
exist at a time so the "rwlock" is acquired during the life of a read-write
transaction.
- `Bucket.Put()` - Writes a key/value pair into a bucket. After validating the
arguments, a cursor is used to traverse the B+tree to the page and position
where they key & value will be written. Once the position is found, the bucket
materializes the underlying page and the page's parent pages into memory as
"nodes". These nodes are where mutations occur during read-write transactions.
These changes get flushed to disk during commit.
- `Bucket.Get()` - Retrieves a key/value pair from a bucket. This uses a cursor
to move to the page & position of a key/value pair. During a read-only
transaction, the key and value data is returned as a direct reference to the
underlying mmap file so there's no allocation overhead. For read-write
transactions, this data may reference the mmap file or one of the in-memory
node values.
- `Cursor` - This object is simply for traversing the B+tree of on-disk pages
or in-memory nodes. It can seek to a specific key, move to the first or last
value, or it can move forward or backward. The cursor handles the movement up
and down the B+tree transparently to the end user.
- `Tx.Commit()` - Converts the in-memory dirty nodes and the list of free pages
into pages to be written to disk. Writing to disk then occurs in two phases.
First, the dirty pages are written to disk and an `fsync()` occurs. Second, a
new meta page with an incremented transaction ID is written and another
`fsync()` occurs. This two phase write ensures that partially written data
pages are ignored in the event of a crash since the meta page pointing to them
is never written. Partially written meta pages are invalidated because they
are written with a checksum.
If you have additional notes that could be helpful for others, please submit
them via pull request.
## Other Projects Using Bolt
Below is a list of public, open source projects that use Bolt:
* [Operation Go: A Routine Mission](http://gocode.io) - An online programming game for Golang using Bolt for user accounts and a leaderboard.
* [Bazil](https://github.com/bazillion/bazil) - A file system that lets your data reside where it is most convenient for it to reside.
* [Bazil](https://bazil.org/) - A file system that lets your data reside where it is most convenient for it to reside.
* [DVID](https://github.com/janelia-flyem/dvid) - Added Bolt as optional storage engine and testing it against Basho-tuned leveldb.
* [Skybox Analytics](https://github.com/skybox/skybox) - A standalone funnel analysis tool for web analytics.
* [Scuttlebutt](https://github.com/benbjohnson/scuttlebutt) - Uses Bolt to store and process all Twitter mentions of GitHub projects.
* [Wiki](https://github.com/peterhellberg/wiki) - A tiny wiki using Goji, BoltDB and Blackfriday.
* [ChainStore](https://github.com/nulayer/chainstore) - Simple key-value interface to a variety of storage engines organized as a chain of operations.
* [ChainStore](https://github.com/pressly/chainstore) - Simple key-value interface to a variety of storage engines organized as a chain of operations.
* [MetricBase](https://github.com/msiebuhr/MetricBase) - Single-binary version of Graphite.
* [Gitchain](https://github.com/gitchain/gitchain) - Decentralized, peer-to-peer Git repositories aka "Git meets Bitcoin".
* [event-shuttle](https://github.com/sclasen/event-shuttle) - A Unix system service to collect and reliably deliver messages to Kafka.
* [ipxed](https://github.com/kelseyhightower/ipxed) - Web interface and api for ipxed.
* [BoltStore](https://github.com/yosssi/boltstore) - Session store using Bolt.
* [photosite/session](http://godoc.org/bitbucket.org/kardianos/photosite/session) - Sessions for a photo viewing site.
* [photosite/session](https://godoc.org/bitbucket.org/kardianos/photosite/session) - Sessions for a photo viewing site.
* [LedisDB](https://github.com/siddontang/ledisdb) - A high performance NoSQL, using Bolt as optional storage.
* [ipLocator](https://github.com/AndreasBriese/ipLocator) - A fast ip-geo-location-server using bolt with bloom filters.
* [cayley](https://github.com/google/cayley) - Cayley is an open-source graph database using Bolt as optional backend.
* [bleve](http://www.blevesearch.com/) - A pure Go search engine similar to ElasticSearch that uses Bolt as the default storage backend.
* [tentacool](https://github.com/optiflows/tentacool) - REST api server to manage system stuff (IP, DNS, Gateway...) on a linux server.
* [SkyDB](https://github.com/skydb/sky) - Behavioral analytics database.
* [Seaweed File System](https://github.com/chrislusf/weed-fs) - Highly scalable distributed key~file system with O(1) disk read.
* [InfluxDB](http://influxdb.com) - Scalable datastore for metrics, events, and real-time analytics.
* [Seaweed File System](https://github.com/chrislusf/seaweedfs) - Highly scalable distributed key~file system with O(1) disk read.
* [InfluxDB](https://influxdata.com) - Scalable datastore for metrics, events, and real-time analytics.
* [Freehold](http://tshannon.bitbucket.org/freehold/) - An open, secure, and lightweight platform for your files and data.
* [Prometheus Annotation Server](https://github.com/oliver006/prom_annotation_server) - Annotation server for PromDash & Prometheus service monitoring system.
* [Consul](https://github.com/hashicorp/consul) - Consul is service discovery and configuration made easy. Distributed, highly available, and datacenter-aware.
* [Kala](https://github.com/ajvb/kala) - Kala is a modern job scheduler optimized to run on a single node. It is persistent, JSON over HTTP API, ISO 8601 duration notation, and dependent jobs.
* [drive](https://github.com/odeke-em/drive) - drive is an unofficial Google Drive command line client for \*NIX operating systems.
* [stow](https://github.com/djherbis/stow) - a persistence manager for objects
backed by boltdb.
* [buckets](https://github.com/joyrexus/buckets) - a bolt wrapper streamlining
simple tx and key scans.
* [mbuckets](https://github.com/abhigupta912/mbuckets) - A Bolt wrapper that allows easy operations on multi level (nested) buckets.
* [Request Baskets](https://github.com/darklynx/request-baskets) - A web service to collect arbitrary HTTP requests and inspect them via REST API or simple web UI, similar to [RequestBin](http://requestb.in/) service
* [Go Report Card](https://goreportcard.com/) - Go code quality report cards as a (free and open source) service.
* [Boltdb Boilerplate](https://github.com/bobintornado/boltdb-boilerplate) - Boilerplate wrapper around bolt aiming to make simple calls one-liners.
If you are using Bolt in a project please send a pull request to add it to the list.

View File

@ -1,135 +0,0 @@
package bolt
import (
"errors"
"fmt"
"sync"
"time"
)
// Batch calls fn as part of a batch. It behaves similar to Update,
// except:
//
// 1. concurrent Batch calls can be combined into a single Bolt
// transaction.
//
// 2. the function passed to Batch may be called multiple times,
// regardless of whether it returns error or not.
//
// This means that Batch function side effects must be idempotent and
// take permanent effect only after a successful return is seen in
// caller.
//
// Batch is only useful when there are multiple goroutines calling it.
func (db *DB) Batch(fn func(*Tx) error) error {
errCh := make(chan error, 1)
db.batchMu.Lock()
if (db.batch == nil) || (db.batch != nil && len(db.batch.calls) >= db.MaxBatchSize) {
// There is no existing batch, or the existing batch is full; start a new one.
db.batch = &batch{
db: db,
}
db.batch.timer = time.AfterFunc(db.MaxBatchDelay, db.batch.trigger)
}
db.batch.calls = append(db.batch.calls, call{fn: fn, err: errCh})
if len(db.batch.calls) >= db.MaxBatchSize {
// wake up batch, it's ready to run
go db.batch.trigger()
}
db.batchMu.Unlock()
err := <-errCh
if err == trySolo {
err = db.Update(fn)
}
return err
}
type call struct {
fn func(*Tx) error
err chan<- error
}
type batch struct {
db *DB
timer *time.Timer
start sync.Once
calls []call
}
// trigger runs the batch if it hasn't already been run.
func (b *batch) trigger() {
b.start.Do(b.run)
}
// run performs the transactions in the batch and communicates results
// back to DB.Batch.
func (b *batch) run() {
b.db.batchMu.Lock()
b.timer.Stop()
// Make sure no new work is added to this batch, but don't break
// other batches.
if b.db.batch == b {
b.db.batch = nil
}
b.db.batchMu.Unlock()
retry:
for len(b.calls) > 0 {
var failIdx = -1
err := b.db.Update(func(tx *Tx) error {
for i, c := range b.calls {
if err := safelyCall(c.fn, tx); err != nil {
failIdx = i
return err
}
}
return nil
})
if failIdx >= 0 {
// take the failing transaction out of the batch. it's
// safe to shorten b.calls here because db.batch no longer
// points to us, and we hold the mutex anyway.
c := b.calls[failIdx]
b.calls[failIdx], b.calls = b.calls[len(b.calls)-1], b.calls[:len(b.calls)-1]
// tell the submitter re-run it solo, continue with the rest of the batch
c.err <- trySolo
continue retry
}
// pass success, or bolt internal errors, to all callers
for _, c := range b.calls {
if c.err != nil {
c.err <- err
}
}
break retry
}
}
// trySolo is a special sentinel error value used for signaling that a
// transaction function should be re-run. It should never be seen by
// callers.
var trySolo = errors.New("batch function returned an error and should be re-run solo")
type panicked struct {
reason interface{}
}
func (p panicked) Error() string {
if err, ok := p.reason.(error); ok {
return err.Error()
}
return fmt.Sprintf("panic: %v", p.reason)
}
func safelyCall(fn func(*Tx) error, tx *Tx) (err error) {
defer func() {
if p := recover(); p != nil {
err = panicked{p}
}
}()
return fn(tx)
}

View File

@ -1,170 +0,0 @@
package bolt_test
import (
"bytes"
"encoding/binary"
"errors"
"hash/fnv"
"sync"
"testing"
"github.com/coreos/etcd/Godeps/_workspace/src/github.com/boltdb/bolt"
)
func validateBatchBench(b *testing.B, db *TestDB) {
var rollback = errors.New("sentinel error to cause rollback")
validate := func(tx *bolt.Tx) error {
bucket := tx.Bucket([]byte("bench"))
h := fnv.New32a()
buf := make([]byte, 4)
for id := uint32(0); id < 1000; id++ {
binary.LittleEndian.PutUint32(buf, id)
h.Reset()
h.Write(buf[:])
k := h.Sum(nil)
v := bucket.Get(k)
if v == nil {
b.Errorf("not found id=%d key=%x", id, k)
continue
}
if g, e := v, []byte("filler"); !bytes.Equal(g, e) {
b.Errorf("bad value for id=%d key=%x: %s != %q", id, k, g, e)
}
if err := bucket.Delete(k); err != nil {
return err
}
}
// should be empty now
c := bucket.Cursor()
for k, v := c.First(); k != nil; k, v = c.Next() {
b.Errorf("unexpected key: %x = %q", k, v)
}
return rollback
}
if err := db.Update(validate); err != nil && err != rollback {
b.Error(err)
}
}
func BenchmarkDBBatchAutomatic(b *testing.B) {
db := NewTestDB()
defer db.Close()
db.MustCreateBucket([]byte("bench"))
b.ResetTimer()
for i := 0; i < b.N; i++ {
start := make(chan struct{})
var wg sync.WaitGroup
for round := 0; round < 1000; round++ {
wg.Add(1)
go func(id uint32) {
defer wg.Done()
<-start
h := fnv.New32a()
buf := make([]byte, 4)
binary.LittleEndian.PutUint32(buf, id)
h.Write(buf[:])
k := h.Sum(nil)
insert := func(tx *bolt.Tx) error {
b := tx.Bucket([]byte("bench"))
return b.Put(k, []byte("filler"))
}
if err := db.Batch(insert); err != nil {
b.Error(err)
return
}
}(uint32(round))
}
close(start)
wg.Wait()
}
b.StopTimer()
validateBatchBench(b, db)
}
func BenchmarkDBBatchSingle(b *testing.B) {
db := NewTestDB()
defer db.Close()
db.MustCreateBucket([]byte("bench"))
b.ResetTimer()
for i := 0; i < b.N; i++ {
start := make(chan struct{})
var wg sync.WaitGroup
for round := 0; round < 1000; round++ {
wg.Add(1)
go func(id uint32) {
defer wg.Done()
<-start
h := fnv.New32a()
buf := make([]byte, 4)
binary.LittleEndian.PutUint32(buf, id)
h.Write(buf[:])
k := h.Sum(nil)
insert := func(tx *bolt.Tx) error {
b := tx.Bucket([]byte("bench"))
return b.Put(k, []byte("filler"))
}
if err := db.Update(insert); err != nil {
b.Error(err)
return
}
}(uint32(round))
}
close(start)
wg.Wait()
}
b.StopTimer()
validateBatchBench(b, db)
}
func BenchmarkDBBatchManual10x100(b *testing.B) {
db := NewTestDB()
defer db.Close()
db.MustCreateBucket([]byte("bench"))
b.ResetTimer()
for i := 0; i < b.N; i++ {
start := make(chan struct{})
var wg sync.WaitGroup
for major := 0; major < 10; major++ {
wg.Add(1)
go func(id uint32) {
defer wg.Done()
<-start
insert100 := func(tx *bolt.Tx) error {
h := fnv.New32a()
buf := make([]byte, 4)
for minor := uint32(0); minor < 100; minor++ {
binary.LittleEndian.PutUint32(buf, uint32(id*100+minor))
h.Reset()
h.Write(buf[:])
k := h.Sum(nil)
b := tx.Bucket([]byte("bench"))
if err := b.Put(k, []byte("filler")); err != nil {
return err
}
}
return nil
}
if err := db.Update(insert100); err != nil {
b.Fatal(err)
}
}(uint32(major))
}
close(start)
wg.Wait()
}
b.StopTimer()
validateBatchBench(b, db)
}

View File

@ -1,148 +0,0 @@
package bolt_test
import (
"encoding/binary"
"fmt"
"io/ioutil"
"log"
"math/rand"
"net/http"
"net/http/httptest"
"os"
"github.com/coreos/etcd/Godeps/_workspace/src/github.com/boltdb/bolt"
)
// Set this to see how the counts are actually updated.
const verbose = false
// Counter updates a counter in Bolt for every URL path requested.
type counter struct {
db *bolt.DB
}
func (c counter) ServeHTTP(rw http.ResponseWriter, req *http.Request) {
// Communicates the new count from a successful database
// transaction.
var result uint64
increment := func(tx *bolt.Tx) error {
b, err := tx.CreateBucketIfNotExists([]byte("hits"))
if err != nil {
return err
}
key := []byte(req.URL.String())
// Decode handles key not found for us.
count := decode(b.Get(key)) + 1
b.Put(key, encode(count))
// All good, communicate new count.
result = count
return nil
}
if err := c.db.Batch(increment); err != nil {
http.Error(rw, err.Error(), 500)
return
}
if verbose {
log.Printf("server: %s: %d", req.URL.String(), result)
}
rw.Header().Set("Content-Type", "application/octet-stream")
fmt.Fprintf(rw, "%d\n", result)
}
func client(id int, base string, paths []string) error {
// Process paths in random order.
rng := rand.New(rand.NewSource(int64(id)))
permutation := rng.Perm(len(paths))
for i := range paths {
path := paths[permutation[i]]
resp, err := http.Get(base + path)
if err != nil {
return err
}
defer resp.Body.Close()
buf, err := ioutil.ReadAll(resp.Body)
if err != nil {
return err
}
if verbose {
log.Printf("client: %s: %s", path, buf)
}
}
return nil
}
func ExampleDB_Batch() {
// Open the database.
db, _ := bolt.Open(tempfile(), 0666, nil)
defer os.Remove(db.Path())
defer db.Close()
// Start our web server
count := counter{db}
srv := httptest.NewServer(count)
defer srv.Close()
// Decrease the batch size to make things more interesting.
db.MaxBatchSize = 3
// Get every path multiple times concurrently.
const clients = 10
paths := []string{
"/foo",
"/bar",
"/baz",
"/quux",
"/thud",
"/xyzzy",
}
errors := make(chan error, clients)
for i := 0; i < clients; i++ {
go func(id int) {
errors <- client(id, srv.URL, paths)
}(i)
}
// Check all responses to make sure there's no error.
for i := 0; i < clients; i++ {
if err := <-errors; err != nil {
fmt.Printf("client error: %v", err)
return
}
}
// Check the final result
db.View(func(tx *bolt.Tx) error {
b := tx.Bucket([]byte("hits"))
c := b.Cursor()
for k, v := c.First(); k != nil; k, v = c.Next() {
fmt.Printf("hits to %s: %d\n", k, decode(v))
}
return nil
})
// Output:
// hits to /bar: 10
// hits to /baz: 10
// hits to /foo: 10
// hits to /quux: 10
// hits to /thud: 10
// hits to /xyzzy: 10
}
// encode marshals a counter.
func encode(n uint64) []byte {
buf := make([]byte, 8)
binary.BigEndian.PutUint64(buf, n)
return buf
}
// decode unmarshals a counter. Nil buffers are decoded as 0.
func decode(buf []byte) uint64 {
if buf == nil {
return 0
}
return binary.BigEndian.Uint64(buf)
}

View File

@ -1,167 +0,0 @@
package bolt_test
import (
"testing"
"time"
"github.com/coreos/etcd/Godeps/_workspace/src/github.com/boltdb/bolt"
)
// Ensure two functions can perform updates in a single batch.
func TestDB_Batch(t *testing.T) {
db := NewTestDB()
defer db.Close()
db.MustCreateBucket([]byte("widgets"))
// Iterate over multiple updates in separate goroutines.
n := 2
ch := make(chan error)
for i := 0; i < n; i++ {
go func(i int) {
ch <- db.Batch(func(tx *bolt.Tx) error {
return tx.Bucket([]byte("widgets")).Put(u64tob(uint64(i)), []byte{})
})
}(i)
}
// Check all responses to make sure there's no error.
for i := 0; i < n; i++ {
if err := <-ch; err != nil {
t.Fatal(err)
}
}
// Ensure data is correct.
db.MustView(func(tx *bolt.Tx) error {
b := tx.Bucket([]byte("widgets"))
for i := 0; i < n; i++ {
if v := b.Get(u64tob(uint64(i))); v == nil {
t.Errorf("key not found: %d", i)
}
}
return nil
})
}
func TestDB_Batch_Panic(t *testing.T) {
db := NewTestDB()
defer db.Close()
var sentinel int
var bork = &sentinel
var problem interface{}
var err error
// Execute a function inside a batch that panics.
func() {
defer func() {
if p := recover(); p != nil {
problem = p
}
}()
err = db.Batch(func(tx *bolt.Tx) error {
panic(bork)
})
}()
// Verify there is no error.
if g, e := err, error(nil); g != e {
t.Fatalf("wrong error: %v != %v", g, e)
}
// Verify the panic was captured.
if g, e := problem, bork; g != e {
t.Fatalf("wrong error: %v != %v", g, e)
}
}
func TestDB_BatchFull(t *testing.T) {
db := NewTestDB()
defer db.Close()
db.MustCreateBucket([]byte("widgets"))
const size = 3
// buffered so we never leak goroutines
ch := make(chan error, size)
put := func(i int) {
ch <- db.Batch(func(tx *bolt.Tx) error {
return tx.Bucket([]byte("widgets")).Put(u64tob(uint64(i)), []byte{})
})
}
db.MaxBatchSize = size
// high enough to never trigger here
db.MaxBatchDelay = 1 * time.Hour
go put(1)
go put(2)
// Give the batch a chance to exhibit bugs.
time.Sleep(10 * time.Millisecond)
// not triggered yet
select {
case <-ch:
t.Fatalf("batch triggered too early")
default:
}
go put(3)
// Check all responses to make sure there's no error.
for i := 0; i < size; i++ {
if err := <-ch; err != nil {
t.Fatal(err)
}
}
// Ensure data is correct.
db.MustView(func(tx *bolt.Tx) error {
b := tx.Bucket([]byte("widgets"))
for i := 1; i <= size; i++ {
if v := b.Get(u64tob(uint64(i))); v == nil {
t.Errorf("key not found: %d", i)
}
}
return nil
})
}
func TestDB_BatchTime(t *testing.T) {
db := NewTestDB()
defer db.Close()
db.MustCreateBucket([]byte("widgets"))
const size = 1
// buffered so we never leak goroutines
ch := make(chan error, size)
put := func(i int) {
ch <- db.Batch(func(tx *bolt.Tx) error {
return tx.Bucket([]byte("widgets")).Put(u64tob(uint64(i)), []byte{})
})
}
db.MaxBatchSize = 1000
db.MaxBatchDelay = 0
go put(1)
// Batch must trigger by time alone.
// Check all responses to make sure there's no error.
for i := 0; i < size; i++ {
if err := <-ch; err != nil {
t.Fatal(err)
}
}
// Ensure data is correct.
db.MustView(func(tx *bolt.Tx) error {
b := tx.Bucket([]byte("widgets"))
for i := 1; i <= size; i++ {
if v := b.Get(u64tob(uint64(i))); v == nil {
t.Errorf("key not found: %d", i)
}
}
return nil
})
}

View File

@ -0,0 +1,9 @@
// +build arm64
package bolt
// maxMapSize represents the largest mmap size supported by Bolt.
const maxMapSize = 0xFFFFFFFFFFFF // 256TB
// maxAllocSize is the size used when creating array pointers.
const maxAllocSize = 0x7FFFFFFF

View File

@ -4,8 +4,6 @@ import (
"syscall"
)
var odirect = syscall.O_DIRECT
// fdatasync flushes written data to a file descriptor.
func fdatasync(db *DB) error {
return syscall.Fdatasync(int(db.file.Fd()))

View File

@ -11,8 +11,6 @@ const (
msInvalidate // invalidate cached data
)
var odirect int
func msync(db *DB) error {
_, _, errno := syscall.Syscall(syscall.SYS_MSYNC, uintptr(unsafe.Pointer(db.data)), uintptr(db.datasz), msInvalidate)
if errno != 0 {

View File

@ -0,0 +1,9 @@
// +build ppc64le
package bolt
// maxMapSize represents the largest mmap size supported by Bolt.
const maxMapSize = 0xFFFFFFFFFFFF // 256TB
// maxAllocSize is the size used when creating array pointers.
const maxAllocSize = 0x7FFFFFFF

View File

@ -0,0 +1,9 @@
// +build s390x
package bolt
// maxMapSize represents the largest mmap size supported by Bolt.
const maxMapSize = 0xFFFFFFFFFFFF // 256TB
// maxAllocSize is the size used when creating array pointers.
const maxAllocSize = 0x7FFFFFFF

View File

@ -1,36 +0,0 @@
package bolt_test
import (
"fmt"
"path/filepath"
"reflect"
"runtime"
"testing"
)
// assert fails the test if the condition is false.
func assert(tb testing.TB, condition bool, msg string, v ...interface{}) {
if !condition {
_, file, line, _ := runtime.Caller(1)
fmt.Printf("\033[31m%s:%d: "+msg+"\033[39m\n\n", append([]interface{}{filepath.Base(file), line}, v...)...)
tb.FailNow()
}
}
// ok fails the test if an err is not nil.
func ok(tb testing.TB, err error) {
if err != nil {
_, file, line, _ := runtime.Caller(1)
fmt.Printf("\033[31m%s:%d: unexpected error: %s\033[39m\n\n", filepath.Base(file), line, err.Error())
tb.FailNow()
}
}
// equals fails the test if exp is not equal to act.
func equals(tb testing.TB, exp, act interface{}) {
if !reflect.DeepEqual(exp, act) {
_, file, line, _ := runtime.Caller(1)
fmt.Printf("\033[31m%s:%d:\n\n\texp: %#v\n\n\tgot: %#v\033[39m\n\n", filepath.Base(file), line, exp, act)
tb.FailNow()
}
}

View File

@ -1,4 +1,4 @@
// +build !windows,!plan9
// +build !windows,!plan9,!solaris
package bolt
@ -11,7 +11,7 @@ import (
)
// flock acquires an advisory lock on a file descriptor.
func flock(f *os.File, timeout time.Duration) error {
func flock(db *DB, mode os.FileMode, exclusive bool, timeout time.Duration) error {
var t time.Time
for {
// If we're beyond our timeout then return an error.
@ -21,9 +21,13 @@ func flock(f *os.File, timeout time.Duration) error {
} else if timeout > 0 && time.Since(t) > timeout {
return ErrTimeout
}
flag := syscall.LOCK_SH
if exclusive {
flag = syscall.LOCK_EX
}
// Otherwise attempt to obtain an exclusive lock.
err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX|syscall.LOCK_NB)
err := syscall.Flock(int(db.file.Fd()), flag|syscall.LOCK_NB)
if err == nil {
return nil
} else if err != syscall.EWOULDBLOCK {
@ -36,27 +40,23 @@ func flock(f *os.File, timeout time.Duration) error {
}
// funlock releases an advisory lock on a file descriptor.
func funlock(f *os.File) error {
return syscall.Flock(int(f.Fd()), syscall.LOCK_UN)
func funlock(db *DB) error {
return syscall.Flock(int(db.file.Fd()), syscall.LOCK_UN)
}
// mmap memory maps a DB's data file.
func mmap(db *DB, sz int) error {
// Truncate and fsync to ensure file size metadata is flushed.
// https://github.com/boltdb/bolt/issues/284
if err := db.file.Truncate(int64(sz)); err != nil {
return fmt.Errorf("file resize error: %s", err)
}
if err := db.file.Sync(); err != nil {
return fmt.Errorf("file sync error: %s", err)
}
// Map the data file to memory.
b, err := syscall.Mmap(int(db.file.Fd()), 0, sz, syscall.PROT_READ, syscall.MAP_SHARED)
b, err := syscall.Mmap(int(db.file.Fd()), 0, sz, syscall.PROT_READ, syscall.MAP_SHARED|db.MmapFlags)
if err != nil {
return err
}
// Advise the kernel that the mmap is accessed randomly.
if err := madvise(b, syscall.MADV_RANDOM); err != nil {
return fmt.Errorf("madvise: %s", err)
}
// Save the original byte slice and convert to a byte array pointer.
db.dataref = b
db.data = (*[maxMapSize]byte)(unsafe.Pointer(&b[0]))
@ -78,3 +78,12 @@ func munmap(db *DB) error {
db.datasz = 0
return err
}
// NOTE: This function is copied from stdlib because it is not available on darwin.
func madvise(b []byte, advice int) (err error) {
_, _, e1 := syscall.Syscall(syscall.SYS_MADVISE, uintptr(unsafe.Pointer(&b[0])), uintptr(len(b)), uintptr(advice))
if e1 != 0 {
err = e1
}
return
}

View File

@ -0,0 +1,90 @@
package bolt
import (
"fmt"
"os"
"syscall"
"time"
"unsafe"
"github.com/coreos/etcd/Godeps/_workspace/src/golang.org/x/sys/unix"
)
// flock acquires an advisory lock on a file descriptor.
func flock(db *DB, mode os.FileMode, exclusive bool, timeout time.Duration) error {
var t time.Time
for {
// If we're beyond our timeout then return an error.
// This can only occur after we've attempted a flock once.
if t.IsZero() {
t = time.Now()
} else if timeout > 0 && time.Since(t) > timeout {
return ErrTimeout
}
var lock syscall.Flock_t
lock.Start = 0
lock.Len = 0
lock.Pid = 0
lock.Whence = 0
lock.Pid = 0
if exclusive {
lock.Type = syscall.F_WRLCK
} else {
lock.Type = syscall.F_RDLCK
}
err := syscall.FcntlFlock(db.file.Fd(), syscall.F_SETLK, &lock)
if err == nil {
return nil
} else if err != syscall.EAGAIN {
return err
}
// Wait for a bit and try again.
time.Sleep(50 * time.Millisecond)
}
}
// funlock releases an advisory lock on a file descriptor.
func funlock(db *DB) error {
var lock syscall.Flock_t
lock.Start = 0
lock.Len = 0
lock.Type = syscall.F_UNLCK
lock.Whence = 0
return syscall.FcntlFlock(uintptr(db.file.Fd()), syscall.F_SETLK, &lock)
}
// mmap memory maps a DB's data file.
func mmap(db *DB, sz int) error {
// Map the data file to memory.
b, err := unix.Mmap(int(db.file.Fd()), 0, sz, syscall.PROT_READ, syscall.MAP_SHARED|db.MmapFlags)
if err != nil {
return err
}
// Advise the kernel that the mmap is accessed randomly.
if err := unix.Madvise(b, syscall.MADV_RANDOM); err != nil {
return fmt.Errorf("madvise: %s", err)
}
// Save the original byte slice and convert to a byte array pointer.
db.dataref = b
db.data = (*[maxMapSize]byte)(unsafe.Pointer(&b[0]))
db.datasz = sz
return nil
}
// munmap unmaps a DB's data file from memory.
func munmap(db *DB) error {
// Ignore the unmap if we have no mapped data.
if db.dataref == nil {
return nil
}
// Unmap using the original byte slice.
err := unix.Munmap(db.dataref)
db.dataref = nil
db.data = nil
db.datasz = 0
return err
}

View File

@ -8,7 +8,39 @@ import (
"unsafe"
)
var odirect int
// LockFileEx code derived from golang build filemutex_windows.go @ v1.5.1
var (
modkernel32 = syscall.NewLazyDLL("kernel32.dll")
procLockFileEx = modkernel32.NewProc("LockFileEx")
procUnlockFileEx = modkernel32.NewProc("UnlockFileEx")
)
const (
lockExt = ".lock"
// see https://msdn.microsoft.com/en-us/library/windows/desktop/aa365203(v=vs.85).aspx
flagLockExclusive = 2
flagLockFailImmediately = 1
// see https://msdn.microsoft.com/en-us/library/windows/desktop/ms681382(v=vs.85).aspx
errLockViolation syscall.Errno = 0x21
)
func lockFileEx(h syscall.Handle, flags, reserved, locklow, lockhigh uint32, ol *syscall.Overlapped) (err error) {
r, _, err := procLockFileEx.Call(uintptr(h), uintptr(flags), uintptr(reserved), uintptr(locklow), uintptr(lockhigh), uintptr(unsafe.Pointer(ol)))
if r == 0 {
return err
}
return nil
}
func unlockFileEx(h syscall.Handle, reserved, locklow, lockhigh uint32, ol *syscall.Overlapped) (err error) {
r, _, err := procUnlockFileEx.Call(uintptr(h), uintptr(reserved), uintptr(locklow), uintptr(lockhigh), uintptr(unsafe.Pointer(ol)), 0)
if r == 0 {
return err
}
return nil
}
// fdatasync flushes written data to a file descriptor.
func fdatasync(db *DB) error {
@ -16,21 +48,59 @@ func fdatasync(db *DB) error {
}
// flock acquires an advisory lock on a file descriptor.
func flock(f *os.File, _ time.Duration) error {
return nil
func flock(db *DB, mode os.FileMode, exclusive bool, timeout time.Duration) error {
// Create a separate lock file on windows because a process
// cannot share an exclusive lock on the same file. This is
// needed during Tx.WriteTo().
f, err := os.OpenFile(db.path+lockExt, os.O_CREATE, mode)
if err != nil {
return err
}
db.lockfile = f
var t time.Time
for {
// If we're beyond our timeout then return an error.
// This can only occur after we've attempted a flock once.
if t.IsZero() {
t = time.Now()
} else if timeout > 0 && time.Since(t) > timeout {
return ErrTimeout
}
var flag uint32 = flagLockFailImmediately
if exclusive {
flag |= flagLockExclusive
}
err := lockFileEx(syscall.Handle(db.lockfile.Fd()), flag, 0, 1, 0, &syscall.Overlapped{})
if err == nil {
return nil
} else if err != errLockViolation {
return err
}
// Wait for a bit and try again.
time.Sleep(50 * time.Millisecond)
}
}
// funlock releases an advisory lock on a file descriptor.
func funlock(f *os.File) error {
return nil
func funlock(db *DB) error {
err := unlockFileEx(syscall.Handle(db.lockfile.Fd()), 0, 1, 0, &syscall.Overlapped{})
db.lockfile.Close()
os.Remove(db.path+lockExt)
return err
}
// mmap memory maps a DB's data file.
// Based on: https://github.com/edsrzf/mmap-go
func mmap(db *DB, sz int) error {
// Truncate the database to the size of the mmap.
if err := db.file.Truncate(int64(sz)); err != nil {
return fmt.Errorf("truncate: %s", err)
if !db.readOnly {
// Truncate the database to the size of the mmap.
if err := db.file.Truncate(int64(sz)); err != nil {
return fmt.Errorf("truncate: %s", err)
}
}
// Open a file mapping handle.

View File

@ -2,8 +2,6 @@
package bolt
var odirect int
// fdatasync flushes written data to a file descriptor.
func fdatasync(db *DB) error {
return db.file.Sync()

View File

@ -11,7 +11,7 @@ const (
MaxKeySize = 32768
// MaxValueSize is the maximum length of a value, in bytes.
MaxValueSize = 4294967295
MaxValueSize = (1 << 31) - 2
)
const (
@ -99,6 +99,7 @@ func (b *Bucket) Cursor() *Cursor {
// Bucket retrieves a nested bucket by name.
// Returns nil if the bucket does not exist.
// The bucket instance is only valid for the lifetime of the transaction.
func (b *Bucket) Bucket(name []byte) *Bucket {
if b.buckets != nil {
if child := b.buckets[string(name)]; child != nil {
@ -148,6 +149,7 @@ func (b *Bucket) openBucket(value []byte) *Bucket {
// CreateBucket creates a new bucket at the given key and returns the new bucket.
// Returns an error if the key already exists, if the bucket name is blank, or if the bucket name is too long.
// The bucket instance is only valid for the lifetime of the transaction.
func (b *Bucket) CreateBucket(key []byte) (*Bucket, error) {
if b.tx.db == nil {
return nil, ErrTxClosed
@ -192,6 +194,7 @@ func (b *Bucket) CreateBucket(key []byte) (*Bucket, error) {
// CreateBucketIfNotExists creates a new bucket if it doesn't already exist and returns a reference to it.
// Returns an error if the bucket name is blank, or if the bucket name is too long.
// The bucket instance is only valid for the lifetime of the transaction.
func (b *Bucket) CreateBucketIfNotExists(key []byte) (*Bucket, error) {
child, err := b.CreateBucket(key)
if err == ErrBucketExists {
@ -270,6 +273,7 @@ func (b *Bucket) Get(key []byte) []byte {
// Put sets the value for a key in the bucket.
// If the key exist then its previous value will be overwritten.
// Supplied value must remain valid for the life of the transaction.
// Returns an error if the bucket was created from a read-only transaction, if the key is blank, if the key is too large, or if the value is too large.
func (b *Bucket) Put(key []byte, value []byte) error {
if b.tx.db == nil {
@ -346,7 +350,8 @@ func (b *Bucket) NextSequence() (uint64, error) {
// ForEach executes a function for each key/value pair in a bucket.
// If the provided function returns an error then the iteration is stopped and
// the error is returned to the caller.
// the error is returned to the caller. The provided function must not modify
// the bucket; this will result in undefined behavior.
func (b *Bucket) ForEach(fn func(k, v []byte) error) error {
if b.tx.db == nil {
return ErrTxClosed

File diff suppressed because it is too large Load Diff

View File

@ -344,7 +344,7 @@ func (cmd *DumpCommand) Run(args ...string) error {
for i, pageID := range pageIDs {
// Print a separator.
if i > 0 {
fmt.Fprintln(cmd.Stdout, "===============================================\n")
fmt.Fprintln(cmd.Stdout, "===============================================")
}
// Print page to stdout.
@ -465,7 +465,7 @@ func (cmd *PageCommand) Run(args ...string) error {
for i, pageID := range pageIDs {
// Print a separator.
if i > 0 {
fmt.Fprintln(cmd.Stdout, "===============================================\n")
fmt.Fprintln(cmd.Stdout, "===============================================")
}
// Retrieve page info and page size.
@ -825,7 +825,10 @@ func (cmd *StatsCommand) Run(args ...string) error {
fmt.Fprintln(cmd.Stdout, "Bucket statistics")
fmt.Fprintf(cmd.Stdout, "\tTotal number of buckets: %d\n", s.BucketN)
percentage = int(float32(s.InlineBucketN) * 100.0 / float32(s.BucketN))
percentage = 0
if s.BucketN != 0 {
percentage = int(float32(s.InlineBucketN) * 100.0 / float32(s.BucketN))
}
fmt.Fprintf(cmd.Stdout, "\tTotal number on inlined buckets: %d (%d%%)\n", s.InlineBucketN, percentage)
percentage = 0
if s.LeafInuse != 0 {
@ -917,7 +920,7 @@ func (cmd *BenchCommand) Run(args ...string) error {
// Write to the database.
var results BenchResults
if err := cmd.runWrites(db, options, &results); err != nil {
return fmt.Errorf("write: ", err)
return fmt.Errorf("write: %v", err)
}
// Read from the database.

View File

@ -1,145 +0,0 @@
package main_test
import (
"bytes"
"io/ioutil"
"os"
"strconv"
"testing"
"github.com/coreos/etcd/Godeps/_workspace/src/github.com/boltdb/bolt"
"github.com/coreos/etcd/Godeps/_workspace/src/github.com/boltdb/bolt/cmd/bolt"
)
// Ensure the "info" command can print information about a database.
func TestInfoCommand_Run(t *testing.T) {
db := MustOpen(0666, nil)
db.DB.Close()
defer db.Close()
// Run the info command.
m := NewMain()
if err := m.Run("info", db.Path); err != nil {
t.Fatal(err)
}
}
// Ensure the "stats" command can execute correctly.
func TestStatsCommand_Run(t *testing.T) {
// Ignore
if os.Getpagesize() != 4096 {
t.Skip("system does not use 4KB page size")
}
db := MustOpen(0666, nil)
defer db.Close()
if err := db.Update(func(tx *bolt.Tx) error {
// Create "foo" bucket.
b, err := tx.CreateBucket([]byte("foo"))
if err != nil {
return err
}
for i := 0; i < 10; i++ {
if err := b.Put([]byte(strconv.Itoa(i)), []byte(strconv.Itoa(i))); err != nil {
return err
}
}
// Create "bar" bucket.
b, err = tx.CreateBucket([]byte("bar"))
if err != nil {
return err
}
for i := 0; i < 100; i++ {
if err := b.Put([]byte(strconv.Itoa(i)), []byte(strconv.Itoa(i))); err != nil {
return err
}
}
// Create "baz" bucket.
b, err = tx.CreateBucket([]byte("baz"))
if err != nil {
return err
}
if err := b.Put([]byte("key"), []byte("value")); err != nil {
return err
}
return nil
}); err != nil {
t.Fatal(err)
}
db.DB.Close()
// Generate expected result.
exp := "Aggregate statistics for 3 buckets\n\n" +
"Page count statistics\n" +
"\tNumber of logical branch pages: 0\n" +
"\tNumber of physical branch overflow pages: 0\n" +
"\tNumber of logical leaf pages: 1\n" +
"\tNumber of physical leaf overflow pages: 0\n" +
"Tree statistics\n" +
"\tNumber of keys/value pairs: 111\n" +
"\tNumber of levels in B+tree: 1\n" +
"Page size utilization\n" +
"\tBytes allocated for physical branch pages: 0\n" +
"\tBytes actually used for branch data: 0 (0%)\n" +
"\tBytes allocated for physical leaf pages: 4096\n" +
"\tBytes actually used for leaf data: 1996 (48%)\n" +
"Bucket statistics\n" +
"\tTotal number of buckets: 3\n" +
"\tTotal number on inlined buckets: 2 (66%)\n" +
"\tBytes used for inlined buckets: 236 (11%)\n"
// Run the command.
m := NewMain()
if err := m.Run("stats", db.Path); err != nil {
t.Fatal(err)
} else if m.Stdout.String() != exp {
t.Fatalf("unexpected stdout:\n\n%s", m.Stdout.String())
}
}
// Main represents a test wrapper for main.Main that records output.
type Main struct {
*main.Main
Stdin bytes.Buffer
Stdout bytes.Buffer
Stderr bytes.Buffer
}
// NewMain returns a new instance of Main.
func NewMain() *Main {
m := &Main{Main: main.NewMain()}
m.Main.Stdin = &m.Stdin
m.Main.Stdout = &m.Stdout
m.Main.Stderr = &m.Stderr
return m
}
// MustOpen creates a Bolt database in a temporary location.
func MustOpen(mode os.FileMode, options *bolt.Options) *DB {
// Create temporary path.
f, _ := ioutil.TempFile("", "bolt-")
f.Close()
os.Remove(f.Name())
db, err := bolt.Open(f.Name(), mode, options)
if err != nil {
panic(err.Error())
}
return &DB{DB: db, Path: f.Name()}
}
// DB is a test wrapper for bolt.DB.
type DB struct {
*bolt.DB
Path string
}
// Close closes and removes the database.
func (db *DB) Close() error {
defer os.Remove(db.Path)
return db.DB.Close()
}

View File

@ -34,6 +34,13 @@ func (c *Cursor) First() (key []byte, value []byte) {
p, n := c.bucket.pageNode(c.bucket.root)
c.stack = append(c.stack, elemRef{page: p, node: n, index: 0})
c.first()
// If we land on an empty page then move to the next value.
// https://github.com/boltdb/bolt/issues/450
if c.stack[len(c.stack)-1].count() == 0 {
c.next()
}
k, v, flags := c.keyValue()
if (flags & uint32(bucketLeafFlag)) != 0 {
return k, nil
@ -209,28 +216,37 @@ func (c *Cursor) last() {
// next moves to the next leaf element and returns the key and value.
// If the cursor is at the last leaf element then it stays there and returns nil.
func (c *Cursor) next() (key []byte, value []byte, flags uint32) {
// Attempt to move over one element until we're successful.
// Move up the stack as we hit the end of each page in our stack.
var i int
for i = len(c.stack) - 1; i >= 0; i-- {
elem := &c.stack[i]
if elem.index < elem.count()-1 {
elem.index++
break
for {
// Attempt to move over one element until we're successful.
// Move up the stack as we hit the end of each page in our stack.
var i int
for i = len(c.stack) - 1; i >= 0; i-- {
elem := &c.stack[i]
if elem.index < elem.count()-1 {
elem.index++
break
}
}
}
// If we've hit the root page then stop and return. This will leave the
// cursor on the last element of the last page.
if i == -1 {
return nil, nil, 0
}
// If we've hit the root page then stop and return. This will leave the
// cursor on the last element of the last page.
if i == -1 {
return nil, nil, 0
}
// Otherwise start from where we left off in the stack and find the
// first element of the first leaf page.
c.stack = c.stack[:i+1]
c.first()
return c.keyValue()
// Otherwise start from where we left off in the stack and find the
// first element of the first leaf page.
c.stack = c.stack[:i+1]
c.first()
// If this is an empty page then restart and move back up the stack.
// https://github.com/boltdb/bolt/issues/450
if c.stack[len(c.stack)-1].count() == 0 {
continue
}
return c.keyValue()
}
}
// search recursively performs a binary search against a given page/node until it finds a given key.

View File

@ -1,511 +0,0 @@
package bolt_test
import (
"bytes"
"encoding/binary"
"fmt"
"os"
"sort"
"testing"
"testing/quick"
"github.com/coreos/etcd/Godeps/_workspace/src/github.com/boltdb/bolt"
)
// Ensure that a cursor can return a reference to the bucket that created it.
func TestCursor_Bucket(t *testing.T) {
db := NewTestDB()
defer db.Close()
db.Update(func(tx *bolt.Tx) error {
b, _ := tx.CreateBucket([]byte("widgets"))
c := b.Cursor()
equals(t, b, c.Bucket())
return nil
})
}
// Ensure that a Tx cursor can seek to the appropriate keys.
func TestCursor_Seek(t *testing.T) {
db := NewTestDB()
defer db.Close()
db.Update(func(tx *bolt.Tx) error {
b, err := tx.CreateBucket([]byte("widgets"))
ok(t, err)
ok(t, b.Put([]byte("foo"), []byte("0001")))
ok(t, b.Put([]byte("bar"), []byte("0002")))
ok(t, b.Put([]byte("baz"), []byte("0003")))
_, err = b.CreateBucket([]byte("bkt"))
ok(t, err)
return nil
})
db.View(func(tx *bolt.Tx) error {
c := tx.Bucket([]byte("widgets")).Cursor()
// Exact match should go to the key.
k, v := c.Seek([]byte("bar"))
equals(t, []byte("bar"), k)
equals(t, []byte("0002"), v)
// Inexact match should go to the next key.
k, v = c.Seek([]byte("bas"))
equals(t, []byte("baz"), k)
equals(t, []byte("0003"), v)
// Low key should go to the first key.
k, v = c.Seek([]byte(""))
equals(t, []byte("bar"), k)
equals(t, []byte("0002"), v)
// High key should return no key.
k, v = c.Seek([]byte("zzz"))
assert(t, k == nil, "")
assert(t, v == nil, "")
// Buckets should return their key but no value.
k, v = c.Seek([]byte("bkt"))
equals(t, []byte("bkt"), k)
assert(t, v == nil, "")
return nil
})
}
func TestCursor_Delete(t *testing.T) {
db := NewTestDB()
defer db.Close()
var count = 1000
// Insert every other key between 0 and $count.
db.Update(func(tx *bolt.Tx) error {
b, _ := tx.CreateBucket([]byte("widgets"))
for i := 0; i < count; i += 1 {
k := make([]byte, 8)
binary.BigEndian.PutUint64(k, uint64(i))
b.Put(k, make([]byte, 100))
}
b.CreateBucket([]byte("sub"))
return nil
})
db.Update(func(tx *bolt.Tx) error {
c := tx.Bucket([]byte("widgets")).Cursor()
bound := make([]byte, 8)
binary.BigEndian.PutUint64(bound, uint64(count/2))
for key, _ := c.First(); bytes.Compare(key, bound) < 0; key, _ = c.Next() {
if err := c.Delete(); err != nil {
return err
}
}
c.Seek([]byte("sub"))
err := c.Delete()
equals(t, err, bolt.ErrIncompatibleValue)
return nil
})
db.View(func(tx *bolt.Tx) error {
b := tx.Bucket([]byte("widgets"))
equals(t, b.Stats().KeyN, count/2+1)
return nil
})
}
// Ensure that a Tx cursor can seek to the appropriate keys when there are a
// large number of keys. This test also checks that seek will always move
// forward to the next key.
//
// Related: https://github.com/boltdb/bolt/pull/187
func TestCursor_Seek_Large(t *testing.T) {
db := NewTestDB()
defer db.Close()
var count = 10000
// Insert every other key between 0 and $count.
db.Update(func(tx *bolt.Tx) error {
b, _ := tx.CreateBucket([]byte("widgets"))
for i := 0; i < count; i += 100 {
for j := i; j < i+100; j += 2 {
k := make([]byte, 8)
binary.BigEndian.PutUint64(k, uint64(j))
b.Put(k, make([]byte, 100))
}
}
return nil
})
db.View(func(tx *bolt.Tx) error {
c := tx.Bucket([]byte("widgets")).Cursor()
for i := 0; i < count; i++ {
seek := make([]byte, 8)
binary.BigEndian.PutUint64(seek, uint64(i))
k, _ := c.Seek(seek)
// The last seek is beyond the end of the the range so
// it should return nil.
if i == count-1 {
assert(t, k == nil, "")
continue
}
// Otherwise we should seek to the exact key or the next key.
num := binary.BigEndian.Uint64(k)
if i%2 == 0 {
equals(t, uint64(i), num)
} else {
equals(t, uint64(i+1), num)
}
}
return nil
})
}
// Ensure that a cursor can iterate over an empty bucket without error.
func TestCursor_EmptyBucket(t *testing.T) {
db := NewTestDB()
defer db.Close()
db.Update(func(tx *bolt.Tx) error {
_, err := tx.CreateBucket([]byte("widgets"))
return err
})
db.View(func(tx *bolt.Tx) error {
c := tx.Bucket([]byte("widgets")).Cursor()
k, v := c.First()
assert(t, k == nil, "")
assert(t, v == nil, "")
return nil
})
}
// Ensure that a Tx cursor can reverse iterate over an empty bucket without error.
func TestCursor_EmptyBucketReverse(t *testing.T) {
db := NewTestDB()
defer db.Close()
db.Update(func(tx *bolt.Tx) error {
_, err := tx.CreateBucket([]byte("widgets"))
return err
})
db.View(func(tx *bolt.Tx) error {
c := tx.Bucket([]byte("widgets")).Cursor()
k, v := c.Last()
assert(t, k == nil, "")
assert(t, v == nil, "")
return nil
})
}
// Ensure that a Tx cursor can iterate over a single root with a couple elements.
func TestCursor_Iterate_Leaf(t *testing.T) {
db := NewTestDB()
defer db.Close()
db.Update(func(tx *bolt.Tx) error {
tx.CreateBucket([]byte("widgets"))
tx.Bucket([]byte("widgets")).Put([]byte("baz"), []byte{})
tx.Bucket([]byte("widgets")).Put([]byte("foo"), []byte{0})
tx.Bucket([]byte("widgets")).Put([]byte("bar"), []byte{1})
return nil
})
tx, _ := db.Begin(false)
c := tx.Bucket([]byte("widgets")).Cursor()
k, v := c.First()
equals(t, string(k), "bar")
equals(t, v, []byte{1})
k, v = c.Next()
equals(t, string(k), "baz")
equals(t, v, []byte{})
k, v = c.Next()
equals(t, string(k), "foo")
equals(t, v, []byte{0})
k, v = c.Next()
assert(t, k == nil, "")
assert(t, v == nil, "")
k, v = c.Next()
assert(t, k == nil, "")
assert(t, v == nil, "")
tx.Rollback()
}
// Ensure that a Tx cursor can iterate in reverse over a single root with a couple elements.
func TestCursor_LeafRootReverse(t *testing.T) {
db := NewTestDB()
defer db.Close()
db.Update(func(tx *bolt.Tx) error {
tx.CreateBucket([]byte("widgets"))
tx.Bucket([]byte("widgets")).Put([]byte("baz"), []byte{})
tx.Bucket([]byte("widgets")).Put([]byte("foo"), []byte{0})
tx.Bucket([]byte("widgets")).Put([]byte("bar"), []byte{1})
return nil
})
tx, _ := db.Begin(false)
c := tx.Bucket([]byte("widgets")).Cursor()
k, v := c.Last()
equals(t, string(k), "foo")
equals(t, v, []byte{0})
k, v = c.Prev()
equals(t, string(k), "baz")
equals(t, v, []byte{})
k, v = c.Prev()
equals(t, string(k), "bar")
equals(t, v, []byte{1})
k, v = c.Prev()
assert(t, k == nil, "")
assert(t, v == nil, "")
k, v = c.Prev()
assert(t, k == nil, "")
assert(t, v == nil, "")
tx.Rollback()
}
// Ensure that a Tx cursor can restart from the beginning.
func TestCursor_Restart(t *testing.T) {
db := NewTestDB()
defer db.Close()
db.Update(func(tx *bolt.Tx) error {
tx.CreateBucket([]byte("widgets"))
tx.Bucket([]byte("widgets")).Put([]byte("bar"), []byte{})
tx.Bucket([]byte("widgets")).Put([]byte("foo"), []byte{})
return nil
})
tx, _ := db.Begin(false)
c := tx.Bucket([]byte("widgets")).Cursor()
k, _ := c.First()
equals(t, string(k), "bar")
k, _ = c.Next()
equals(t, string(k), "foo")
k, _ = c.First()
equals(t, string(k), "bar")
k, _ = c.Next()
equals(t, string(k), "foo")
tx.Rollback()
}
// Ensure that a Tx can iterate over all elements in a bucket.
func TestCursor_QuickCheck(t *testing.T) {
f := func(items testdata) bool {
db := NewTestDB()
defer db.Close()
// Bulk insert all values.
tx, _ := db.Begin(true)
tx.CreateBucket([]byte("widgets"))
b := tx.Bucket([]byte("widgets"))
for _, item := range items {
ok(t, b.Put(item.Key, item.Value))
}
ok(t, tx.Commit())
// Sort test data.
sort.Sort(items)
// Iterate over all items and check consistency.
var index = 0
tx, _ = db.Begin(false)
c := tx.Bucket([]byte("widgets")).Cursor()
for k, v := c.First(); k != nil && index < len(items); k, v = c.Next() {
equals(t, k, items[index].Key)
equals(t, v, items[index].Value)
index++
}
equals(t, len(items), index)
tx.Rollback()
return true
}
if err := quick.Check(f, qconfig()); err != nil {
t.Error(err)
}
}
// Ensure that a transaction can iterate over all elements in a bucket in reverse.
func TestCursor_QuickCheck_Reverse(t *testing.T) {
f := func(items testdata) bool {
db := NewTestDB()
defer db.Close()
// Bulk insert all values.
tx, _ := db.Begin(true)
tx.CreateBucket([]byte("widgets"))
b := tx.Bucket([]byte("widgets"))
for _, item := range items {
ok(t, b.Put(item.Key, item.Value))
}
ok(t, tx.Commit())
// Sort test data.
sort.Sort(revtestdata(items))
// Iterate over all items and check consistency.
var index = 0
tx, _ = db.Begin(false)
c := tx.Bucket([]byte("widgets")).Cursor()
for k, v := c.Last(); k != nil && index < len(items); k, v = c.Prev() {
equals(t, k, items[index].Key)
equals(t, v, items[index].Value)
index++
}
equals(t, len(items), index)
tx.Rollback()
return true
}
if err := quick.Check(f, qconfig()); err != nil {
t.Error(err)
}
}
// Ensure that a Tx cursor can iterate over subbuckets.
func TestCursor_QuickCheck_BucketsOnly(t *testing.T) {
db := NewTestDB()
defer db.Close()
db.Update(func(tx *bolt.Tx) error {
b, err := tx.CreateBucket([]byte("widgets"))
ok(t, err)
_, err = b.CreateBucket([]byte("foo"))
ok(t, err)
_, err = b.CreateBucket([]byte("bar"))
ok(t, err)
_, err = b.CreateBucket([]byte("baz"))
ok(t, err)
return nil
})
db.View(func(tx *bolt.Tx) error {
var names []string
c := tx.Bucket([]byte("widgets")).Cursor()
for k, v := c.First(); k != nil; k, v = c.Next() {
names = append(names, string(k))
assert(t, v == nil, "")
}
equals(t, names, []string{"bar", "baz", "foo"})
return nil
})
}
// Ensure that a Tx cursor can reverse iterate over subbuckets.
func TestCursor_QuickCheck_BucketsOnly_Reverse(t *testing.T) {
db := NewTestDB()
defer db.Close()
db.Update(func(tx *bolt.Tx) error {
b, err := tx.CreateBucket([]byte("widgets"))
ok(t, err)
_, err = b.CreateBucket([]byte("foo"))
ok(t, err)
_, err = b.CreateBucket([]byte("bar"))
ok(t, err)
_, err = b.CreateBucket([]byte("baz"))
ok(t, err)
return nil
})
db.View(func(tx *bolt.Tx) error {
var names []string
c := tx.Bucket([]byte("widgets")).Cursor()
for k, v := c.Last(); k != nil; k, v = c.Prev() {
names = append(names, string(k))
assert(t, v == nil, "")
}
equals(t, names, []string{"foo", "baz", "bar"})
return nil
})
}
func ExampleCursor() {
// Open the database.
db, _ := bolt.Open(tempfile(), 0666, nil)
defer os.Remove(db.Path())
defer db.Close()
// Start a read-write transaction.
db.Update(func(tx *bolt.Tx) error {
// Create a new bucket.
tx.CreateBucket([]byte("animals"))
// Insert data into a bucket.
b := tx.Bucket([]byte("animals"))
b.Put([]byte("dog"), []byte("fun"))
b.Put([]byte("cat"), []byte("lame"))
b.Put([]byte("liger"), []byte("awesome"))
// Create a cursor for iteration.
c := b.Cursor()
// Iterate over items in sorted key order. This starts from the
// first key/value pair and updates the k/v variables to the
// next key/value on each iteration.
//
// The loop finishes at the end of the cursor when a nil key is returned.
for k, v := c.First(); k != nil; k, v = c.Next() {
fmt.Printf("A %s is %s.\n", k, v)
}
return nil
})
// Output:
// A cat is lame.
// A dog is fun.
// A liger is awesome.
}
func ExampleCursor_reverse() {
// Open the database.
db, _ := bolt.Open(tempfile(), 0666, nil)
defer os.Remove(db.Path())
defer db.Close()
// Start a read-write transaction.
db.Update(func(tx *bolt.Tx) error {
// Create a new bucket.
tx.CreateBucket([]byte("animals"))
// Insert data into a bucket.
b := tx.Bucket([]byte("animals"))
b.Put([]byte("dog"), []byte("fun"))
b.Put([]byte("cat"), []byte("lame"))
b.Put([]byte("liger"), []byte("awesome"))
// Create a cursor for iteration.
c := b.Cursor()
// Iterate over items in reverse sorted key order. This starts
// from the last key/value pair and updates the k/v variables to
// the previous key/value on each iteration.
//
// The loop finishes at the beginning of the cursor when a nil key
// is returned.
for k, v := c.Last(); k != nil; k, v = c.Prev() {
fmt.Printf("A %s is %s.\n", k, v)
}
return nil
})
// Output:
// A liger is awesome.
// A dog is fun.
// A cat is lame.
}

View File

@ -1,8 +1,10 @@
package bolt
import (
"errors"
"fmt"
"hash/fnv"
"log"
"os"
"runtime"
"runtime/debug"
@ -24,13 +26,14 @@ const magic uint32 = 0xED0CDAED
// IgnoreNoSync specifies whether the NoSync field of a DB is ignored when
// syncing changes to a file. This is required as some operating systems,
// such as OpenBSD, do not have a unified buffer cache (UBC) and writes
// must be synchronzied using the msync(2) syscall.
// must be synchronized using the msync(2) syscall.
const IgnoreNoSync = runtime.GOOS == "openbsd"
// Default values if not set in a DB instance.
const (
DefaultMaxBatchSize int = 1000
DefaultMaxBatchDelay = 10 * time.Millisecond
DefaultAllocSize = 16 * 1024 * 1024
)
// DB represents a collection of buckets persisted to a file on disk.
@ -55,6 +58,18 @@ type DB struct {
// THIS IS UNSAFE. PLEASE USE WITH CAUTION.
NoSync bool
// When true, skips the truncate call when growing the database.
// Setting this to true is only safe on non-ext3/ext4 systems.
// Skipping truncation avoids preallocation of hard drive space and
// bypasses a truncate() and fsync() syscall on remapping.
//
// https://github.com/boltdb/bolt/issues/284
NoGrowSync bool
// If you want to read the entire database fast, you can set MmapFlag to
// syscall.MAP_POPULATE on Linux 2.6.23+ for sequential read-ahead.
MmapFlags int
// MaxBatchSize is the maximum size of a batch. Default value is
// copied from DefaultMaxBatchSize in Open.
//
@ -71,11 +86,18 @@ type DB struct {
// Do not change concurrently with calls to Batch.
MaxBatchDelay time.Duration
// AllocSize is the amount of space allocated when the database
// needs to create new pages. This is done to amortize the cost
// of truncate() and fsync() when growing the data file.
AllocSize int
path string
file *os.File
lockfile *os.File // windows only
dataref []byte // mmap'ed readonly, write throws SEGV
data *[maxMapSize]byte
datasz int
filesz int // current on disk file size
meta0 *meta
meta1 *meta
pageSize int
@ -96,6 +118,10 @@ type DB struct {
ops struct {
writeAt func(b []byte, off int64) (n int, err error)
}
// Read only mode.
// When true, Update() and Begin(true) return ErrDatabaseReadOnly immediately.
readOnly bool
}
// Path returns the path to currently open database file.
@ -123,24 +149,36 @@ func Open(path string, mode os.FileMode, options *Options) (*DB, error) {
if options == nil {
options = DefaultOptions
}
db.NoGrowSync = options.NoGrowSync
db.MmapFlags = options.MmapFlags
// Set default values for later DB operations.
db.MaxBatchSize = DefaultMaxBatchSize
db.MaxBatchDelay = DefaultMaxBatchDelay
db.AllocSize = DefaultAllocSize
flag := os.O_RDWR
if options.ReadOnly {
flag = os.O_RDONLY
db.readOnly = true
}
// Open data file and separate sync handler for metadata writes.
db.path = path
var err error
if db.file, err = os.OpenFile(db.path, os.O_RDWR|os.O_CREATE, mode); err != nil {
if db.file, err = os.OpenFile(db.path, flag|os.O_CREATE, mode); err != nil {
_ = db.close()
return nil, err
}
// Lock file so that other processes using Bolt cannot use the database
// at the same time. This would cause corruption since the two processes
// would write meta pages and free pages separately.
if err := flock(db.file, options.Timeout); err != nil {
// Lock file so that other processes using Bolt in read-write mode cannot
// use the database at the same time. This would cause corruption since
// the two processes would write meta pages and free pages separately.
// The database file is locked exclusively (only one process can grab the lock)
// if !options.ReadOnly.
// The database file is locked using the shared lock (more than one process may
// hold a lock at the same time) otherwise (options.ReadOnly is set).
if err := flock(db, mode, !db.readOnly, options.Timeout); err != nil {
_ = db.close()
return nil, err
}
@ -150,7 +188,7 @@ func Open(path string, mode os.FileMode, options *Options) (*DB, error) {
// Initialize the database if it doesn't exist.
if info, err := db.file.Stat(); err != nil {
return nil, fmt.Errorf("stat error: %s", err)
return nil, err
} else if info.Size() == 0 {
// Initialize new files with meta pages.
if err := db.init(); err != nil {
@ -162,14 +200,14 @@ func Open(path string, mode os.FileMode, options *Options) (*DB, error) {
if _, err := db.file.ReadAt(buf[:], 0); err == nil {
m := db.pageInBuffer(buf[:], 0).meta()
if err := m.validate(); err != nil {
return nil, fmt.Errorf("meta0 error: %s", err)
return nil, err
}
db.pageSize = int(m.pageSize)
}
}
// Memory map the data file.
if err := db.mmap(0); err != nil {
if err := db.mmap(options.InitialMmapSize); err != nil {
_ = db.close()
return nil, err
}
@ -226,10 +264,10 @@ func (db *DB) mmap(minsz int) error {
// Validate the meta pages.
if err := db.meta0.validate(); err != nil {
return fmt.Errorf("meta0 error: %s", err)
return err
}
if err := db.meta1.validate(); err != nil {
return fmt.Errorf("meta1 error: %s", err)
return err
}
return nil
@ -244,11 +282,11 @@ func (db *DB) munmap() error {
}
// mmapSize determines the appropriate size for the mmap given the current size
// of the database. The minimum size is 1MB and doubles until it reaches 1GB.
// of the database. The minimum size is 32KB and doubles until it reaches 1GB.
// Returns an error if the new mmap size is greater than the max allowed.
func (db *DB) mmapSize(size int) (int, error) {
// Double the size from 1MB until 1GB.
for i := uint(20); i <= 30; i++ {
// Double the size from 32KB until 1GB.
for i := uint(15); i <= 30; i++ {
if size <= 1<<i {
return 1 << i, nil
}
@ -329,12 +367,23 @@ func (db *DB) init() error {
// Close releases all database resources.
// All transactions must be closed before closing the database.
func (db *DB) Close() error {
db.rwlock.Lock()
defer db.rwlock.Unlock()
db.metalock.Lock()
defer db.metalock.Unlock()
db.mmaplock.RLock()
defer db.mmaplock.RUnlock()
return db.close()
}
func (db *DB) close() error {
if !db.opened {
return nil
}
db.opened = false
db.freelist = nil
@ -350,8 +399,13 @@ func (db *DB) close() error {
// Close file handles.
if db.file != nil {
// Unlock the file.
_ = funlock(db.file)
// No need to unlock read-only file.
if !db.readOnly {
// Unlock the file.
if err := funlock(db); err != nil {
log.Printf("bolt.Close(): funlock error: %s", err)
}
}
// Close the file descriptor.
if err := db.file.Close(); err != nil {
@ -369,6 +423,15 @@ func (db *DB) close() error {
// will cause the calls to block and be serialized until the current write
// transaction finishes.
//
// Transactions should not be dependent on one another. Opening a read
// transaction and a write transaction in the same goroutine can cause the
// writer to deadlock because the database periodically needs to re-mmap itself
// as it grows and it cannot do that while a read transaction is open.
//
// If a long running read transaction (for example, a snapshot transaction) is
// needed, you might want to set DB.InitialMmapSize to a large enough value
// to avoid potential blocking of write transaction.
//
// IMPORTANT: You must close read-only transactions after you are finished or
// else the database will not reclaim old pages.
func (db *DB) Begin(writable bool) (*Tx, error) {
@ -417,6 +480,11 @@ func (db *DB) beginTx() (*Tx, error) {
}
func (db *DB) beginRWTx() (*Tx, error) {
// If the database was opened with Options.ReadOnly, return an error.
if db.readOnly {
return nil, ErrDatabaseReadOnly
}
// Obtain writer lock. This is released by the transaction when it closes.
// This enforces only one writer transaction at a time.
db.rwlock.Lock()
@ -547,6 +615,142 @@ func (db *DB) View(fn func(*Tx) error) error {
return nil
}
// Batch calls fn as part of a batch. It behaves similar to Update,
// except:
//
// 1. concurrent Batch calls can be combined into a single Bolt
// transaction.
//
// 2. the function passed to Batch may be called multiple times,
// regardless of whether it returns error or not.
//
// This means that Batch function side effects must be idempotent and
// take permanent effect only after a successful return is seen in
// caller.
//
// The maximum batch size and delay can be adjusted with DB.MaxBatchSize
// and DB.MaxBatchDelay, respectively.
//
// Batch is only useful when there are multiple goroutines calling it.
func (db *DB) Batch(fn func(*Tx) error) error {
errCh := make(chan error, 1)
db.batchMu.Lock()
if (db.batch == nil) || (db.batch != nil && len(db.batch.calls) >= db.MaxBatchSize) {
// There is no existing batch, or the existing batch is full; start a new one.
db.batch = &batch{
db: db,
}
db.batch.timer = time.AfterFunc(db.MaxBatchDelay, db.batch.trigger)
}
db.batch.calls = append(db.batch.calls, call{fn: fn, err: errCh})
if len(db.batch.calls) >= db.MaxBatchSize {
// wake up batch, it's ready to run
go db.batch.trigger()
}
db.batchMu.Unlock()
err := <-errCh
if err == trySolo {
err = db.Update(fn)
}
return err
}
type call struct {
fn func(*Tx) error
err chan<- error
}
type batch struct {
db *DB
timer *time.Timer
start sync.Once
calls []call
}
// trigger runs the batch if it hasn't already been run.
func (b *batch) trigger() {
b.start.Do(b.run)
}
// run performs the transactions in the batch and communicates results
// back to DB.Batch.
func (b *batch) run() {
b.db.batchMu.Lock()
b.timer.Stop()
// Make sure no new work is added to this batch, but don't break
// other batches.
if b.db.batch == b {
b.db.batch = nil
}
b.db.batchMu.Unlock()
retry:
for len(b.calls) > 0 {
var failIdx = -1
err := b.db.Update(func(tx *Tx) error {
for i, c := range b.calls {
if err := safelyCall(c.fn, tx); err != nil {
failIdx = i
return err
}
}
return nil
})
if failIdx >= 0 {
// take the failing transaction out of the batch. it's
// safe to shorten b.calls here because db.batch no longer
// points to us, and we hold the mutex anyway.
c := b.calls[failIdx]
b.calls[failIdx], b.calls = b.calls[len(b.calls)-1], b.calls[:len(b.calls)-1]
// tell the submitter re-run it solo, continue with the rest of the batch
c.err <- trySolo
continue retry
}
// pass success, or bolt internal errors, to all callers
for _, c := range b.calls {
if c.err != nil {
c.err <- err
}
}
break retry
}
}
// trySolo is a special sentinel error value used for signaling that a
// transaction function should be re-run. It should never be seen by
// callers.
var trySolo = errors.New("batch function returned an error and should be re-run solo")
type panicked struct {
reason interface{}
}
func (p panicked) Error() string {
if err, ok := p.reason.(error); ok {
return err.Error()
}
return fmt.Sprintf("panic: %v", p.reason)
}
func safelyCall(fn func(*Tx) error, tx *Tx) (err error) {
defer func() {
if p := recover(); p != nil {
err = panicked{p}
}
}()
return fn(tx)
}
// Sync executes fdatasync() against the database file handle.
//
// This is not necessary under normal operation, however, if you use NoSync
// then it allows you to force the database file to sync against the disk.
func (db *DB) Sync() error { return fdatasync(db) }
// Stats retrieves ongoing performance stats for the database.
// This is only updated when a transaction closes.
func (db *DB) Stats() Stats {
@ -607,18 +811,75 @@ func (db *DB) allocate(count int) (*page, error) {
return p, nil
}
// grow grows the size of the database to the given sz.
func (db *DB) grow(sz int) error {
// Ignore if the new size is less than available file size.
if sz <= db.filesz {
return nil
}
// If the data is smaller than the alloc size then only allocate what's needed.
// Once it goes over the allocation size then allocate in chunks.
if db.datasz < db.AllocSize {
sz = db.datasz
} else {
sz += db.AllocSize
}
// Truncate and fsync to ensure file size metadata is flushed.
// https://github.com/boltdb/bolt/issues/284
if !db.NoGrowSync && !db.readOnly {
if runtime.GOOS != "windows" {
if err := db.file.Truncate(int64(sz)); err != nil {
return fmt.Errorf("file resize error: %s", err)
}
}
if err := db.file.Sync(); err != nil {
return fmt.Errorf("file sync error: %s", err)
}
}
db.filesz = sz
return nil
}
func (db *DB) IsReadOnly() bool {
return db.readOnly
}
// Options represents the options that can be set when opening a database.
type Options struct {
// Timeout is the amount of time to wait to obtain a file lock.
// When set to zero it will wait indefinitely. This option is only
// available on Darwin and Linux.
Timeout time.Duration
// Sets the DB.NoGrowSync flag before memory mapping the file.
NoGrowSync bool
// Open database in read-only mode. Uses flock(..., LOCK_SH |LOCK_NB) to
// grab a shared lock (UNIX).
ReadOnly bool
// Sets the DB.MmapFlags flag before memory mapping the file.
MmapFlags int
// InitialMmapSize is the initial mmap size of the database
// in bytes. Read transactions won't block write transaction
// if the InitialMmapSize is large enough to hold database mmap
// size. (See DB.Begin for more information)
//
// If <=0, the initial map size is 0.
// If initialMmapSize is smaller than the previous database size,
// it takes no effect.
InitialMmapSize int
}
// DefaultOptions represent the options used if nil options are passed into Open().
// No timeout is used which will cause Bolt to wait indefinitely for a lock.
var DefaultOptions = &Options{
Timeout: 0,
Timeout: 0,
NoGrowSync: false,
}
// Stats represents statistics about the database.

View File

@ -1,790 +0,0 @@
package bolt_test
import (
"encoding/binary"
"errors"
"flag"
"fmt"
"io/ioutil"
"os"
"regexp"
"runtime"
"sort"
"strings"
"testing"
"time"
"github.com/coreos/etcd/Godeps/_workspace/src/github.com/boltdb/bolt"
)
var statsFlag = flag.Bool("stats", false, "show performance stats")
// Ensure that opening a database with a bad path returns an error.
func TestOpen_BadPath(t *testing.T) {
db, err := bolt.Open("", 0666, nil)
assert(t, err != nil, "err: %s", err)
assert(t, db == nil, "")
}
// Ensure that a database can be opened without error.
func TestOpen(t *testing.T) {
path := tempfile()
defer os.Remove(path)
db, err := bolt.Open(path, 0666, nil)
assert(t, db != nil, "")
ok(t, err)
equals(t, db.Path(), path)
ok(t, db.Close())
}
// Ensure that opening an already open database file will timeout.
func TestOpen_Timeout(t *testing.T) {
if runtime.GOOS == "windows" {
t.Skip("timeout not supported on windows")
}
path := tempfile()
defer os.Remove(path)
// Open a data file.
db0, err := bolt.Open(path, 0666, nil)
assert(t, db0 != nil, "")
ok(t, err)
// Attempt to open the database again.
start := time.Now()
db1, err := bolt.Open(path, 0666, &bolt.Options{Timeout: 100 * time.Millisecond})
assert(t, db1 == nil, "")
equals(t, bolt.ErrTimeout, err)
assert(t, time.Since(start) > 100*time.Millisecond, "")
db0.Close()
}
// Ensure that opening an already open database file will wait until its closed.
func TestOpen_Wait(t *testing.T) {
if runtime.GOOS == "windows" {
t.Skip("timeout not supported on windows")
}
path := tempfile()
defer os.Remove(path)
// Open a data file.
db0, err := bolt.Open(path, 0666, nil)
assert(t, db0 != nil, "")
ok(t, err)
// Close it in just a bit.
time.AfterFunc(100*time.Millisecond, func() { db0.Close() })
// Attempt to open the database again.
start := time.Now()
db1, err := bolt.Open(path, 0666, &bolt.Options{Timeout: 200 * time.Millisecond})
assert(t, db1 != nil, "")
ok(t, err)
assert(t, time.Since(start) > 100*time.Millisecond, "")
}
// Ensure that opening a database does not increase its size.
// https://github.com/boltdb/bolt/issues/291
func TestOpen_Size(t *testing.T) {
// Open a data file.
db := NewTestDB()
path := db.Path()
defer db.Close()
// Insert until we get above the minimum 4MB size.
ok(t, db.Update(func(tx *bolt.Tx) error {
b, _ := tx.CreateBucketIfNotExists([]byte("data"))
for i := 0; i < 10000; i++ {
ok(t, b.Put([]byte(fmt.Sprintf("%04d", i)), make([]byte, 1000)))
}
return nil
}))
// Close database and grab the size.
db.DB.Close()
sz := fileSize(path)
if sz == 0 {
t.Fatalf("unexpected new file size: %d", sz)
}
// Reopen database, update, and check size again.
db0, err := bolt.Open(path, 0666, nil)
ok(t, err)
ok(t, db0.Update(func(tx *bolt.Tx) error { return tx.Bucket([]byte("data")).Put([]byte{0}, []byte{0}) }))
ok(t, db0.Close())
newSz := fileSize(path)
if newSz == 0 {
t.Fatalf("unexpected new file size: %d", newSz)
}
// Compare the original size with the new size.
if sz != newSz {
t.Fatalf("unexpected file growth: %d => %d", sz, newSz)
}
}
// Ensure that opening a database beyond the max step size does not increase its size.
// https://github.com/boltdb/bolt/issues/303
func TestOpen_Size_Large(t *testing.T) {
if testing.Short() {
t.Skip("short mode")
}
// Open a data file.
db := NewTestDB()
path := db.Path()
defer db.Close()
// Insert until we get above the minimum 4MB size.
var index uint64
for i := 0; i < 10000; i++ {
ok(t, db.Update(func(tx *bolt.Tx) error {
b, _ := tx.CreateBucketIfNotExists([]byte("data"))
for j := 0; j < 1000; j++ {
ok(t, b.Put(u64tob(index), make([]byte, 50)))
index++
}
return nil
}))
}
// Close database and grab the size.
db.DB.Close()
sz := fileSize(path)
if sz == 0 {
t.Fatalf("unexpected new file size: %d", sz)
} else if sz < (1 << 30) {
t.Fatalf("expected larger initial size: %d", sz)
}
// Reopen database, update, and check size again.
db0, err := bolt.Open(path, 0666, nil)
ok(t, err)
ok(t, db0.Update(func(tx *bolt.Tx) error { return tx.Bucket([]byte("data")).Put([]byte{0}, []byte{0}) }))
ok(t, db0.Close())
newSz := fileSize(path)
if newSz == 0 {
t.Fatalf("unexpected new file size: %d", newSz)
}
// Compare the original size with the new size.
if sz != newSz {
t.Fatalf("unexpected file growth: %d => %d", sz, newSz)
}
}
// Ensure that a re-opened database is consistent.
func TestOpen_Check(t *testing.T) {
path := tempfile()
defer os.Remove(path)
db, err := bolt.Open(path, 0666, nil)
ok(t, err)
ok(t, db.View(func(tx *bolt.Tx) error { return <-tx.Check() }))
db.Close()
db, err = bolt.Open(path, 0666, nil)
ok(t, err)
ok(t, db.View(func(tx *bolt.Tx) error { return <-tx.Check() }))
db.Close()
}
// Ensure that the database returns an error if the file handle cannot be open.
func TestDB_Open_FileError(t *testing.T) {
path := tempfile()
defer os.Remove(path)
_, err := bolt.Open(path+"/youre-not-my-real-parent", 0666, nil)
assert(t, err.(*os.PathError) != nil, "")
equals(t, path+"/youre-not-my-real-parent", err.(*os.PathError).Path)
equals(t, "open", err.(*os.PathError).Op)
}
// Ensure that write errors to the meta file handler during initialization are returned.
func TestDB_Open_MetaInitWriteError(t *testing.T) {
t.Skip("pending")
}
// Ensure that a database that is too small returns an error.
func TestDB_Open_FileTooSmall(t *testing.T) {
path := tempfile()
defer os.Remove(path)
db, err := bolt.Open(path, 0666, nil)
ok(t, err)
db.Close()
// corrupt the database
ok(t, os.Truncate(path, int64(os.Getpagesize())))
db, err = bolt.Open(path, 0666, nil)
equals(t, errors.New("file size too small"), err)
}
// TODO(benbjohnson): Test corruption at every byte of the first two pages.
// Ensure that a database cannot open a transaction when it's not open.
func TestDB_Begin_DatabaseNotOpen(t *testing.T) {
var db bolt.DB
tx, err := db.Begin(false)
assert(t, tx == nil, "")
equals(t, err, bolt.ErrDatabaseNotOpen)
}
// Ensure that a read-write transaction can be retrieved.
func TestDB_BeginRW(t *testing.T) {
db := NewTestDB()
defer db.Close()
tx, err := db.Begin(true)
assert(t, tx != nil, "")
ok(t, err)
assert(t, tx.DB() == db.DB, "")
equals(t, tx.Writable(), true)
ok(t, tx.Commit())
}
// Ensure that opening a transaction while the DB is closed returns an error.
func TestDB_BeginRW_Closed(t *testing.T) {
var db bolt.DB
tx, err := db.Begin(true)
equals(t, err, bolt.ErrDatabaseNotOpen)
assert(t, tx == nil, "")
}
// Ensure a database can provide a transactional block.
func TestDB_Update(t *testing.T) {
db := NewTestDB()
defer db.Close()
err := db.Update(func(tx *bolt.Tx) error {
tx.CreateBucket([]byte("widgets"))
b := tx.Bucket([]byte("widgets"))
b.Put([]byte("foo"), []byte("bar"))
b.Put([]byte("baz"), []byte("bat"))
b.Delete([]byte("foo"))
return nil
})
ok(t, err)
err = db.View(func(tx *bolt.Tx) error {
assert(t, tx.Bucket([]byte("widgets")).Get([]byte("foo")) == nil, "")
equals(t, []byte("bat"), tx.Bucket([]byte("widgets")).Get([]byte("baz")))
return nil
})
ok(t, err)
}
// Ensure a closed database returns an error while running a transaction block
func TestDB_Update_Closed(t *testing.T) {
var db bolt.DB
err := db.Update(func(tx *bolt.Tx) error {
tx.CreateBucket([]byte("widgets"))
return nil
})
equals(t, err, bolt.ErrDatabaseNotOpen)
}
// Ensure a panic occurs while trying to commit a managed transaction.
func TestDB_Update_ManualCommit(t *testing.T) {
db := NewTestDB()
defer db.Close()
var ok bool
db.Update(func(tx *bolt.Tx) error {
func() {
defer func() {
if r := recover(); r != nil {
ok = true
}
}()
tx.Commit()
}()
return nil
})
assert(t, ok, "expected panic")
}
// Ensure a panic occurs while trying to rollback a managed transaction.
func TestDB_Update_ManualRollback(t *testing.T) {
db := NewTestDB()
defer db.Close()
var ok bool
db.Update(func(tx *bolt.Tx) error {
func() {
defer func() {
if r := recover(); r != nil {
ok = true
}
}()
tx.Rollback()
}()
return nil
})
assert(t, ok, "expected panic")
}
// Ensure a panic occurs while trying to commit a managed transaction.
func TestDB_View_ManualCommit(t *testing.T) {
db := NewTestDB()
defer db.Close()
var ok bool
db.Update(func(tx *bolt.Tx) error {
func() {
defer func() {
if r := recover(); r != nil {
ok = true
}
}()
tx.Commit()
}()
return nil
})
assert(t, ok, "expected panic")
}
// Ensure a panic occurs while trying to rollback a managed transaction.
func TestDB_View_ManualRollback(t *testing.T) {
db := NewTestDB()
defer db.Close()
var ok bool
db.Update(func(tx *bolt.Tx) error {
func() {
defer func() {
if r := recover(); r != nil {
ok = true
}
}()
tx.Rollback()
}()
return nil
})
assert(t, ok, "expected panic")
}
// Ensure a write transaction that panics does not hold open locks.
func TestDB_Update_Panic(t *testing.T) {
db := NewTestDB()
defer db.Close()
func() {
defer func() {
if r := recover(); r != nil {
t.Log("recover: update", r)
}
}()
db.Update(func(tx *bolt.Tx) error {
tx.CreateBucket([]byte("widgets"))
panic("omg")
})
}()
// Verify we can update again.
err := db.Update(func(tx *bolt.Tx) error {
_, err := tx.CreateBucket([]byte("widgets"))
return err
})
ok(t, err)
// Verify that our change persisted.
err = db.Update(func(tx *bolt.Tx) error {
assert(t, tx.Bucket([]byte("widgets")) != nil, "")
return nil
})
}
// Ensure a database can return an error through a read-only transactional block.
func TestDB_View_Error(t *testing.T) {
db := NewTestDB()
defer db.Close()
err := db.View(func(tx *bolt.Tx) error {
return errors.New("xxx")
})
equals(t, errors.New("xxx"), err)
}
// Ensure a read transaction that panics does not hold open locks.
func TestDB_View_Panic(t *testing.T) {
db := NewTestDB()
defer db.Close()
db.Update(func(tx *bolt.Tx) error {
tx.CreateBucket([]byte("widgets"))
return nil
})
func() {
defer func() {
if r := recover(); r != nil {
t.Log("recover: view", r)
}
}()
db.View(func(tx *bolt.Tx) error {
assert(t, tx.Bucket([]byte("widgets")) != nil, "")
panic("omg")
})
}()
// Verify that we can still use read transactions.
db.View(func(tx *bolt.Tx) error {
assert(t, tx.Bucket([]byte("widgets")) != nil, "")
return nil
})
}
// Ensure that an error is returned when a database write fails.
func TestDB_Commit_WriteFail(t *testing.T) {
t.Skip("pending") // TODO(benbjohnson)
}
// Ensure that DB stats can be returned.
func TestDB_Stats(t *testing.T) {
db := NewTestDB()
defer db.Close()
db.Update(func(tx *bolt.Tx) error {
_, err := tx.CreateBucket([]byte("widgets"))
return err
})
stats := db.Stats()
equals(t, 2, stats.TxStats.PageCount)
equals(t, 0, stats.FreePageN)
equals(t, 2, stats.PendingPageN)
}
// Ensure that database pages are in expected order and type.
func TestDB_Consistency(t *testing.T) {
db := NewTestDB()
defer db.Close()
db.Update(func(tx *bolt.Tx) error {
_, err := tx.CreateBucket([]byte("widgets"))
return err
})
for i := 0; i < 10; i++ {
db.Update(func(tx *bolt.Tx) error {
ok(t, tx.Bucket([]byte("widgets")).Put([]byte("foo"), []byte("bar")))
return nil
})
}
db.Update(func(tx *bolt.Tx) error {
p, _ := tx.Page(0)
assert(t, p != nil, "")
equals(t, "meta", p.Type)
p, _ = tx.Page(1)
assert(t, p != nil, "")
equals(t, "meta", p.Type)
p, _ = tx.Page(2)
assert(t, p != nil, "")
equals(t, "free", p.Type)
p, _ = tx.Page(3)
assert(t, p != nil, "")
equals(t, "free", p.Type)
p, _ = tx.Page(4)
assert(t, p != nil, "")
equals(t, "leaf", p.Type)
p, _ = tx.Page(5)
assert(t, p != nil, "")
equals(t, "freelist", p.Type)
p, _ = tx.Page(6)
assert(t, p == nil, "")
return nil
})
}
// Ensure that DB stats can be substracted from one another.
func TestDBStats_Sub(t *testing.T) {
var a, b bolt.Stats
a.TxStats.PageCount = 3
a.FreePageN = 4
b.TxStats.PageCount = 10
b.FreePageN = 14
diff := b.Sub(&a)
equals(t, 7, diff.TxStats.PageCount)
// free page stats are copied from the receiver and not subtracted
equals(t, 14, diff.FreePageN)
}
func ExampleDB_Update() {
// Open the database.
db, _ := bolt.Open(tempfile(), 0666, nil)
defer os.Remove(db.Path())
defer db.Close()
// Execute several commands within a write transaction.
err := db.Update(func(tx *bolt.Tx) error {
b, err := tx.CreateBucket([]byte("widgets"))
if err != nil {
return err
}
if err := b.Put([]byte("foo"), []byte("bar")); err != nil {
return err
}
return nil
})
// If our transactional block didn't return an error then our data is saved.
if err == nil {
db.View(func(tx *bolt.Tx) error {
value := tx.Bucket([]byte("widgets")).Get([]byte("foo"))
fmt.Printf("The value of 'foo' is: %s\n", value)
return nil
})
}
// Output:
// The value of 'foo' is: bar
}
func ExampleDB_View() {
// Open the database.
db, _ := bolt.Open(tempfile(), 0666, nil)
defer os.Remove(db.Path())
defer db.Close()
// Insert data into a bucket.
db.Update(func(tx *bolt.Tx) error {
tx.CreateBucket([]byte("people"))
b := tx.Bucket([]byte("people"))
b.Put([]byte("john"), []byte("doe"))
b.Put([]byte("susy"), []byte("que"))
return nil
})
// Access data from within a read-only transactional block.
db.View(func(tx *bolt.Tx) error {
v := tx.Bucket([]byte("people")).Get([]byte("john"))
fmt.Printf("John's last name is %s.\n", v)
return nil
})
// Output:
// John's last name is doe.
}
func ExampleDB_Begin_ReadOnly() {
// Open the database.
db, _ := bolt.Open(tempfile(), 0666, nil)
defer os.Remove(db.Path())
defer db.Close()
// Create a bucket.
db.Update(func(tx *bolt.Tx) error {
_, err := tx.CreateBucket([]byte("widgets"))
return err
})
// Create several keys in a transaction.
tx, _ := db.Begin(true)
b := tx.Bucket([]byte("widgets"))
b.Put([]byte("john"), []byte("blue"))
b.Put([]byte("abby"), []byte("red"))
b.Put([]byte("zephyr"), []byte("purple"))
tx.Commit()
// Iterate over the values in sorted key order.
tx, _ = db.Begin(false)
c := tx.Bucket([]byte("widgets")).Cursor()
for k, v := c.First(); k != nil; k, v = c.Next() {
fmt.Printf("%s likes %s\n", k, v)
}
tx.Rollback()
// Output:
// abby likes red
// john likes blue
// zephyr likes purple
}
// TestDB represents a wrapper around a Bolt DB to handle temporary file
// creation and automatic cleanup on close.
type TestDB struct {
*bolt.DB
}
// NewTestDB returns a new instance of TestDB.
func NewTestDB() *TestDB {
db, err := bolt.Open(tempfile(), 0666, nil)
if err != nil {
panic("cannot open db: " + err.Error())
}
return &TestDB{db}
}
// MustView executes a read-only function. Panic on error.
func (db *TestDB) MustView(fn func(tx *bolt.Tx) error) {
if err := db.DB.View(func(tx *bolt.Tx) error {
return fn(tx)
}); err != nil {
panic(err.Error())
}
}
// MustUpdate executes a read-write function. Panic on error.
func (db *TestDB) MustUpdate(fn func(tx *bolt.Tx) error) {
if err := db.DB.View(func(tx *bolt.Tx) error {
return fn(tx)
}); err != nil {
panic(err.Error())
}
}
// MustCreateBucket creates a new bucket. Panic on error.
func (db *TestDB) MustCreateBucket(name []byte) {
if err := db.Update(func(tx *bolt.Tx) error {
_, err := tx.CreateBucket([]byte(name))
return err
}); err != nil {
panic(err.Error())
}
}
// Close closes the database and deletes the underlying file.
func (db *TestDB) Close() {
// Log statistics.
if *statsFlag {
db.PrintStats()
}
// Check database consistency after every test.
db.MustCheck()
// Close database and remove file.
defer os.Remove(db.Path())
db.DB.Close()
}
// PrintStats prints the database stats
func (db *TestDB) PrintStats() {
var stats = db.Stats()
fmt.Printf("[db] %-20s %-20s %-20s\n",
fmt.Sprintf("pg(%d/%d)", stats.TxStats.PageCount, stats.TxStats.PageAlloc),
fmt.Sprintf("cur(%d)", stats.TxStats.CursorCount),
fmt.Sprintf("node(%d/%d)", stats.TxStats.NodeCount, stats.TxStats.NodeDeref),
)
fmt.Printf(" %-20s %-20s %-20s\n",
fmt.Sprintf("rebal(%d/%v)", stats.TxStats.Rebalance, truncDuration(stats.TxStats.RebalanceTime)),
fmt.Sprintf("spill(%d/%v)", stats.TxStats.Spill, truncDuration(stats.TxStats.SpillTime)),
fmt.Sprintf("w(%d/%v)", stats.TxStats.Write, truncDuration(stats.TxStats.WriteTime)),
)
}
// MustCheck runs a consistency check on the database and panics if any errors are found.
func (db *TestDB) MustCheck() {
db.View(func(tx *bolt.Tx) error {
// Collect all the errors.
var errors []error
for err := range tx.Check() {
errors = append(errors, err)
if len(errors) > 10 {
break
}
}
// If errors occurred, copy the DB and print the errors.
if len(errors) > 0 {
var path = tempfile()
tx.CopyFile(path, 0600)
// Print errors.
fmt.Print("\n\n")
fmt.Printf("consistency check failed (%d errors)\n", len(errors))
for _, err := range errors {
fmt.Println(err)
}
fmt.Println("")
fmt.Println("db saved to:")
fmt.Println(path)
fmt.Print("\n\n")
os.Exit(-1)
}
return nil
})
}
// CopyTempFile copies a database to a temporary file.
func (db *TestDB) CopyTempFile() {
path := tempfile()
db.View(func(tx *bolt.Tx) error { return tx.CopyFile(path, 0600) })
fmt.Println("db copied to: ", path)
}
// tempfile returns a temporary file path.
func tempfile() string {
f, _ := ioutil.TempFile("", "bolt-")
f.Close()
os.Remove(f.Name())
return f.Name()
}
// mustContainKeys checks that a bucket contains a given set of keys.
func mustContainKeys(b *bolt.Bucket, m map[string]string) {
found := make(map[string]string)
b.ForEach(func(k, _ []byte) error {
found[string(k)] = ""
return nil
})
// Check for keys found in bucket that shouldn't be there.
var keys []string
for k, _ := range found {
if _, ok := m[string(k)]; !ok {
keys = append(keys, k)
}
}
if len(keys) > 0 {
sort.Strings(keys)
panic(fmt.Sprintf("keys found(%d): %s", len(keys), strings.Join(keys, ",")))
}
// Check for keys not found in bucket that should be there.
for k, _ := range m {
if _, ok := found[string(k)]; !ok {
keys = append(keys, k)
}
}
if len(keys) > 0 {
sort.Strings(keys)
panic(fmt.Sprintf("keys not found(%d): %s", len(keys), strings.Join(keys, ",")))
}
}
func trunc(b []byte, length int) []byte {
if length < len(b) {
return b[:length]
}
return b
}
func truncDuration(d time.Duration) string {
return regexp.MustCompile(`^(\d+)(\.\d+)`).ReplaceAllString(d.String(), "$1")
}
func fileSize(path string) int64 {
fi, err := os.Stat(path)
if err != nil {
return 0
}
return fi.Size()
}
func warn(v ...interface{}) { fmt.Fprintln(os.Stderr, v...) }
func warnf(msg string, v ...interface{}) { fmt.Fprintf(os.Stderr, msg+"\n", v...) }
// u64tob converts a uint64 into an 8-byte slice.
func u64tob(v uint64) []byte {
b := make([]byte, 8)
binary.BigEndian.PutUint64(b, v)
return b
}
// btou64 converts an 8-byte slice into an uint64.
func btou64(b []byte) uint64 { return binary.BigEndian.Uint64(b) }

View File

@ -36,6 +36,10 @@ var (
// ErrTxClosed is returned when committing or rolling back a transaction
// that has already been committed or rolled back.
ErrTxClosed = errors.New("tx closed")
// ErrDatabaseReadOnly is returned when a mutating transaction is started on a
// read-only database.
ErrDatabaseReadOnly = errors.New("database is in read-only mode")
)
// These errors can occur when putting or deleting a value or a bucket.

View File

@ -48,15 +48,14 @@ func (f *freelist) pending_count() int {
// all returns a list of all free ids and all pending ids in one sorted list.
func (f *freelist) all() []pgid {
ids := make([]pgid, len(f.ids))
copy(ids, f.ids)
m := make(pgids, 0)
for _, list := range f.pending {
ids = append(ids, list...)
m = append(m, list...)
}
sort.Sort(pgids(ids))
return ids
sort.Sort(m)
return pgids(f.ids).merge(m)
}
// allocate returns the starting page id of a contiguous list of pages of a given size.
@ -127,15 +126,17 @@ func (f *freelist) free(txid txid, p *page) {
// release moves all page ids for a transaction id (or older) to the freelist.
func (f *freelist) release(txid txid) {
m := make(pgids, 0)
for tid, ids := range f.pending {
if tid <= txid {
// Move transaction's pending pages to the available freelist.
// Don't remove from the cache since the page is still free.
f.ids = append(f.ids, ids...)
m = append(m, ids...)
delete(f.pending, tid)
}
}
sort.Sort(pgids(f.ids))
sort.Sort(m)
f.ids = pgids(f.ids).merge(m)
}
// rollback removes the pages from a given pending tx.

View File

@ -1,129 +0,0 @@
package bolt
import (
"reflect"
"testing"
"unsafe"
)
// Ensure that a page is added to a transaction's freelist.
func TestFreelist_free(t *testing.T) {
f := newFreelist()
f.free(100, &page{id: 12})
if !reflect.DeepEqual([]pgid{12}, f.pending[100]) {
t.Fatalf("exp=%v; got=%v", []pgid{12}, f.pending[100])
}
}
// Ensure that a page and its overflow is added to a transaction's freelist.
func TestFreelist_free_overflow(t *testing.T) {
f := newFreelist()
f.free(100, &page{id: 12, overflow: 3})
if exp := []pgid{12, 13, 14, 15}; !reflect.DeepEqual(exp, f.pending[100]) {
t.Fatalf("exp=%v; got=%v", exp, f.pending[100])
}
}
// Ensure that a transaction's free pages can be released.
func TestFreelist_release(t *testing.T) {
f := newFreelist()
f.free(100, &page{id: 12, overflow: 1})
f.free(100, &page{id: 9})
f.free(102, &page{id: 39})
f.release(100)
f.release(101)
if exp := []pgid{9, 12, 13}; !reflect.DeepEqual(exp, f.ids) {
t.Fatalf("exp=%v; got=%v", exp, f.ids)
}
f.release(102)
if exp := []pgid{9, 12, 13, 39}; !reflect.DeepEqual(exp, f.ids) {
t.Fatalf("exp=%v; got=%v", exp, f.ids)
}
}
// Ensure that a freelist can find contiguous blocks of pages.
func TestFreelist_allocate(t *testing.T) {
f := &freelist{ids: []pgid{3, 4, 5, 6, 7, 9, 12, 13, 18}}
if id := int(f.allocate(3)); id != 3 {
t.Fatalf("exp=3; got=%v", id)
}
if id := int(f.allocate(1)); id != 6 {
t.Fatalf("exp=6; got=%v", id)
}
if id := int(f.allocate(3)); id != 0 {
t.Fatalf("exp=0; got=%v", id)
}
if id := int(f.allocate(2)); id != 12 {
t.Fatalf("exp=12; got=%v", id)
}
if id := int(f.allocate(1)); id != 7 {
t.Fatalf("exp=7; got=%v", id)
}
if id := int(f.allocate(0)); id != 0 {
t.Fatalf("exp=0; got=%v", id)
}
if id := int(f.allocate(0)); id != 0 {
t.Fatalf("exp=0; got=%v", id)
}
if exp := []pgid{9, 18}; !reflect.DeepEqual(exp, f.ids) {
t.Fatalf("exp=%v; got=%v", exp, f.ids)
}
if id := int(f.allocate(1)); id != 9 {
t.Fatalf("exp=9; got=%v", id)
}
if id := int(f.allocate(1)); id != 18 {
t.Fatalf("exp=18; got=%v", id)
}
if id := int(f.allocate(1)); id != 0 {
t.Fatalf("exp=0; got=%v", id)
}
if exp := []pgid{}; !reflect.DeepEqual(exp, f.ids) {
t.Fatalf("exp=%v; got=%v", exp, f.ids)
}
}
// Ensure that a freelist can deserialize from a freelist page.
func TestFreelist_read(t *testing.T) {
// Create a page.
var buf [4096]byte
page := (*page)(unsafe.Pointer(&buf[0]))
page.flags = freelistPageFlag
page.count = 2
// Insert 2 page ids.
ids := (*[3]pgid)(unsafe.Pointer(&page.ptr))
ids[0] = 23
ids[1] = 50
// Deserialize page into a freelist.
f := newFreelist()
f.read(page)
// Ensure that there are two page ids in the freelist.
if exp := []pgid{23, 50}; !reflect.DeepEqual(exp, f.ids) {
t.Fatalf("exp=%v; got=%v", exp, f.ids)
}
}
// Ensure that a freelist can serialize into a freelist page.
func TestFreelist_write(t *testing.T) {
// Create a freelist and write it to a page.
var buf [4096]byte
f := &freelist{ids: []pgid{12, 39}, pending: make(map[txid][]pgid)}
f.pending[100] = []pgid{28, 11}
f.pending[101] = []pgid{3}
p := (*page)(unsafe.Pointer(&buf[0]))
f.write(p)
// Read the page back out.
f2 := newFreelist()
f2.read(p)
// Ensure that the freelist is correct.
// All pages should be present and in reverse order.
if exp := []pgid{3, 11, 12, 28, 39}; !reflect.DeepEqual(exp, f2.ids) {
t.Fatalf("exp=%v; got=%v", exp, f2.ids)
}
}

View File

@ -221,11 +221,20 @@ func (n *node) write(p *page) {
_assert(elem.pgid != p.id, "write: circular dependency occurred")
}
// If the length of key+value is larger than the max allocation size
// then we need to reallocate the byte array pointer.
//
// See: https://github.com/boltdb/bolt/pull/335
klen, vlen := len(item.key), len(item.value)
if len(b) < klen+vlen {
b = (*[maxAllocSize]byte)(unsafe.Pointer(&b[0]))[:]
}
// Write data for the element to the end of the page.
copy(b[0:], item.key)
b = b[len(item.key):]
b = b[klen:]
copy(b[0:], item.value)
b = b[len(item.value):]
b = b[vlen:]
}
// DEBUG ONLY: n.dump()

View File

@ -1,156 +0,0 @@
package bolt
import (
"testing"
"unsafe"
)
// Ensure that a node can insert a key/value.
func TestNode_put(t *testing.T) {
n := &node{inodes: make(inodes, 0), bucket: &Bucket{tx: &Tx{meta: &meta{pgid: 1}}}}
n.put([]byte("baz"), []byte("baz"), []byte("2"), 0, 0)
n.put([]byte("foo"), []byte("foo"), []byte("0"), 0, 0)
n.put([]byte("bar"), []byte("bar"), []byte("1"), 0, 0)
n.put([]byte("foo"), []byte("foo"), []byte("3"), 0, leafPageFlag)
if len(n.inodes) != 3 {
t.Fatalf("exp=3; got=%d", len(n.inodes))
}
if k, v := n.inodes[0].key, n.inodes[0].value; string(k) != "bar" || string(v) != "1" {
t.Fatalf("exp=<bar,1>; got=<%s,%s>", k, v)
}
if k, v := n.inodes[1].key, n.inodes[1].value; string(k) != "baz" || string(v) != "2" {
t.Fatalf("exp=<baz,2>; got=<%s,%s>", k, v)
}
if k, v := n.inodes[2].key, n.inodes[2].value; string(k) != "foo" || string(v) != "3" {
t.Fatalf("exp=<foo,3>; got=<%s,%s>", k, v)
}
if n.inodes[2].flags != uint32(leafPageFlag) {
t.Fatalf("not a leaf: %d", n.inodes[2].flags)
}
}
// Ensure that a node can deserialize from a leaf page.
func TestNode_read_LeafPage(t *testing.T) {
// Create a page.
var buf [4096]byte
page := (*page)(unsafe.Pointer(&buf[0]))
page.flags = leafPageFlag
page.count = 2
// Insert 2 elements at the beginning. sizeof(leafPageElement) == 16
nodes := (*[3]leafPageElement)(unsafe.Pointer(&page.ptr))
nodes[0] = leafPageElement{flags: 0, pos: 32, ksize: 3, vsize: 4} // pos = sizeof(leafPageElement) * 2
nodes[1] = leafPageElement{flags: 0, pos: 23, ksize: 10, vsize: 3} // pos = sizeof(leafPageElement) + 3 + 4
// Write data for the nodes at the end.
data := (*[4096]byte)(unsafe.Pointer(&nodes[2]))
copy(data[:], []byte("barfooz"))
copy(data[7:], []byte("helloworldbye"))
// Deserialize page into a leaf.
n := &node{}
n.read(page)
// Check that there are two inodes with correct data.
if !n.isLeaf {
t.Fatal("expected leaf")
}
if len(n.inodes) != 2 {
t.Fatalf("exp=2; got=%d", len(n.inodes))
}
if k, v := n.inodes[0].key, n.inodes[0].value; string(k) != "bar" || string(v) != "fooz" {
t.Fatalf("exp=<bar,fooz>; got=<%s,%s>", k, v)
}
if k, v := n.inodes[1].key, n.inodes[1].value; string(k) != "helloworld" || string(v) != "bye" {
t.Fatalf("exp=<helloworld,bye>; got=<%s,%s>", k, v)
}
}
// Ensure that a node can serialize into a leaf page.
func TestNode_write_LeafPage(t *testing.T) {
// Create a node.
n := &node{isLeaf: true, inodes: make(inodes, 0), bucket: &Bucket{tx: &Tx{db: &DB{}, meta: &meta{pgid: 1}}}}
n.put([]byte("susy"), []byte("susy"), []byte("que"), 0, 0)
n.put([]byte("ricki"), []byte("ricki"), []byte("lake"), 0, 0)
n.put([]byte("john"), []byte("john"), []byte("johnson"), 0, 0)
// Write it to a page.
var buf [4096]byte
p := (*page)(unsafe.Pointer(&buf[0]))
n.write(p)
// Read the page back in.
n2 := &node{}
n2.read(p)
// Check that the two pages are the same.
if len(n2.inodes) != 3 {
t.Fatalf("exp=3; got=%d", len(n2.inodes))
}
if k, v := n2.inodes[0].key, n2.inodes[0].value; string(k) != "john" || string(v) != "johnson" {
t.Fatalf("exp=<john,johnson>; got=<%s,%s>", k, v)
}
if k, v := n2.inodes[1].key, n2.inodes[1].value; string(k) != "ricki" || string(v) != "lake" {
t.Fatalf("exp=<ricki,lake>; got=<%s,%s>", k, v)
}
if k, v := n2.inodes[2].key, n2.inodes[2].value; string(k) != "susy" || string(v) != "que" {
t.Fatalf("exp=<susy,que>; got=<%s,%s>", k, v)
}
}
// Ensure that a node can split into appropriate subgroups.
func TestNode_split(t *testing.T) {
// Create a node.
n := &node{inodes: make(inodes, 0), bucket: &Bucket{tx: &Tx{db: &DB{}, meta: &meta{pgid: 1}}}}
n.put([]byte("00000001"), []byte("00000001"), []byte("0123456701234567"), 0, 0)
n.put([]byte("00000002"), []byte("00000002"), []byte("0123456701234567"), 0, 0)
n.put([]byte("00000003"), []byte("00000003"), []byte("0123456701234567"), 0, 0)
n.put([]byte("00000004"), []byte("00000004"), []byte("0123456701234567"), 0, 0)
n.put([]byte("00000005"), []byte("00000005"), []byte("0123456701234567"), 0, 0)
// Split between 2 & 3.
n.split(100)
var parent = n.parent
if len(parent.children) != 2 {
t.Fatalf("exp=2; got=%d", len(parent.children))
}
if len(parent.children[0].inodes) != 2 {
t.Fatalf("exp=2; got=%d", len(parent.children[0].inodes))
}
if len(parent.children[1].inodes) != 3 {
t.Fatalf("exp=3; got=%d", len(parent.children[1].inodes))
}
}
// Ensure that a page with the minimum number of inodes just returns a single node.
func TestNode_split_MinKeys(t *testing.T) {
// Create a node.
n := &node{inodes: make(inodes, 0), bucket: &Bucket{tx: &Tx{db: &DB{}, meta: &meta{pgid: 1}}}}
n.put([]byte("00000001"), []byte("00000001"), []byte("0123456701234567"), 0, 0)
n.put([]byte("00000002"), []byte("00000002"), []byte("0123456701234567"), 0, 0)
// Split.
n.split(20)
if n.parent != nil {
t.Fatalf("expected nil parent")
}
}
// Ensure that a node that has keys that all fit on a page just returns one leaf.
func TestNode_split_SinglePage(t *testing.T) {
// Create a node.
n := &node{inodes: make(inodes, 0), bucket: &Bucket{tx: &Tx{db: &DB{}, meta: &meta{pgid: 1}}}}
n.put([]byte("00000001"), []byte("00000001"), []byte("0123456701234567"), 0, 0)
n.put([]byte("00000002"), []byte("00000002"), []byte("0123456701234567"), 0, 0)
n.put([]byte("00000003"), []byte("00000003"), []byte("0123456701234567"), 0, 0)
n.put([]byte("00000004"), []byte("00000004"), []byte("0123456701234567"), 0, 0)
n.put([]byte("00000005"), []byte("00000005"), []byte("0123456701234567"), 0, 0)
// Split.
n.split(4096)
if n.parent != nil {
t.Fatalf("expected nil parent")
}
}

View File

@ -3,6 +3,7 @@ package bolt
import (
"fmt"
"os"
"sort"
"unsafe"
)
@ -96,7 +97,7 @@ type branchPageElement struct {
// key returns a byte slice of the node key.
func (n *branchPageElement) key() []byte {
buf := (*[maxAllocSize]byte)(unsafe.Pointer(n))
return buf[n.pos : n.pos+n.ksize]
return (*[maxAllocSize]byte)(unsafe.Pointer(&buf[n.pos]))[:n.ksize]
}
// leafPageElement represents a node on a leaf page.
@ -110,13 +111,13 @@ type leafPageElement struct {
// key returns a byte slice of the node key.
func (n *leafPageElement) key() []byte {
buf := (*[maxAllocSize]byte)(unsafe.Pointer(n))
return buf[n.pos : n.pos+n.ksize]
return (*[maxAllocSize]byte)(unsafe.Pointer(&buf[n.pos]))[:n.ksize]
}
// value returns a byte slice of the node value.
func (n *leafPageElement) value() []byte {
buf := (*[maxAllocSize]byte)(unsafe.Pointer(n))
return buf[n.pos+n.ksize : n.pos+n.ksize+n.vsize]
return (*[maxAllocSize]byte)(unsafe.Pointer(&buf[n.pos+n.ksize]))[:n.vsize]
}
// PageInfo represents human readable information about a page.
@ -132,3 +133,40 @@ type pgids []pgid
func (s pgids) Len() int { return len(s) }
func (s pgids) Swap(i, j int) { s[i], s[j] = s[j], s[i] }
func (s pgids) Less(i, j int) bool { return s[i] < s[j] }
// merge returns the sorted union of a and b.
func (a pgids) merge(b pgids) pgids {
// Return the opposite slice if one is nil.
if len(a) == 0 {
return b
} else if len(b) == 0 {
return a
}
// Create a list to hold all elements from both lists.
merged := make(pgids, 0, len(a)+len(b))
// Assign lead to the slice with a lower starting value, follow to the higher value.
lead, follow := a, b
if b[0] < a[0] {
lead, follow = b, a
}
// Continue while there are elements in the lead.
for len(lead) > 0 {
// Merge largest prefix of lead that is ahead of follow[0].
n := sort.Search(len(lead), func(i int) bool { return lead[i] > follow[0] })
merged = append(merged, lead[:n]...)
if n >= len(lead) {
break
}
// Swap lead and follow.
lead, follow = follow, lead[n:]
}
// Append what's left in follow.
merged = append(merged, follow...)
return merged
}

View File

@ -1,29 +0,0 @@
package bolt
import (
"testing"
)
// Ensure that the page type can be returned in human readable format.
func TestPage_typ(t *testing.T) {
if typ := (&page{flags: branchPageFlag}).typ(); typ != "branch" {
t.Fatalf("exp=branch; got=%v", typ)
}
if typ := (&page{flags: leafPageFlag}).typ(); typ != "leaf" {
t.Fatalf("exp=leaf; got=%v", typ)
}
if typ := (&page{flags: metaPageFlag}).typ(); typ != "meta" {
t.Fatalf("exp=meta; got=%v", typ)
}
if typ := (&page{flags: freelistPageFlag}).typ(); typ != "freelist" {
t.Fatalf("exp=freelist; got=%v", typ)
}
if typ := (&page{flags: 20000}).typ(); typ != "unknown<4e20>" {
t.Fatalf("exp=unknown<4e20>; got=%v", typ)
}
}
// Ensure that the hexdump debugging function doesn't blow up.
func TestPage_dump(t *testing.T) {
(&page{id: 256}).hexdump(16)
}

View File

@ -1,79 +0,0 @@
package bolt_test
import (
"bytes"
"flag"
"fmt"
"math/rand"
"os"
"reflect"
"testing/quick"
"time"
)
// testing/quick defaults to 5 iterations and a random seed.
// You can override these settings from the command line:
//
// -quick.count The number of iterations to perform.
// -quick.seed The seed to use for randomizing.
// -quick.maxitems The maximum number of items to insert into a DB.
// -quick.maxksize The maximum size of a key.
// -quick.maxvsize The maximum size of a value.
//
var qcount, qseed, qmaxitems, qmaxksize, qmaxvsize int
func init() {
flag.IntVar(&qcount, "quick.count", 5, "")
flag.IntVar(&qseed, "quick.seed", int(time.Now().UnixNano())%100000, "")
flag.IntVar(&qmaxitems, "quick.maxitems", 1000, "")
flag.IntVar(&qmaxksize, "quick.maxksize", 1024, "")
flag.IntVar(&qmaxvsize, "quick.maxvsize", 1024, "")
flag.Parse()
fmt.Fprintln(os.Stderr, "seed:", qseed)
fmt.Fprintf(os.Stderr, "quick settings: count=%v, items=%v, ksize=%v, vsize=%v\n", qcount, qmaxitems, qmaxksize, qmaxvsize)
}
func qconfig() *quick.Config {
return &quick.Config{
MaxCount: qcount,
Rand: rand.New(rand.NewSource(int64(qseed))),
}
}
type testdata []testdataitem
func (t testdata) Len() int { return len(t) }
func (t testdata) Swap(i, j int) { t[i], t[j] = t[j], t[i] }
func (t testdata) Less(i, j int) bool { return bytes.Compare(t[i].Key, t[j].Key) == -1 }
func (t testdata) Generate(rand *rand.Rand, size int) reflect.Value {
n := rand.Intn(qmaxitems-1) + 1
items := make(testdata, n)
for i := 0; i < n; i++ {
item := &items[i]
item.Key = randByteSlice(rand, 1, qmaxksize)
item.Value = randByteSlice(rand, 0, qmaxvsize)
}
return reflect.ValueOf(items)
}
type revtestdata []testdataitem
func (t revtestdata) Len() int { return len(t) }
func (t revtestdata) Swap(i, j int) { t[i], t[j] = t[j], t[i] }
func (t revtestdata) Less(i, j int) bool { return bytes.Compare(t[i].Key, t[j].Key) == 1 }
type testdataitem struct {
Key []byte
Value []byte
}
func randByteSlice(rand *rand.Rand, minSize, maxSize int) []byte {
n := rand.Intn(maxSize-minSize) + minSize
b := make([]byte, n)
for i := 0; i < n; i++ {
b[i] = byte(rand.Intn(255))
}
return b
}

View File

@ -1,327 +0,0 @@
package bolt_test
import (
"bytes"
"fmt"
"math/rand"
"sync"
"testing"
"github.com/coreos/etcd/Godeps/_workspace/src/github.com/boltdb/bolt"
)
func TestSimulate_1op_1p(t *testing.T) { testSimulate(t, 100, 1) }
func TestSimulate_10op_1p(t *testing.T) { testSimulate(t, 10, 1) }
func TestSimulate_100op_1p(t *testing.T) { testSimulate(t, 100, 1) }
func TestSimulate_1000op_1p(t *testing.T) { testSimulate(t, 1000, 1) }
func TestSimulate_10000op_1p(t *testing.T) { testSimulate(t, 10000, 1) }
func TestSimulate_10op_10p(t *testing.T) { testSimulate(t, 10, 10) }
func TestSimulate_100op_10p(t *testing.T) { testSimulate(t, 100, 10) }
func TestSimulate_1000op_10p(t *testing.T) { testSimulate(t, 1000, 10) }
func TestSimulate_10000op_10p(t *testing.T) { testSimulate(t, 10000, 10) }
func TestSimulate_100op_100p(t *testing.T) { testSimulate(t, 100, 100) }
func TestSimulate_1000op_100p(t *testing.T) { testSimulate(t, 1000, 100) }
func TestSimulate_10000op_100p(t *testing.T) { testSimulate(t, 10000, 100) }
func TestSimulate_10000op_1000p(t *testing.T) { testSimulate(t, 10000, 1000) }
// Randomly generate operations on a given database with multiple clients to ensure consistency and thread safety.
func testSimulate(t *testing.T, threadCount, parallelism int) {
if testing.Short() {
t.Skip("skipping test in short mode.")
}
rand.Seed(int64(qseed))
// A list of operations that readers and writers can perform.
var readerHandlers = []simulateHandler{simulateGetHandler}
var writerHandlers = []simulateHandler{simulateGetHandler, simulatePutHandler}
var versions = make(map[int]*QuickDB)
versions[1] = NewQuickDB()
db := NewTestDB()
defer db.Close()
var mutex sync.Mutex
// Run n threads in parallel, each with their own operation.
var wg sync.WaitGroup
var threads = make(chan bool, parallelism)
var i int
for {
threads <- true
wg.Add(1)
writable := ((rand.Int() % 100) < 20) // 20% writers
// Choose an operation to execute.
var handler simulateHandler
if writable {
handler = writerHandlers[rand.Intn(len(writerHandlers))]
} else {
handler = readerHandlers[rand.Intn(len(readerHandlers))]
}
// Execute a thread for the given operation.
go func(writable bool, handler simulateHandler) {
defer wg.Done()
// Start transaction.
tx, err := db.Begin(writable)
if err != nil {
t.Fatal("tx begin: ", err)
}
// Obtain current state of the dataset.
mutex.Lock()
var qdb = versions[tx.ID()]
if writable {
qdb = versions[tx.ID()-1].Copy()
}
mutex.Unlock()
// Make sure we commit/rollback the tx at the end and update the state.
if writable {
defer func() {
mutex.Lock()
versions[tx.ID()] = qdb
mutex.Unlock()
ok(t, tx.Commit())
}()
} else {
defer tx.Rollback()
}
// Ignore operation if we don't have data yet.
if qdb == nil {
return
}
// Execute handler.
handler(tx, qdb)
// Release a thread back to the scheduling loop.
<-threads
}(writable, handler)
i++
if i > threadCount {
break
}
}
// Wait until all threads are done.
wg.Wait()
}
type simulateHandler func(tx *bolt.Tx, qdb *QuickDB)
// Retrieves a key from the database and verifies that it is what is expected.
func simulateGetHandler(tx *bolt.Tx, qdb *QuickDB) {
// Randomly retrieve an existing exist.
keys := qdb.Rand()
if len(keys) == 0 {
return
}
// Retrieve root bucket.
b := tx.Bucket(keys[0])
if b == nil {
panic(fmt.Sprintf("bucket[0] expected: %08x\n", trunc(keys[0], 4)))
}
// Drill into nested buckets.
for _, key := range keys[1 : len(keys)-1] {
b = b.Bucket(key)
if b == nil {
panic(fmt.Sprintf("bucket[n] expected: %v -> %v\n", keys, key))
}
}
// Verify key/value on the final bucket.
expected := qdb.Get(keys)
actual := b.Get(keys[len(keys)-1])
if !bytes.Equal(actual, expected) {
fmt.Println("=== EXPECTED ===")
fmt.Println(expected)
fmt.Println("=== ACTUAL ===")
fmt.Println(actual)
fmt.Println("=== END ===")
panic("value mismatch")
}
}
// Inserts a key into the database.
func simulatePutHandler(tx *bolt.Tx, qdb *QuickDB) {
var err error
keys, value := randKeys(), randValue()
// Retrieve root bucket.
b := tx.Bucket(keys[0])
if b == nil {
b, err = tx.CreateBucket(keys[0])
if err != nil {
panic("create bucket: " + err.Error())
}
}
// Create nested buckets, if necessary.
for _, key := range keys[1 : len(keys)-1] {
child := b.Bucket(key)
if child != nil {
b = child
} else {
b, err = b.CreateBucket(key)
if err != nil {
panic("create bucket: " + err.Error())
}
}
}
// Insert into database.
if err := b.Put(keys[len(keys)-1], value); err != nil {
panic("put: " + err.Error())
}
// Insert into in-memory database.
qdb.Put(keys, value)
}
// QuickDB is an in-memory database that replicates the functionality of the
// Bolt DB type except that it is entirely in-memory. It is meant for testing
// that the Bolt database is consistent.
type QuickDB struct {
sync.RWMutex
m map[string]interface{}
}
// NewQuickDB returns an instance of QuickDB.
func NewQuickDB() *QuickDB {
return &QuickDB{m: make(map[string]interface{})}
}
// Get retrieves the value at a key path.
func (db *QuickDB) Get(keys [][]byte) []byte {
db.RLock()
defer db.RUnlock()
m := db.m
for _, key := range keys[:len(keys)-1] {
value := m[string(key)]
if value == nil {
return nil
}
switch value := value.(type) {
case map[string]interface{}:
m = value
case []byte:
return nil
}
}
// Only return if it's a simple value.
if value, ok := m[string(keys[len(keys)-1])].([]byte); ok {
return value
}
return nil
}
// Put inserts a value into a key path.
func (db *QuickDB) Put(keys [][]byte, value []byte) {
db.Lock()
defer db.Unlock()
// Build buckets all the way down the key path.
m := db.m
for _, key := range keys[:len(keys)-1] {
if _, ok := m[string(key)].([]byte); ok {
return // Keypath intersects with a simple value. Do nothing.
}
if m[string(key)] == nil {
m[string(key)] = make(map[string]interface{})
}
m = m[string(key)].(map[string]interface{})
}
// Insert value into the last key.
m[string(keys[len(keys)-1])] = value
}
// Rand returns a random key path that points to a simple value.
func (db *QuickDB) Rand() [][]byte {
db.RLock()
defer db.RUnlock()
if len(db.m) == 0 {
return nil
}
var keys [][]byte
db.rand(db.m, &keys)
return keys
}
func (db *QuickDB) rand(m map[string]interface{}, keys *[][]byte) {
i, index := 0, rand.Intn(len(m))
for k, v := range m {
if i == index {
*keys = append(*keys, []byte(k))
if v, ok := v.(map[string]interface{}); ok {
db.rand(v, keys)
}
return
}
i++
}
panic("quickdb rand: out-of-range")
}
// Copy copies the entire database.
func (db *QuickDB) Copy() *QuickDB {
db.RLock()
defer db.RUnlock()
return &QuickDB{m: db.copy(db.m)}
}
func (db *QuickDB) copy(m map[string]interface{}) map[string]interface{} {
clone := make(map[string]interface{}, len(m))
for k, v := range m {
switch v := v.(type) {
case map[string]interface{}:
clone[k] = db.copy(v)
default:
clone[k] = v
}
}
return clone
}
func randKey() []byte {
var min, max = 1, 1024
n := rand.Intn(max-min) + min
b := make([]byte, n)
for i := 0; i < n; i++ {
b[i] = byte(rand.Intn(255))
}
return b
}
func randKeys() [][]byte {
var keys [][]byte
var count = rand.Intn(2) + 2
for i := 0; i < count; i++ {
keys = append(keys, randKey())
}
return keys
}
func randValue() []byte {
n := rand.Intn(8192)
b := make([]byte, n)
for i := 0; i < n; i++ {
b[i] = byte(rand.Intn(255))
}
return b
}

View File

@ -29,6 +29,14 @@ type Tx struct {
pages map[pgid]*page
stats TxStats
commitHandlers []func()
// WriteFlag specifies the flag for write-related methods like WriteTo().
// Tx opens the database file with the specified flag to copy the data.
//
// By default, the flag is unset, which works well for mostly in-memory
// workloads. For databases that are much larger than available RAM,
// set the flag to syscall.O_DIRECT to avoid trashing the page cache.
WriteFlag int
}
// init initializes the transaction.
@ -87,18 +95,21 @@ func (tx *Tx) Stats() TxStats {
// Bucket retrieves a bucket by name.
// Returns nil if the bucket does not exist.
// The bucket instance is only valid for the lifetime of the transaction.
func (tx *Tx) Bucket(name []byte) *Bucket {
return tx.root.Bucket(name)
}
// CreateBucket creates a new bucket.
// Returns an error if the bucket already exists, if the bucket name is blank, or if the bucket name is too long.
// The bucket instance is only valid for the lifetime of the transaction.
func (tx *Tx) CreateBucket(name []byte) (*Bucket, error) {
return tx.root.CreateBucket(name)
}
// CreateBucketIfNotExists creates a new bucket if it doesn't already exist.
// Returns an error if the bucket name is blank, or if the bucket name is too long.
// The bucket instance is only valid for the lifetime of the transaction.
func (tx *Tx) CreateBucketIfNotExists(name []byte) (*Bucket, error) {
return tx.root.CreateBucketIfNotExists(name)
}
@ -127,7 +138,8 @@ func (tx *Tx) OnCommit(fn func()) {
}
// Commit writes all changes to disk and updates the meta page.
// Returns an error if a disk write error occurs.
// Returns an error if a disk write error occurs, or if Commit is
// called on a read-only transaction.
func (tx *Tx) Commit() error {
_assert(!tx.managed, "managed tx commit not allowed")
if tx.db == nil {
@ -156,6 +168,8 @@ func (tx *Tx) Commit() error {
// Free the old root bucket.
tx.meta.root.root = tx.root.root
opgid := tx.meta.pgid
// Free the freelist and allocate new pages for it. This will overestimate
// the size of the freelist but not underestimate the size (which would be bad).
tx.db.freelist.free(tx.meta.txid, tx.db.page(tx.meta.freelist))
@ -170,6 +184,14 @@ func (tx *Tx) Commit() error {
}
tx.meta.freelist = p.id
// If the high water mark has moved up then attempt to grow the database.
if tx.meta.pgid > opgid {
if err := tx.db.grow(int(tx.meta.pgid+1) * tx.db.pageSize); err != nil {
tx.rollback()
return err
}
}
// Write dirty pages to disk.
startTime = time.Now()
if err := tx.write(); err != nil {
@ -203,7 +225,8 @@ func (tx *Tx) Commit() error {
return nil
}
// Rollback closes the transaction and ignores all previous updates.
// Rollback closes the transaction and ignores all previous updates. Read-only
// transactions must be rolled back and not committed.
func (tx *Tx) Rollback() error {
_assert(!tx.managed, "managed tx rollback not allowed")
if tx.db == nil {
@ -234,7 +257,8 @@ func (tx *Tx) close() {
var freelistPendingN = tx.db.freelist.pending_count()
var freelistAlloc = tx.db.freelist.size()
// Remove writer lock.
// Remove transaction ref & writer lock.
tx.db.rwtx = nil
tx.db.rwlock.Unlock()
// Merge statistics.
@ -248,11 +272,16 @@ func (tx *Tx) close() {
} else {
tx.db.removeTx(tx)
}
// Clear all references.
tx.db = nil
tx.meta = nil
tx.root = Bucket{tx: tx}
tx.pages = nil
}
// Copy writes the entire database to a writer.
// This function exists for backwards compatibility. Use WriteTo() in
// This function exists for backwards compatibility. Use WriteTo() instead.
func (tx *Tx) Copy(w io.Writer) error {
_, err := tx.WriteTo(w)
return err
@ -261,29 +290,47 @@ func (tx *Tx) Copy(w io.Writer) error {
// WriteTo writes the entire database to a writer.
// If err == nil then exactly tx.Size() bytes will be written into the writer.
func (tx *Tx) WriteTo(w io.Writer) (n int64, err error) {
// Attempt to open reader directly.
var f *os.File
if f, err = os.OpenFile(tx.db.path, os.O_RDONLY|odirect, 0); err != nil {
// Fallback to a regular open if that doesn't work.
if f, err = os.OpenFile(tx.db.path, os.O_RDONLY, 0); err != nil {
return 0, err
}
// Attempt to open reader with WriteFlag
f, err := os.OpenFile(tx.db.path, os.O_RDONLY|tx.WriteFlag, 0)
if err != nil {
return 0, err
}
defer func() { _ = f.Close() }()
// Generate a meta page. We use the same page data for both meta pages.
buf := make([]byte, tx.db.pageSize)
page := (*page)(unsafe.Pointer(&buf[0]))
page.flags = metaPageFlag
*page.meta() = *tx.meta
// Write meta 0.
page.id = 0
page.meta().checksum = page.meta().sum64()
nn, err := w.Write(buf)
n += int64(nn)
if err != nil {
return n, fmt.Errorf("meta 0 copy: %s", err)
}
// Copy the meta pages.
tx.db.metalock.Lock()
n, err = io.CopyN(w, f, int64(tx.db.pageSize*2))
tx.db.metalock.Unlock()
// Write meta 1 with a lower transaction id.
page.id = 1
page.meta().txid -= 1
page.meta().checksum = page.meta().sum64()
nn, err = w.Write(buf)
n += int64(nn)
if err != nil {
_ = f.Close()
return n, fmt.Errorf("meta copy: %s", err)
return n, fmt.Errorf("meta 1 copy: %s", err)
}
// Move past the meta pages in the file.
if _, err := f.Seek(int64(tx.db.pageSize*2), os.SEEK_SET); err != nil {
return n, fmt.Errorf("seek: %s", err)
}
// Copy data pages.
wn, err := io.CopyN(w, f, tx.Size()-int64(tx.db.pageSize*2))
n += wn
if err != nil {
_ = f.Close()
return n, err
}
@ -421,15 +468,39 @@ func (tx *Tx) write() error {
// Write pages to disk in order.
for _, p := range pages {
size := (int(p.overflow) + 1) * tx.db.pageSize
buf := (*[maxAllocSize]byte)(unsafe.Pointer(p))[:size]
offset := int64(p.id) * int64(tx.db.pageSize)
if _, err := tx.db.ops.writeAt(buf, offset); err != nil {
return err
}
// Update statistics.
tx.stats.Write++
// Write out page in "max allocation" sized chunks.
ptr := (*[maxAllocSize]byte)(unsafe.Pointer(p))
for {
// Limit our write to our max allocation size.
sz := size
if sz > maxAllocSize-1 {
sz = maxAllocSize - 1
}
// Write chunk to disk.
buf := ptr[:sz]
if _, err := tx.db.ops.writeAt(buf, offset); err != nil {
return err
}
// Update statistics.
tx.stats.Write++
// Exit inner for loop if we've written all the chunks.
size -= sz
if size == 0 {
break
}
// Otherwise move offset forward and move pointer to next chunk.
offset += int64(sz)
ptr = (*[maxAllocSize]byte)(unsafe.Pointer(&ptr[sz]))
}
}
// Ignore file sync if flag is set on DB.
if !tx.db.NoSync || IgnoreNoSync {
if err := fdatasync(tx.db); err != nil {
return err
@ -466,7 +537,7 @@ func (tx *Tx) writeMeta() error {
}
// page returns a reference to the page with a given id.
// If page has been written to then a temporary bufferred page is returned.
// If page has been written to then a temporary buffered page is returned.
func (tx *Tx) page(id pgid) *page {
// Check the dirty pages first.
if tx.pages != nil {

View File

@ -1,424 +0,0 @@
package bolt_test
import (
"errors"
"fmt"
"os"
"testing"
"github.com/coreos/etcd/Godeps/_workspace/src/github.com/boltdb/bolt"
)
// Ensure that committing a closed transaction returns an error.
func TestTx_Commit_Closed(t *testing.T) {
db := NewTestDB()
defer db.Close()
tx, _ := db.Begin(true)
tx.CreateBucket([]byte("foo"))
ok(t, tx.Commit())
equals(t, tx.Commit(), bolt.ErrTxClosed)
}
// Ensure that rolling back a closed transaction returns an error.
func TestTx_Rollback_Closed(t *testing.T) {
db := NewTestDB()
defer db.Close()
tx, _ := db.Begin(true)
ok(t, tx.Rollback())
equals(t, tx.Rollback(), bolt.ErrTxClosed)
}
// Ensure that committing a read-only transaction returns an error.
func TestTx_Commit_ReadOnly(t *testing.T) {
db := NewTestDB()
defer db.Close()
tx, _ := db.Begin(false)
equals(t, tx.Commit(), bolt.ErrTxNotWritable)
}
// Ensure that a transaction can retrieve a cursor on the root bucket.
func TestTx_Cursor(t *testing.T) {
db := NewTestDB()
defer db.Close()
db.Update(func(tx *bolt.Tx) error {
tx.CreateBucket([]byte("widgets"))
tx.CreateBucket([]byte("woojits"))
c := tx.Cursor()
k, v := c.First()
equals(t, "widgets", string(k))
assert(t, v == nil, "")
k, v = c.Next()
equals(t, "woojits", string(k))
assert(t, v == nil, "")
k, v = c.Next()
assert(t, k == nil, "")
assert(t, v == nil, "")
return nil
})
}
// Ensure that creating a bucket with a read-only transaction returns an error.
func TestTx_CreateBucket_ReadOnly(t *testing.T) {
db := NewTestDB()
defer db.Close()
db.View(func(tx *bolt.Tx) error {
b, err := tx.CreateBucket([]byte("foo"))
assert(t, b == nil, "")
equals(t, bolt.ErrTxNotWritable, err)
return nil
})
}
// Ensure that creating a bucket on a closed transaction returns an error.
func TestTx_CreateBucket_Closed(t *testing.T) {
db := NewTestDB()
defer db.Close()
tx, _ := db.Begin(true)
tx.Commit()
b, err := tx.CreateBucket([]byte("foo"))
assert(t, b == nil, "")
equals(t, bolt.ErrTxClosed, err)
}
// Ensure that a Tx can retrieve a bucket.
func TestTx_Bucket(t *testing.T) {
db := NewTestDB()
defer db.Close()
db.Update(func(tx *bolt.Tx) error {
tx.CreateBucket([]byte("widgets"))
b := tx.Bucket([]byte("widgets"))
assert(t, b != nil, "")
return nil
})
}
// Ensure that a Tx retrieving a non-existent key returns nil.
func TestTx_Get_Missing(t *testing.T) {
db := NewTestDB()
defer db.Close()
db.Update(func(tx *bolt.Tx) error {
tx.CreateBucket([]byte("widgets"))
tx.Bucket([]byte("widgets")).Put([]byte("foo"), []byte("bar"))
value := tx.Bucket([]byte("widgets")).Get([]byte("no_such_key"))
assert(t, value == nil, "")
return nil
})
}
// Ensure that a bucket can be created and retrieved.
func TestTx_CreateBucket(t *testing.T) {
db := NewTestDB()
defer db.Close()
// Create a bucket.
db.Update(func(tx *bolt.Tx) error {
b, err := tx.CreateBucket([]byte("widgets"))
assert(t, b != nil, "")
ok(t, err)
return nil
})
// Read the bucket through a separate transaction.
db.View(func(tx *bolt.Tx) error {
b := tx.Bucket([]byte("widgets"))
assert(t, b != nil, "")
return nil
})
}
// Ensure that a bucket can be created if it doesn't already exist.
func TestTx_CreateBucketIfNotExists(t *testing.T) {
db := NewTestDB()
defer db.Close()
db.Update(func(tx *bolt.Tx) error {
b, err := tx.CreateBucketIfNotExists([]byte("widgets"))
assert(t, b != nil, "")
ok(t, err)
b, err = tx.CreateBucketIfNotExists([]byte("widgets"))
assert(t, b != nil, "")
ok(t, err)
b, err = tx.CreateBucketIfNotExists([]byte{})
assert(t, b == nil, "")
equals(t, bolt.ErrBucketNameRequired, err)
b, err = tx.CreateBucketIfNotExists(nil)
assert(t, b == nil, "")
equals(t, bolt.ErrBucketNameRequired, err)
return nil
})
// Read the bucket through a separate transaction.
db.View(func(tx *bolt.Tx) error {
b := tx.Bucket([]byte("widgets"))
assert(t, b != nil, "")
return nil
})
}
// Ensure that a bucket cannot be created twice.
func TestTx_CreateBucket_Exists(t *testing.T) {
db := NewTestDB()
defer db.Close()
// Create a bucket.
db.Update(func(tx *bolt.Tx) error {
b, err := tx.CreateBucket([]byte("widgets"))
assert(t, b != nil, "")
ok(t, err)
return nil
})
// Create the same bucket again.
db.Update(func(tx *bolt.Tx) error {
b, err := tx.CreateBucket([]byte("widgets"))
assert(t, b == nil, "")
equals(t, bolt.ErrBucketExists, err)
return nil
})
}
// Ensure that a bucket is created with a non-blank name.
func TestTx_CreateBucket_NameRequired(t *testing.T) {
db := NewTestDB()
defer db.Close()
db.Update(func(tx *bolt.Tx) error {
b, err := tx.CreateBucket(nil)
assert(t, b == nil, "")
equals(t, bolt.ErrBucketNameRequired, err)
return nil
})
}
// Ensure that a bucket can be deleted.
func TestTx_DeleteBucket(t *testing.T) {
db := NewTestDB()
defer db.Close()
// Create a bucket and add a value.
db.Update(func(tx *bolt.Tx) error {
tx.CreateBucket([]byte("widgets"))
tx.Bucket([]byte("widgets")).Put([]byte("foo"), []byte("bar"))
return nil
})
// Delete the bucket and make sure we can't get the value.
db.Update(func(tx *bolt.Tx) error {
ok(t, tx.DeleteBucket([]byte("widgets")))
assert(t, tx.Bucket([]byte("widgets")) == nil, "")
return nil
})
db.Update(func(tx *bolt.Tx) error {
// Create the bucket again and make sure there's not a phantom value.
b, err := tx.CreateBucket([]byte("widgets"))
assert(t, b != nil, "")
ok(t, err)
assert(t, tx.Bucket([]byte("widgets")).Get([]byte("foo")) == nil, "")
return nil
})
}
// Ensure that deleting a bucket on a closed transaction returns an error.
func TestTx_DeleteBucket_Closed(t *testing.T) {
db := NewTestDB()
defer db.Close()
tx, _ := db.Begin(true)
tx.Commit()
equals(t, tx.DeleteBucket([]byte("foo")), bolt.ErrTxClosed)
}
// Ensure that deleting a bucket with a read-only transaction returns an error.
func TestTx_DeleteBucket_ReadOnly(t *testing.T) {
db := NewTestDB()
defer db.Close()
db.View(func(tx *bolt.Tx) error {
equals(t, tx.DeleteBucket([]byte("foo")), bolt.ErrTxNotWritable)
return nil
})
}
// Ensure that nothing happens when deleting a bucket that doesn't exist.
func TestTx_DeleteBucket_NotFound(t *testing.T) {
db := NewTestDB()
defer db.Close()
db.Update(func(tx *bolt.Tx) error {
equals(t, bolt.ErrBucketNotFound, tx.DeleteBucket([]byte("widgets")))
return nil
})
}
// Ensure that Tx commit handlers are called after a transaction successfully commits.
func TestTx_OnCommit(t *testing.T) {
var x int
db := NewTestDB()
defer db.Close()
db.Update(func(tx *bolt.Tx) error {
tx.OnCommit(func() { x += 1 })
tx.OnCommit(func() { x += 2 })
_, err := tx.CreateBucket([]byte("widgets"))
return err
})
equals(t, 3, x)
}
// Ensure that Tx commit handlers are NOT called after a transaction rolls back.
func TestTx_OnCommit_Rollback(t *testing.T) {
var x int
db := NewTestDB()
defer db.Close()
db.Update(func(tx *bolt.Tx) error {
tx.OnCommit(func() { x += 1 })
tx.OnCommit(func() { x += 2 })
tx.CreateBucket([]byte("widgets"))
return errors.New("rollback this commit")
})
equals(t, 0, x)
}
// Ensure that the database can be copied to a file path.
func TestTx_CopyFile(t *testing.T) {
db := NewTestDB()
defer db.Close()
var dest = tempfile()
db.Update(func(tx *bolt.Tx) error {
tx.CreateBucket([]byte("widgets"))
tx.Bucket([]byte("widgets")).Put([]byte("foo"), []byte("bar"))
tx.Bucket([]byte("widgets")).Put([]byte("baz"), []byte("bat"))
return nil
})
ok(t, db.View(func(tx *bolt.Tx) error { return tx.CopyFile(dest, 0600) }))
db2, err := bolt.Open(dest, 0600, nil)
ok(t, err)
defer db2.Close()
db2.View(func(tx *bolt.Tx) error {
equals(t, []byte("bar"), tx.Bucket([]byte("widgets")).Get([]byte("foo")))
equals(t, []byte("bat"), tx.Bucket([]byte("widgets")).Get([]byte("baz")))
return nil
})
}
type failWriterError struct{}
func (failWriterError) Error() string {
return "error injected for tests"
}
type failWriter struct {
// fail after this many bytes
After int
}
func (f *failWriter) Write(p []byte) (n int, err error) {
n = len(p)
if n > f.After {
n = f.After
err = failWriterError{}
}
f.After -= n
return n, err
}
// Ensure that Copy handles write errors right.
func TestTx_CopyFile_Error_Meta(t *testing.T) {
db := NewTestDB()
defer db.Close()
db.Update(func(tx *bolt.Tx) error {
tx.CreateBucket([]byte("widgets"))
tx.Bucket([]byte("widgets")).Put([]byte("foo"), []byte("bar"))
tx.Bucket([]byte("widgets")).Put([]byte("baz"), []byte("bat"))
return nil
})
err := db.View(func(tx *bolt.Tx) error { return tx.Copy(&failWriter{}) })
equals(t, err.Error(), "meta copy: error injected for tests")
}
// Ensure that Copy handles write errors right.
func TestTx_CopyFile_Error_Normal(t *testing.T) {
db := NewTestDB()
defer db.Close()
db.Update(func(tx *bolt.Tx) error {
tx.CreateBucket([]byte("widgets"))
tx.Bucket([]byte("widgets")).Put([]byte("foo"), []byte("bar"))
tx.Bucket([]byte("widgets")).Put([]byte("baz"), []byte("bat"))
return nil
})
err := db.View(func(tx *bolt.Tx) error { return tx.Copy(&failWriter{3 * db.Info().PageSize}) })
equals(t, err.Error(), "error injected for tests")
}
func ExampleTx_Rollback() {
// Open the database.
db, _ := bolt.Open(tempfile(), 0666, nil)
defer os.Remove(db.Path())
defer db.Close()
// Create a bucket.
db.Update(func(tx *bolt.Tx) error {
_, err := tx.CreateBucket([]byte("widgets"))
return err
})
// Set a value for a key.
db.Update(func(tx *bolt.Tx) error {
return tx.Bucket([]byte("widgets")).Put([]byte("foo"), []byte("bar"))
})
// Update the key but rollback the transaction so it never saves.
tx, _ := db.Begin(true)
b := tx.Bucket([]byte("widgets"))
b.Put([]byte("foo"), []byte("baz"))
tx.Rollback()
// Ensure that our original value is still set.
db.View(func(tx *bolt.Tx) error {
value := tx.Bucket([]byte("widgets")).Get([]byte("foo"))
fmt.Printf("The value for 'foo' is still: %s\n", value)
return nil
})
// Output:
// The value for 'foo' is still: bar
}
func ExampleTx_CopyFile() {
// Open the database.
db, _ := bolt.Open(tempfile(), 0666, nil)
defer os.Remove(db.Path())
defer db.Close()
// Create a bucket and a key.
db.Update(func(tx *bolt.Tx) error {
tx.CreateBucket([]byte("widgets"))
tx.Bucket([]byte("widgets")).Put([]byte("foo"), []byte("bar"))
return nil
})
// Copy the database to another file.
toFile := tempfile()
db.View(func(tx *bolt.Tx) error { return tx.CopyFile(toFile, 0666) })
defer os.Remove(toFile)
// Open the cloned database.
db2, _ := bolt.Open(toFile, 0666, nil)
defer db2.Close()
// Ensure that the key exists in the copy.
db2.View(func(tx *bolt.Tx) error {
value := tx.Bucket([]byte("widgets")).Get([]byte("foo"))
fmt.Printf("The value for 'foo' in the clone is: %s\n", value)
return nil
})
// Output:
// The value for 'foo' in the clone is: bar
}

View File

@ -1 +0,0 @@
*~

View File

@ -1,19 +0,0 @@
# This file is like Go's AUTHORS file: it lists Copyright holders.
# The list of humans who have contributd is in the CONTRIBUTORS file.
#
# To contribute to this project, because it will eventually be folded
# back in to Go itself, you need to submit a CLA:
#
# http://golang.org/doc/contribute.html#copyright
#
# Then you get added to CONTRIBUTORS and you or your company get added
# to the AUTHORS file.
Blake Mizerany <blake.mizerany@gmail.com> github=bmizerany
Daniel Morsing <daniel.morsing@gmail.com> github=DanielMorsing
Gabriel Aszalos <gabriel.aszalos@gmail.com> github=gbbr
Google, Inc.
Keith Rarick <kr@xph.us> github=kr
Matthew Keenan <tank.en.mate@gmail.com> <github@mattkeenan.net> github=mattkeenan
Matt Layher <mdlayher@gmail.com> github=mdlayher
Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com> github=tatsuhiro-t

View File

@ -1,19 +0,0 @@
# This file is like Go's CONTRIBUTORS file: it lists humans.
# The list of copyright holders (which may be companies) are in the AUTHORS file.
#
# To contribute to this project, because it will eventually be folded
# back in to Go itself, you need to submit a CLA:
#
# http://golang.org/doc/contribute.html#copyright
#
# Then you get added to CONTRIBUTORS and you or your company get added
# to the AUTHORS file.
Blake Mizerany <blake.mizerany@gmail.com> github=bmizerany
Brad Fitzpatrick <bradfitz@golang.org> github=bradfitz
Daniel Morsing <daniel.morsing@gmail.com> github=DanielMorsing
Gabriel Aszalos <gabriel.aszalos@gmail.com> github=gbbr
Keith Rarick <kr@xph.us> github=kr
Matthew Keenan <tank.en.mate@gmail.com> <github@mattkeenan.net> github=mattkeenan
Matt Layher <mdlayher@gmail.com> github=mdlayher
Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com> github=tatsuhiro-t

View File

@ -1,5 +0,0 @@
We only accept contributions from users who have gone through Go's
contribution process (signed a CLA).
Please acknowledge whether you have (and use the same email) if
sending a pull request.

Some files were not shown because too many files have changed in this diff Show More