Compare commits

...

140 Commits

Author SHA1 Message Date
918698add7 version: bump up to 3.1.12 2018-03-08 13:01:30 -08:00
00d14cfd03 test: fix etcd-tester flags
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-03-08 08:24:04 -08:00
6e11a79fd8 *: remove unused env vars
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-03-08 01:36:54 -08:00
028f99b103 test: fix "functional-tester" build script
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-03-08 01:29:57 -08:00
290fa0c1be *: sync "functional-tester" from 3.2 branch
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-03-08 00:57:30 -08:00
0874fcbed4 test: run "functional" tests in 3.1 branch
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-03-08 00:26:52 -08:00
7a148fee36 Merge pull request #9401 from jpbetz/automated-cherry-pick-of-#9297-origin-release-3.1
Automated cherry pick of #9297
2018-03-07 22:03:14 -08:00
087b9aa3dc mvcc: fix watchable store test for 3.2 cherrypick of #9281 2018-03-07 21:20:32 -08:00
b6373f1625 mvcc: restore unsynced watchers
In case syncWatchersLoop() starts before Restore() is called,
watchers already added by that moment are moved to s.synced by the loop.
However, there is a broken logic that moves watchers from s.synced
to s.uncyned without setting keyWatchers of the watcherGroup.
Eventually syncWatchers() fails to pickup those watchers from s.unsynced
and no events are sent to the watchers, because newWatcherBatch() called
in the function uses wg.watcherSetByKey() internally that requires
a proper keyWatchers value.
2018-03-07 21:20:21 -08:00
4178b75411 hack/scripts-dev: fix indentation in run.sh
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-03-07 14:31:25 -08:00
9c8e39e7f4 hack/scripts-dev: sync with master
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-03-07 14:25:10 -08:00
af3021aa1a semaphore: update Go version, release test version
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-03-07 14:24:57 -08:00
df0b652d6a travis: use docker, sync with master
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
2018-03-07 14:24:04 -08:00
8e5d62cf1e Merge pull request #8933 from gyuho/release-doc
release-3.1: scripts/build-docker: build both gcr.io and quay.io images
2017-11-30 12:49:51 -08:00
212d801294 scripts/build-docker: build both gcr.io and quay.io images
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-11-28 15:03:38 -08:00
c2d8d9fd26 version: bump up to 3.1.11+git 2017-11-28 14:43:37 -08:00
960f4604bc version: bump up to v3.1.11 2017-11-28 10:48:24 -08:00
22b67da920 Merge pull request #8902 from jpbetz/automated-cherry-pick-of-#8813-release-3.1
Automated cherry pick of #8813 release 3.1
2017-11-22 11:21:13 -08:00
4b53ab0909 test: Clean agent directories on disk before functional test runs, not after
This is primarily so CI tooling can capture the agent logs after the functional tester runs.
2017-11-21 11:41:57 -08:00
b32ec69f9b vendor: Switch from boltdb v1.3.0 to coreos/bbolt v1.3.1-coreos.3 2017-11-21 11:34:45 -08:00
3ab9894b04 Merge pull request #8806 from jpbetz/automated-cherry-pick-of-#8427-origin-release-3.1-1509570173
Automated cherry pick of #8427 to release 3.1 branch
2017-11-09 18:07:40 -08:00
00d6d4aba7 e2e: test against latest v3.1.x release
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-11-08 15:11:34 -08:00
88b5e22b73 semaphore: manually pin v3.1.10 for release upgrade test
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-11-08 12:46:15 -08:00
2bb8278fbf hack/scripts-dev: add Makefile, Dockerfile-test
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-11-06 14:13:51 -08:00
935c76b8c3 semaphore.sh: fail tests with "(--- FAIL:|leak)"
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-11-03 10:59:23 -07:00
e83f50ec7c mvcc: sending events after restore
Fixes: #8411
2017-11-02 17:15:35 -07:00
232a81d804 client/integration: use only digits in unix port
Fix https://github.com/coreos/etcd/issues/7558.

Same as https://github.com/coreos/etcd/issues/6959.

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-11-02 13:56:55 -07:00
6ffc32cd06 semaphore.sh: add to release-3.1 branch 2017-11-02 13:56:48 -07:00
f8aeb21c2d version: bump up to v3.1.10+git
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-11-02 13:56:16 -07:00
0520cb9304 version: bump up to 3.1.10
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-07-14 09:37:27 -07:00
424e4ae1cc fixtures: add gencerts.sh, generate CRL 2017-07-14 09:37:15 -07:00
a631a80a39 travis.yml: test with Go 1.8.3
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-07-14 08:51:24 -07:00
fc08fd75ee version: bump up to 3.1.9+git
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-07-14 08:51:13 -07:00
0f4a535c2f version: bump up to 3.1.9
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-06-09 10:44:38 -07:00
c765bef483 rafthttp: permit very large v2 snapshots
v2 snapshots were hitting the 512MB message decode limit, causing
sending snapshots to new members to fail for being too big.
2017-06-09 10:44:07 -07:00
5586a5806e version: bump up to 3.1.8+git
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-06-09 10:43:34 -07:00
d267ca9c18 version: bump up to 3.1.8
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-05-05 12:25:40 -07:00
4176fe768f *: fix other broken links in markdown
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-05-03 22:27:58 -07:00
950c846144 Documentation/v3: fix broken links
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-05-03 22:27:51 -07:00
0b78d66abe Documentation/v2: fix broken links
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-05-03 16:59:23 -07:00
2d58079626 integration: close accepted connection on stopc path
Connection pausing added another exit condition in the listener
path, causing the bridge to leak connections instead of closing
them when signalled to close. Also adds some additional Close
paranoia.

Fixes #7823
2017-05-03 14:51:47 -07:00
be171fa424 etcdserver: apply() sets consistIndex for any entry type
previously, apply() doesn't set consistIndex for EntryConfChange type.
this causes a misalignment between consistIndex and applied index
where EntryConfChange entry results setting applied index but not consistIndex.

suppose that addMember() is called and leader reflects that change.
1. applied index and consistIndex is now misaligned.
2. a new follower node joined.
3. leader sends the snapshot to follower
	where the applied index is the snapshot metadata index.
4. follower node saves the snapshot and database(includes consistIndex) from leader.
5. restarting follower loads snapshot and database.
6. follower checks snapshot metadata index(same as applied index) and database consistIndex,
	finds them don't match, and then panic.

FIXES #7834
2017-05-03 09:22:48 -07:00
4b60243fc5 etcd-2-1-0-bench: Fix an absolute bare link to resource outside of Documentation dir 2017-05-03 08:32:23 -07:00
2c5d79f49f Docs: replace absolute links with relative ones. 2017-05-03 08:32:15 -07:00
424abca6ac version: bump up to 3.1.7+git
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-05-03 08:31:53 -07:00
43b75072bf version: bump up to 3.1.7
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-04-27 08:25:50 -07:00
78141fae60 clientv3: set current revision to create rev regardless of CreateNotify
Turns out the optimization to ignore setting the init rev for
current revision watches breaks some ordering assumptions. Since
Watch only returns a channel once it gets a response, it should
bind the revision at the time of the first create response.

Was causing TestWatchReconnInit to fail.
2017-04-25 10:54:39 -07:00
3be37f042e integration: add pause/unpause to client bridge
Resetting connections sometimes isn't enough; need to stop/resume
accepting connections for some tests while keeping the member up.
2017-04-25 10:54:15 -07:00
7c896098d2 clientv3/integration: test watch resume with disconnect before first event 2017-04-25 10:53:58 -07:00
30f4e36de4 clientv3: only update initReq.rev == 0 with creation watch revision
Always updating the initReq.rev on watch create will resume from the wrong
revision if initReq is ever nonzero.
2017-04-25 10:53:37 -07:00
557abbe437 ctlv3: use printer for lease command results
Fixes #7783
2017-04-20 10:39:36 -07:00
4b448c209b version: bump up to 3.1.6+git
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-04-20 10:39:18 -07:00
e5b7ee2d03 version: bump up to 3.1.6
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-04-19 08:28:06 -07:00
a4c5731c38 ctlv3: keep lease as integer in fields printer
Output was giving %!d(string=) instead of the expected lease ID
value.
2017-04-19 08:27:52 -07:00
1f558ae678 integration: test auth API response header revision
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-04-17 20:06:14 -07:00
df93627bbb etcdserver: fill-in Auth API Header in apply layer
Replacing "etcdserver: fill a response header in auth RPCs"
The revision should be set at the time of "apply",
not in later RPC layer.

Fix https://github.com/coreos/etcd/issues/7691

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-04-17 20:06:08 -07:00
a20295c65b auth: fix race on stopping simple token keeper
run goroutine was resetting a field for no reason and without holding a lock.
This patch cleans up the run goroutine management to make the start/stop path
less racey in general.
2017-04-14 16:52:25 -07:00
9f7bb0df3a etcdserver: let Status() not require authentication
The information that can be obtained with the RPC doesn't need to be
protected.

Fix https://github.com/coreos/etcd/issues/7721
2017-04-13 15:56:26 -07:00
6a805e5222 test: do not run extra static checks on release branch
Things are usually already fixed in master branch
but not worth backporting.

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-04-13 14:44:22 -07:00
38f79fa565 clientv3: fix gofmt warnings
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-04-13 14:44:22 -07:00
37a502cc88 integration: test requests with valid auth token but disabled auth
etcd was crashing since auth was assuming a token implies auth is enabled.
2017-04-13 14:44:22 -07:00
9be7fc5320 auth: protect simpleToken with single mutex and check if enabled
Dual locking doesn't really give a convincing performance improvement and
the lock ordering makes it impossible to safely check if the TTL keeper
is enabled or not.

Fixes #7722
2017-04-13 14:44:16 -07:00
288bccd288 pkg/transport: remove port in Certificate.IPAddresses
etcd passes 'url.URL.Host' to 'SelfCert' which contains
client, peer port. 'net.ParseIP("127.0.0.1:2379")' returns
'nil', and the client on this self-cert will see errors
of '127.0.0.1 because it doesn't contain any IP SANs'

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-04-05 04:30:26 -07:00
8cb5b48f58 clientv3: test dial timeout is respected when using auth 2017-04-04 14:14:23 -07:00
6538217528 clientv3: respect dial timeout when authenticating
Fixes #7627
2017-04-04 14:12:32 -07:00
e983d6b343 version: bump up to 3.1.5+git
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-04-04 14:10:15 -07:00
20490caaf0 version: bump up to 3.1.5
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-03-27 10:20:28 -07:00
e156746959 raft: use rs.req.Entries[0].Data as the key for deletion in advance()
advance() should use rs.req.Entries[0].Data as the context instead of
req.Context for deletion. Since req.Context is never set, there won't be
any context being deleted from pendingReadIndex; results mem leak.

FIXES #7571
2017-03-24 15:51:39 -07:00
d84bf983cc Dockerfile-release: add nsswitch.conf into image
The file '/etc/nsswitch.conf' is created in order to
take in account '/etc/hosts' entries while resolving
domain names.
2017-03-23 15:20:49 -07:00
b44c6bff9d clientv3: use waitgroup to wait for substream goroutine teardown
When a grpc watch stream is torn down, it will join on its logical substream
goroutines by waiting for each to close a channel. This doesn't guarantee
the substream is fully exited, though, but only about to exit and can be
waiting to resume even after Watch.Close finishes. Instead, use a
waitgroup.Done at the very end of the substream defer.

Fixes #7573
2017-03-23 12:26:32 -07:00
8c3c1b4a9c *: use filepath.Join for files 2017-03-23 09:53:56 -07:00
b478387a59 wal: use path/filepath instead of path
Use the path/filepath package instead of the path package. The
path package assumes slash-separated paths, which doesn't work
on Windows. But path/filepath manipulates filename paths in a way
that's compatible across OSes.
2017-03-23 09:50:41 -07:00
dfc1f21f9d version: bump to 3.1.4+git
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-03-23 09:49:51 -07:00
41e52ebc22 version: bump to 3.1.4
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-03-22 09:46:23 -07:00
7bb538d4d4 backend: add FillPercent option 2017-03-21 12:12:32 -07:00
1622782e49 integration: ensure 'StopNotify' on publish error
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-03-21 12:12:13 -07:00
99b47e0c1e etcdmain: handle StopNotify when ErrStopped aborted publish
Fix https://github.com/coreos/etcd/issues/7512.

If a server starts and aborts due to config error,
it is possible to get stuck in ReadyNotify waits.
This adds select case to get notified on stop channel.

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-03-21 12:10:36 -07:00
350d0cd211 ctlv3: have "protobuf" in output help string instead of "proto"
Fixes #7538
2017-03-20 12:40:25 -07:00
72f37ff79a embed: Clear default initial cluster
NewConfig() should sets initial cluster from name but we should clear it
in the event that another discovery option has been specified.

Fixes #7516
2017-03-18 07:56:18 -07:00
3221454cab etcdserver: remove possibly compacted entry look-up
Fix https://github.com/coreos/etcd/issues/7470.

This patch removes unnecessary term look-up in
'createMergedSnapshotMessage', which can trigger panic
if raft entry at etcdProgress.appliedi got compacted
by subsequent 'MsgSnap' messages--if a follower is
being (in this case, network latency spikes) slow, it
could receive subsequent 'MsgSnap' requests from leader.

etcd server-side 'applyAll' routine and raft's Ready
processing routine becomes asynchronous after raft
entries are persisted. And given that raft Ready routine
takes less time to finish, it is possible that second
'MsgSnap' is being handled, while the slow 'applyAll'
is still processing the first(old) 'MsgSnap'. Then raft
Ready routine can compact the log entries at future
index to 'applyAll'. That is how 'createMergedSnapshotMessage'
tried to look up raft term with outdated etcdProgress.appliedi.

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-03-18 07:56:18 -07:00
4a1bffdbc6 clientv3: close open watch channel if substream is closing on reconnect
If substream is closing but outc is still open while reconnecting, then outc
would only be closed once the watch client would connect or once the watch
client is closed. This was leading to deadlocks in the proxy tests. Instead,
close immediately if the context is canceled.

Fixes #7503
2017-03-18 07:56:18 -07:00
9d9be2bc86 ctlv3: ensure synced member list before printing env vars on member add
In cases of multiple endpoints, it's possible member add would get a its
member list from a member that has not yet recognized the membership
update. Instead, confirm that the member list response is from the
member that acked the member add or from a member that has synced
with the cluster following the member add.

Fixes #7498
2017-03-18 07:56:18 -07:00
e5462f74f1 auth: get rid of deadlocking channel passing scheme in simpleTokenTTL
Cherry-picked from 1b1fabef8f.

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-03-18 07:56:05 -07:00
c68c1d9344 discovery: fix print format
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-03-17 14:21:57 -07:00
6ed56cd723 auth: nil check AuthInfo when checking admin permissions
If the context does not include auth information, get authinfo will
return a nil auth info and a nil error. This is then passed to
IsAdminPermitted, which would dereference the nil auth info.
2017-03-17 14:21:39 -07:00
a3c6f6bf81 version: bump up to 3.1.3+git
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-03-17 14:21:15 -07:00
21fdcc6443 version: bump up to 3.1.3
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-03-10 09:05:16 -08:00
8d122e7011 etcdmain: SdNotify when gateway, grpc-proxy are ready
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-03-09 11:35:20 -08:00
ade1d97893 lease: guard 'Lease.itemSet' from concurrent writes
Fix https://github.com/coreos/etcd/issues/7448.

Affected if etcd builds with Go 1.8+.

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-03-08 14:50:06 -08:00
1300189581 gateway: fix the dns discovery method
strip the scheme from the endpoints to have a clean hostname for TCP proxy

Fixes #7452
2017-03-08 14:49:50 -08:00
1971517806 etcdctl: correctly batch revisions in make-mirror
Fixes #7410
2017-03-06 14:55:47 -08:00
d614bb0799 etcdmain: log machine default host after update check
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-03-06 14:55:31 -08:00
059dc91d4c embed: use machine default host only for default value, 0.0.0.0
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
2017-03-06 14:55:24 -08:00
5fdbaee761 version: bump up to 3.1.2+git 2017-02-24 10:34:53 -08:00
714e7ec8db version: bump up to 3.1.2 2017-02-22 10:45:48 -08:00
2cdaf6d661 netutil: use ipv4 host by default
Was non-deterministic.
2017-02-22 10:45:38 -08:00
77a51e0dbf pkg/netutil: name GetDefaultInterfaces consistent 2017-02-22 10:45:29 -08:00
d96d3aa0ed netutil: add dualstack to linux_route
in v3.1.0 netutil couldn't get default interface for ipv6only hosts

Fixes #7219
2017-02-22 10:45:19 -08:00
66e7532f57 pkg/netutil: use native byte ordering for route information
Fixes #7199
2017-02-22 10:45:07 -08:00
3eff360e79 pkg/cpuutil: add cpuutil
A package for unsafe cpu-ish things.
2017-02-22 10:44:59 -08:00
1487071966 integration: add 'TestV3HashRestart' 2017-02-22 10:39:49 -08:00
5d62bba9c7 auth: keep old revision in 'NewAuthStore'
When there's no changes yet (right after auth
store initialization), we should commit old revision.

Fix https://github.com/coreos/etcd/issues/7359.
2017-02-22 10:39:28 -08:00
114e293119 integration: test keepalives for short TTLs 2017-02-22 10:38:38 -08:00
1439955536 clientv3: do not set next keepalive time <= now+TTL 2017-02-22 10:38:28 -08:00
2c8ecc7e13 tcpproxy: don't use range variable in reactivate goroutine
Ends up trying to reactivate only the last endpoint.
2017-02-22 10:38:19 -08:00
7b4d622a7e raft: fix read index request for #7331 2017-02-22 10:38:09 -08:00
db8abbd975 version: bump to 3.1.1+git 2017-02-21 17:00:02 -08:00
ac1c7eba21 version: bump up to 3.1.1 2017-02-16 13:53:25 -08:00
9cc6d4852a travis: update for Go 1.7.5 tests 2017-02-16 13:53:05 -08:00
ff7fa9843d clientv3: fix lease keepalive duration 2017-02-16 13:52:27 -08:00
f66138d403 clientv3: fix lease keepalive duration 2017-02-16 12:41:09 -08:00
8c87916f68 auth: add 'setupAuthStore' to tests 2017-02-14 14:39:51 -08:00
9e81b002c4 auth: correct initialization in NewAuthStore()
Because of my own silly mistake, current NewAuthStore() doesn't
initialize authStore in a correct manner. For example, after recovery
from snapshot, it cannot revive the flag of enabled/disabled. This
commit fixes the problem.

Fix https://github.com/coreos/etcd/issues/7165
2017-02-14 13:49:19 -08:00
4962c5cff7 auth: add a test case for recoverying from snapshot
Conflicts:
	auth/store_test.go
2017-02-14 13:48:17 -08:00
e5bf25a3b6 e2e: add cases for defrag and snapshot with authentication 2017-02-14 13:43:11 -08:00
98c60e8faa auth, etcdserver: let maintenance services require root role
This commit lets maintenance services require root privilege. It also
moves AuthInfoFromCtx() from etcdserver to auth pkg for cleaning purpose.
2017-02-14 13:42:31 -08:00
3ac3fa6f3d travis: disable email notifications
Was spamming security@coreos.com
2017-02-14 12:55:02 -08:00
eaa8b9e155 clientv3: test closing client cancels blocking dials 2017-02-14 11:31:51 -08:00
ea2aae464d clientv3: use DialContext
Fixes #7216
2017-02-14 11:31:43 -08:00
776739ebc2 roadmap: update roadmap 2017-01-20 14:21:08 -08:00
a7a8a47ba0 README: remove ACI, update Go version 2017-01-20 14:21:00 -08:00
379f7ae10e op-guide: change grpc-proxy from 'pre' to alpha' 2017-01-20 13:25:21 -08:00
ead2d95914 version: bump to v3.1.0+git 2017-01-20 13:25:04 -08:00
8ba2897a21 version: bump to v3.1.0 2017-01-20 12:42:12 -08:00
bc31e27cb9 documentation: update build documentation 2017-01-20 11:20:54 -08:00
fce20a0b0b test: passed the test script arguments as the test function parameters 2017-01-20 11:15:01 -08:00
f10363fecd etcdctlv3: snapshot restore works with lease key 2017-01-20 10:06:49 -08:00
a7ec6c88fd pkg/flags: fixed prefix checking of the env variables 2017-01-20 09:57:00 -08:00
62c591d223 integration: test STM apply on concurrent deletion 2017-01-20 09:56:42 -08:00
5676226867 clientv3/concurrency: fix rev comparison on concurrent key deletion 2017-01-20 09:56:36 -08:00
898b9e608f Documentation: fix typo s/endpoint-health/endpoint health/ 2017-01-19 20:48:01 -08:00
b84be6b11b NEWS: fix date for v3.1 release 2017-01-19 17:56:35 -08:00
7a12d65528 Documentation: update experimental_apis for v3.1 release 2017-01-19 12:43:32 -08:00
53ac04b118 vendor: update 'golang.org/x/net' 2017-01-18 13:11:48 -08:00
fbcd5375b3 glide: update 'golang.org/x/net' 2017-01-18 13:11:13 -08:00
c2e8d06eec grpcproxy, etcdmain, integration: add close channel to kv proxy
ccache launches goroutines that need to be explicitly stopped.

Fixes #7158
2017-01-18 13:11:03 -08:00
6c8f1986c8 etcdserver: use ReqTimeout for linearized read
Fixes #7136
2017-01-17 17:31:31 -08:00
be9ae300c6 pkg/report: add nil checking for getTimeSeries 2017-01-17 13:23:57 -08:00
9ba3632614 Documentation: document upgrading to v3.1 2017-01-17 13:23:19 -08:00
c2d8b5a9e8 ctlv3: print cluster info after adding new member 2017-01-17 10:23:02 -08:00
340 changed files with 11350 additions and 3351 deletions

16
.semaphore.sh Executable file
View File

@ -0,0 +1,16 @@
#!/usr/bin/env bash
TEST_SUFFIX=$(date +%s | base64 | head -c 15)
TEST_OPTS="PASSES='build unit release integration_e2e functional' MANUAL_VER=v3.1.11"
if [ "$TEST_ARCH" == "386" ]; then
TEST_OPTS="GOARCH=386 PASSES='build unit integration_e2e'"
fi
docker run \
--rm \
--volume=`pwd`:/go/src/github.com/coreos/etcd \
gcr.io/etcd-development/etcd-test:go1.8.7 \
/bin/bash -c "${TEST_OPTS} ./test 2>&1 | tee test-${TEST_SUFFIX}.log"
! egrep "(--- FAIL:|panic: test timed out|appears to have leaked)" -B50 -A10 test-${TEST_SUFFIX}.log

View File

@ -1,25 +1,43 @@
dist: trusty
language: go
go_import_path: github.com/coreos/etcd
sudo: false
sudo: required
services: docker
go:
- 1.7.4
- tip
- "1.8.7"
- tip
notifications:
on_success: never
on_failure: never
env:
matrix:
- TARGET=amd64
- TARGET=arm64
- TARGET=arm
- TARGET=386
- TARGET=ppc64le
- TARGET=amd64
- TARGET=amd64-go-tip
- TARGET=darwin-amd64
- TARGET=windows-amd64
- TARGET=arm64
- TARGET=arm
- TARGET=386
- TARGET=ppc64le
matrix:
fast_finish: true
allow_failures:
- go: tip
- go: tip
env: TARGET=amd64-go-tip
exclude:
- go: "1.8.7"
env: TARGET=amd64-go-tip
- go: tip
env: TARGET=amd64
- go: tip
env: TARGET=darwin-amd64
- go: tip
env: TARGET=windows-amd64
- go: tip
env: TARGET=arm
- go: tip
@ -29,33 +47,43 @@ matrix:
- go: tip
env: TARGET=ppc64le
addons:
apt:
packages:
- libpcap-dev
- libaspell-dev
- libhunspell-dev
before_install:
- go get -v github.com/chzchzchz/goword
- go get -v honnef.co/go/simple/cmd/gosimple
- go get -v honnef.co/go/unused/cmd/unused
- if [[ $TRAVIS_GO_VERSION == 1.* ]]; then docker pull gcr.io/etcd-development/etcd-test:go${TRAVIS_GO_VERSION}; fi
# disable godep restore override
install:
- pushd cmd/etcd && go get -t -v ./... && popd
- pushd cmd/etcd && go get -t -v ./... && popd
script:
- echo "TRAVIS_GO_VERSION=${TRAVIS_GO_VERSION}"
- >
case "${TARGET}" in
amd64)
docker run --rm \
--volume=`pwd`:/go/src/github.com/coreos/etcd gcr.io/etcd-development/etcd-test:go${TRAVIS_GO_VERSION} \
/bin/bash -c "GOARCH=amd64 ./test"
;;
amd64-go-tip)
GOARCH=amd64 ./test
;;
darwin-amd64)
docker run --rm \
--volume=`pwd`:/go/src/github.com/coreos/etcd gcr.io/etcd-development/etcd-test:go${TRAVIS_GO_VERSION} \
/bin/bash -c "GO_BUILD_FLAGS='-a -v' GOOS=darwin GOARCH=amd64 ./build"
;;
windows-amd64)
docker run --rm \
--volume=`pwd`:/go/src/github.com/coreos/etcd gcr.io/etcd-development/etcd-test:go${TRAVIS_GO_VERSION} \
/bin/bash -c "GO_BUILD_FLAGS='-a -v' GOOS=windows GOARCH=amd64 ./build"
;;
386)
GOARCH=386 PASSES="build unit" ./test
docker run --rm \
--volume=`pwd`:/go/src/github.com/coreos/etcd gcr.io/etcd-development/etcd-test:go${TRAVIS_GO_VERSION} \
/bin/bash -c "GOARCH=386 PASSES='build unit' ./test"
;;
*)
# test building out of gopath
GO_BUILD_FLAGS="-a -v" GOPATH="" GOARCH="${TARGET}" ./build
docker run --rm \
--volume=`pwd`:/go/src/github.com/coreos/etcd gcr.io/etcd-development/etcd-test:go${TRAVIS_GO_VERSION} \
/bin/bash -c "GO_BUILD_FLAGS='-a -v' GOARCH='${TARGET}' ./build"
;;
esac

View File

@ -5,6 +5,12 @@ ADD etcdctl /usr/local/bin/
RUN mkdir -p /var/etcd/
RUN mkdir -p /var/lib/etcd/
# Alpine Linux doesn't use pam, which means that there is no /etc/nsswitch.conf,
# but Golang relies on /etc/nsswitch.conf to check the order of DNS resolving
# (see https://github.com/golang/go/commit/9dee7771f561cf6aee081c0af6658cc81fac3918)
# To fix this we just create /etc/nsswitch.conf and add the following line:
RUN echo 'hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4' >> /etc/nsswitch.conf
EXPOSE 2379 2380
# Define default command.

57
Dockerfile-test Normal file
View File

@ -0,0 +1,57 @@
FROM ubuntu:16.10
RUN rm /bin/sh && ln -s /bin/bash /bin/sh
RUN echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections
RUN apt-get -y update \
&& apt-get -y install \
build-essential \
gcc \
apt-utils \
pkg-config \
software-properties-common \
apt-transport-https \
libssl-dev \
sudo \
bash \
curl \
wget \
tar \
git \
netcat \
libaspell-dev \
libhunspell-dev \
hunspell-en-us \
aspell-en \
shellcheck \
&& apt-get -y update \
&& apt-get -y upgrade \
&& apt-get -y autoremove \
&& apt-get -y autoclean
ENV GOROOT /usr/local/go
ENV GOPATH /go
ENV PATH ${GOPATH}/bin:${GOROOT}/bin:${PATH}
ENV GO_VERSION REPLACE_ME_GO_VERSION
ENV GO_DOWNLOAD_URL https://storage.googleapis.com/golang
RUN rm -rf ${GOROOT} \
&& curl -s ${GO_DOWNLOAD_URL}/go${GO_VERSION}.linux-amd64.tar.gz | tar -v -C /usr/local/ -xz \
&& mkdir -p ${GOPATH}/src ${GOPATH}/bin \
&& go version
RUN mkdir -p ${GOPATH}/src/github.com/coreos/etcd
WORKDIR ${GOPATH}/src/github.com/coreos/etcd
ADD ./scripts/install-marker.sh /tmp/install-marker.sh
RUN go get -v -u -tags spell github.com/chzchzchz/goword \
&& go get -v -u github.com/coreos/license-bill-of-materials \
&& go get -v -u honnef.co/go/tools/cmd/gosimple \
&& go get -v -u honnef.co/go/tools/cmd/unused \
&& go get -v -u honnef.co/go/tools/cmd/staticcheck \
&& go get -v -u github.com/wadey/gocovmerge \
&& go get -v -u github.com/gordonklaus/ineffassign \
&& /tmp/install-marker.sh amd64 \
&& rm -f /tmp/install-marker.sh \
&& curl -s https://codecov.io/bash >/codecov \
&& chmod 700 /codecov

View File

@ -49,4 +49,4 @@ Bootstrap another machine and use the [hey HTTP benchmark tool][hey] to send req
| 256 | 256 | all servers | 3061 | 119.3 |
[hey]: https://github.com/rakyll/hey
[hack-benchmark]: /hack/benchmark/
[hack-benchmark]: https://github.com/coreos/etcd/tree/master/hack/benchmark

View File

@ -69,4 +69,4 @@ Bootstrap another machine and use the [hey HTTP benchmark tool][hey] to send req
[hey]: https://github.com/rakyll/hey
[c7146bd5]: https://github.com/coreos/etcd/commits/c7146bd5f2c73716091262edc638401bb8229144
[etcd-2.1-benchmark]: etcd-2-1-0-alpha-benchmarks.md
[hack-benchmark]: /hack/benchmark/
[hack-benchmark]: ../../hack/benchmark/

View File

@ -39,4 +39,4 @@ The performance is nearly the same as the one with empty server handler.
The performance with empty server handler is not affected by one put. So the
performance downgrade should be caused by storage package.
[etcd-v3-benchmark]: /tools/benchmark/
[etcd-v3-benchmark]: ../../tools/benchmark/

View File

@ -1,8 +1,11 @@
# Experimental APIs and features
For the most part, the etcd project is stable, but we are still moving fast! We believe in the release fast philosophy. We want to get early feedback on features still in development and stabilizing. Thus, there are, and will be more, experimental features and APIs. We plan to improve these features based on the early feedback from the community, or abandon them if there is little interest, in the next few releases. If you are running a production system, please do not rely on any experimental features or APIs.
For the most part, the etcd project is stable, but we are still moving fast! We believe in the release fast philosophy. We want to get early feedback on features still in development and stabilizing. Thus, there are, and will be more, experimental features and APIs. We plan to improve these features based on the early feedback from the community, or abandon them if there is little interest, in the next few releases. Please do not rely on any experimental features or APIs in production environment.
## The current experimental API/features are:
- v3 auth API: expect to be stable in 3.1 release
- etcd gateway: expect to be stable in 3.1 release
- [gateway][gateway]: beta, to be stable in 3.2 release
- [gRPC proxy][grpc-proxy]: alpha, to be stable in 3.2 release
[gateway]: ../op-guide/gateway.md
[grpc-proxy]: ../op-guide/grpc_proxy.md

View File

@ -3,7 +3,7 @@
etcd uses the [capnslog][capnslog] library for logging application output categorized into *levels*. A log message's level is determined according to these conventions:
* Error: Data has been lost, a request has failed for a bad reason, or a required resource has been lost
* Examples:
* Examples:
* A failure to allocate disk space for WAL
* Warning: (Hopefully) Temporary conditions that may cause errors, but may work fine. A replica disappearing (that may reconnect) is a warning.
@ -26,4 +26,4 @@ etcd uses the [capnslog][capnslog] library for logging application output catego
* Send a normal message to a remote peer
* Write a log entry to disk
[capnslog]: [https://github.com/coreos/pkg/tree/master/capnslog]
[capnslog]: https://github.com/coreos/pkg/tree/master/capnslog

View File

@ -10,29 +10,44 @@ The easiest way to get etcd is to use one of the pre-built release binaries whic
## Build the latest version
For those wanting to try the very latest version, build etcd from the `master` branch.
[Go](https://golang.org/) version 1.6+ (with HTTP2 support) is required to build the latest version of etcd.
etcd vendors its dependency for official release binaries, while making vendoring optional to avoid import conflicts.
[`build` script][build-script] would automatically include the vendored dependencies from [`cmd`][cmd-directory] directory.
For those wanting to try the very latest version, build etcd from the `master` branch. [Go](https://golang.org/) version 1.7+ is required to build the latest version of etcd. To ensure etcd is built against well-tested libraries, etcd vendors its dependencies for official release binaries. However, etcd's vendoring is also optional to avoid potential import conflicts when embedding the etcd server or using the etcd client.
Here are the commands to build an etcd binary from the `master` branch:
First, confirm go 1.7+ is installed:
```
```sh
# go is required
$ go version
go version go1.6 darwin/amd64
go version go1.7.3 darwin/amd64
# GOPATH should be set correctly
$ echo $GOPATH
/Users/example/go
```
$ mkdir -p $GOPATH/src/github.com/coreos
$ cd $GOPATH/src/github.com/coreos
To build `etcd` from the `master` branch without a `GOPATH` using the official `build` script:
```sh
$ git clone https://github.com/coreos/etcd.git
$ cd etcd
$ ./build
$ ./bin/etcd
...
```
To build a vendored `etcd` from the `master` branch via `go get`:
```sh
# GOPATH should be set
$ echo $GOPATH
/Users/example/go
$ go get github.com/coreos/etcd/cmd/etcd
$ $GOPATH/bin/etcd
```
To build `etcd` from the `master` branch without vendoring (may not build due to upstream conflicts):
```sh
# GOPATH should be set
$ echo $GOPATH
/Users/example/go
$ go get github.com/coreos/etcd
$ $GOPATH/bin/etcd
```
## Test the installation

View File

@ -52,6 +52,7 @@ To learn more about the concepts and internals behind etcd, read the following p
- [Migrate applications from using API v2 to API v3][v2_migration]
- [Updating v2.3 to v3.0][v3_upgrade]
- [Updating v3.0 to v3.1][v31_upgrade]
## Frequently Asked Questions (FAQ)
@ -88,3 +89,4 @@ Answers to [common questions] about etcd.
[supported_platform]: op-guide/supported-platform.md
[experimental]: dev-guide/experimental_apis.md
[v3_upgrade]: upgrades/upgrade_3_0.md
[v31_upgrade]: upgrades/upgrade_3_1.md

View File

@ -475,5 +475,5 @@ To setup an etcd cluster with proxies of v2 API, please read the the [clustering
[proxy]: https://github.com/coreos/etcd/blob/release-2.3/Documentation/proxy.md
[clustering_etcd2]: https://github.com/coreos/etcd/blob/release-2.3/Documentation/clustering.md
[security-guide]: security.md
[tls-setup]: /hack/tls-setup
[tls-setup]: ../../hack/tls-setup
[gateway]: gateway.md

View File

@ -286,7 +286,7 @@ Follow the instructions when using these flags.
[build-cluster]: clustering.md#static
[reconfig]: runtime-configuration.md
[discovery]: clustering.md#discovery
[iana-ports]: https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml?search=etcd
[iana-ports]: http://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.txt
[proxy]: ../v2/proxy.md
[restore]: ../v2/admin_guide.md#restoring-a-backup
[security]: security.md

View File

@ -57,7 +57,7 @@ sudo rkt run --net=default:IP=${NODE3} coreos.com/etcd:v3.0.6 -- -name=node3 -ad
Verify the cluster is healthy and can be reached.
```
ETCDCTL_API=3 etcdctl --endpoints=http://172.16.28.21:2379,http://172.16.28.22:2379,http://172.16.28.23:2379 endpoint-health
ETCDCTL_API=3 etcdctl --endpoints=http://172.16.28.21:2379,http://172.16.28.22:2379,http://172.16.28.23:2379 endpoint health
```
### DNS

View File

@ -1,6 +1,6 @@
# gRPC proxy
*This is a pre-alpha feature, we are looking for early feedback.*
*This is an alpha feature, we are looking for early feedback.*
The gRPC proxy is a stateless etcd reverse proxy operating at the gRPC layer (L7). The proxy is designed to reduce the total processing load on the core etcd cluster. For horizontal scalability, it coalesces watch and lease API requests. To protect the cluster against abusive clients, it caches key range requests.
@ -36,9 +36,9 @@ watch key A ^ ^ watch key A |
To effectively coalesce multiple client watchers into a single watcher, the gRPC proxy coalesces new `c-watchers` into an existing `s-watcher` when possible. This coalesced `s-watcher` may be out of sync with the etcd server due to network delays or buffered undelivered events. When the watch revision is unspecified, the gRPC proxy will not guarantee the `c-watcher` will start watching from the most recent store revision. For example, if a client watches from an etcd server with revision 1000, that watcher will begin at revision 1000. If a client watches from the gRPC proxy, may begin watching from revision 990.
Similar limitations apply to cancellation. When the watcher is cancelled, the etcd servers revision may be greater than the cancellation response revision.
Similar limitations apply to cancellation. When the watcher is cancelled, the etcd servers revision may be greater than the cancellation response revision.
These two limitations should not cause problems for most use cases. In the future, there may be additional options to force the watcher to bypass the gRPC proxy for more accurate revision responses.
These two limitations should not cause problems for most use cases. In the future, there may be additional options to force the watcher to bypass the gRPC proxy for more accurate revision responses.
## Scalable lease API
@ -75,3 +75,4 @@ $ ETCDCTL_API=3 ./etcdctl --endpoints=127.0.0.1:2379 get foo
foo
bar
```

View File

@ -219,6 +219,6 @@ Make sure to sign the certificates with a Subject Name the member's public IP ad
The certificate needs to be signed for the member's FQDN in its Subject Name, use Subject Alternative Names (short IP SANs) to add the IP address. The `etcd-ca` tool provides `--domain=` option for its `new-cert` command, and openssl can make [it][alt-name] too.
[cfssl]: https://github.com/cloudflare/cfssl
[tls-setup]: /hack/tls-setup
[tls-setup]: ../../hack/tls-setup
[tls-guide]: https://github.com/coreos/docs/blob/master/os/generate-self-signed-certificates.md
[alt-name]: http://wiki.cacert.org/FAQ/subjectAltName

View File

@ -50,7 +50,7 @@ Radius Intelligence uses Kubernetes running CoreOS to containerize and scale int
## Vonage
- *Application*: system configuration for microservices, scheduling, locks (future - service discovery)
- *Application*: system configuration for microservices, scheduling, locks (future - service discovery)
- *Launched*: August 2015
- *Cluster Size*: 2 clusters of 5 members in 2 DCs, n local proxies 1-to-1 with microservice, (ssl and SRV look up)
- *Order of Data Size*: kilobytes
@ -60,3 +60,148 @@ Radius Intelligence uses Kubernetes running CoreOS to containerize and scale int
[teamcity]: https://www.jetbrains.com/teamcity/
[raoofm]:https://github.com/raoofm
## Qiniu Cloud
- *Application*: system configuration for microservices, distributed locks
- *Launched*: Jan. 2016
- *Cluster Size*: 3 members each with several clusters
- *Order of Data Size*: kilobytes
- *Operator*: Pandora, chenchao@qiniu.com
- *Environment*: Baremetal
- *Backups*: None, all data can be recreated if necessary
## QingCloud
- *Application*: [QingCloud][qingcloud] appcenter cluster for service discovery as [metad][metad] backend.
- *Launched*: December 2016
- *Cluster Size*: 1 cluster of 3 members per user.
- *Order of Data Size*: kilobytes
- *Operator*: [yunify][yunify]
- *Environment*: QingCloud IaaS
- *Backups*: None, all data can be recreated if necessary.
[metad]:https://github.com/yunify/metad
[yunify]:https://github.com/yunify
[qingcloud]:https://qingcloud.com/
## Yandex
- *Application*: system configuration for services, service discovery
- *Launched*: March 2016
- *Cluster Size*: 3 clusters of 5 members
- *Order of Data Size*: several gigabytes
- *Operator*: Yandex; [nekto0n][nekto0n]
- *Environment*: Bare Metal
- *Backups*: None
[nekto0n]:https://github.com/nekto0n
## Tencent Games
- *Application*: Meta data and configuration data for service discovery, Kubernetes, etc.
- *Launched*: Jan. 2015
- *Cluster Size*: 3 members each with 10s of clusters
- *Order of Data Size*: 10s of Megabytes
- *Operator*: Tencent Game Operations Department
- *Environment*: Baremetal
- *Backups*: Periodic sync to backup server
In Tencent games, we use Docker and Kubernetes to deploy and run our applications, and use etcd to save meta data for service discovery, Kubernetes, etc.
## Hyper.sh
- *Application*: Kubernetes, distributed locks, etc.
- *Launched*: April 2016
- *Cluster Size*: 1 cluster of 3 members
- *Order of Data Size*: 10s of MB
- *Operator*: Hyper.sh
- *Environment*: Baremetal
- *Backups*: None, all data can be recreated if necessary.
In [hyper.sh][hyper.sh], the container service is backed by [hypernetes][hypernetes], a multi-tenant kubernetes distro. Moreover, we use etcd to coordinate the multiple manage services and store global meta data.
[hypernetes]:https://github.com/hyperhq/hypernetes
[Hyper.sh]:https://www.hyper.sh
## Meitu
- *Application*: system configuration for services, service discovery, kubernetes in test environment
- *Launched*: October 2015
- *Cluster Size*: 1 cluster of 3 members
- *Order of Data Size*: megabytes
- *Operator*: Meitu, hxj@meitu.com, [shafreeck][shafreeck]
- *Environment*: Bare Metal
- *Backups*: None, all data can be recreated if necessary.
[shafreeck]:https://github.com/shafreeck
## Grab
- *Application*: system configuration for services, service discovery
- *Launched*: June 2016
- *Cluster Size*: 1 cluster of 7 members
- *Order of Data Size*: megabytes
- *Operator*: Grab, [taxitan][taxitan], [reterVision][reterVision]
- *Environment*: AWS
- *Backups*: None, all data can be recreated if necessary.
[taxitan]:https://github.com/taxitan
[reterVision]:https://github.com/reterVision
## DaoCloud.io
- *Application*: container management
- *Launched*: Sep. 2015
- *Cluster Size*: 1000+ deployments, each deployment contains a 3 node cluster.
- *Order of Data Size*: 100s of Megabytes
- *Operator*: daocloud.io
- *Environment*: Baremetal and virtual machines
- *Backups*: None, all data can be recreated if necessary.
In [DaoCloud][DaoCloud], we use Docker and Swarm to deploy and run our applications, and we use etcd to save metadata for service discovery.
[DaoCloud]:https://www.daocloud.io
## Branch.io
- *Application*: Kubernetes
- *Launched*: April 2016
- *Cluster Size*: Multiple clusters, multiple sizes
- *Order of Data Size*: 100s of Megabytes
- *Operator*: branch.io
- *Environment*: AWS, Kubernetes
- *Backups*: EBS volume backups
At [Branch][branch], we use kubernetes heavily as our core microservice platform for staging and production.
[branch]: https://branch.io
## Baidu Waimai
- *Application*: SkyDNS, Kubernetes, UDC, CMDB and other distributed systems
- *Launched*: April. 2016
- *Cluster Size*: 3 clusters of 5 members
- *Order of Data Size*: several gigabytes
- *Operator*: Baidu Waimai Operations Department
- *Environment*: CentOS 6.5
- *Backups*: backup scripts
## Salesforce.com
- *Application*: Kubernetes
- *Launched*: Jan 2017
- *Cluster Size*: Multiple clusters of 3 members
- *Order of Data Size*: 100s of Megabytes
- *Operator*: Salesforce.com (krmayankk@github)
- *Environment*: BareMetal
- *Backups*: None, all data can be recreated
## Hosted Graphite
- *Application*: Service discovery, locking, ephemeral application data
- *Launched*: January 2017
- *Cluster Size*: 2 clusters of 7 members
- *Order of Data Size*: Megabytes
- *Operator*: Hosted Graphite (sre@hostedgraphite.com)
- *Environment*: Bare Metal
- *Backups*: None, all data is considered ephemeral.

View File

@ -1,6 +1,6 @@
# Reporting bugs
If any part of the etcd project has bugs or documentation mistakes, please let us know by [opening an issue][issue]. We treat bugs and mistakes very seriously and believe no issue is too small. Before creating a bug report, please check that an issue reporting the same problem does not already exist.
If any part of the etcd project has bugs or documentation mistakes, please let us know by [opening an issue][etcd-issue]. We treat bugs and mistakes very seriously and believe no issue is too small. Before creating a bug report, please check that an issue reporting the same problem does not already exist.
To make the bug report accurate and easy to understand, please try to create bug reports that are:

View File

@ -6,27 +6,27 @@ In the general case, upgrading from etcd 2.3 to 3.0 can be a zero-downtime, roll
Before [starting an upgrade](#upgrade-procedure), read through the rest of this guide to prepare.
### Upgrade Checklists
### Upgrade checklists
#### Upgrade Requirements
#### Upgrade requirements
To upgrade an existing etcd deployment to 3.0, the running cluster must be 2.3 or greater. If it's before 2.3, please upgrade to [2.3](https://github.com/coreos/etcd/releases/tag/v2.3.0) before upgrading to 3.0.
Also, to ensure a smooth rolling upgrade, the running cluster must be healthy. You can check the health of the cluster by using the `etcdctl cluster-health` command.
Also, to ensure a smooth rolling upgrade, the running cluster must be healthy. Check the health of the cluster by using the `etcdctl cluster-health` command before proceeding.
#### Preparation
Before upgrading etcd, always test the services relying on etcd in a staging environment before deploying the upgrade to the production environment.
Before beginning, [backup the etcd data directory](../v2/admin_guide.md#backing-up-the-datastore). Should something go wrong with the upgrade, it is possible to use this backup to [downgrade](#downgrade) back to existing etcd version.
Before beginning, [backup the etcd data directory](../v2/admin_guide.md#backing-up-the-datastore). Should something go wrong with the upgrade, it is possible to use this backup to [downgrade](#downgrade) back to existing etcd version.
#### Mixed Versions
#### Mixed versions
While upgrading, an etcd cluster supports mixed versions of etcd members, and operates with the protocol of the lowest common version. The cluster is only considered upgraded once all of its members are upgraded to version 3.0. Internally, etcd members negotiate with each other to determine the overall cluster version, which controls the reported version and the supported features.
#### Limitations
It might take up to 2 minutes for the newly upgraded member to catch up with the existing cluster when the total data size is larger than 50MB. Check the size of a recent snapshot to estimate the total data size. In other words, it is safest to wait for 2 minutes between upgrading each member.
It might take up to 2 minutes for the newly upgraded member to catch up with the existing cluster when the total data size is larger than 50MB. Check the size of a recent snapshot to estimate the total data size. In other words, it is safest to wait for 2 minutes between upgrading each member.
For a much larger total data size, 100MB or more , this one-time process might take even more time. Administrators of very large etcd clusters of this magnitude can feel free to contact the [etcd team][etcd-contact] before upgrading, and well be happy to provide advice on the procedure.
@ -36,13 +36,13 @@ If all members have been upgraded to v3.0, the cluster will be upgraded to v3.0,
Please [backup the data directory](../v2/admin_guide.md#backing-up-the-datastore) of all etcd members to make downgrading the cluster possible even after it has been completely upgraded.
### Upgrade Procedure
### Upgrade procedure
This example details the upgrade of a three-member v2.3 ectd cluster running on a local machine.
This example details the upgrade of a three-member v2.3 ectd cluster running on a local machine.
#### 1. Check upgrade requirements.
Is the the cluster healthy and running v.2.3.x?
Is the cluster healthy and running v.2.3.x?
```
$ etcdctl cluster-health
@ -64,7 +64,7 @@ When each etcd process is stopped, expected errors will be logged by other clust
2016-06-27 15:21:48.624175 I | rafthttp: the connection with 8211f1d0f64f3269 became inactive
```
Its a good idea at this point to [backup the etcd data directory](../v2/admin_guide.md#backing-up-the-datastore) to provide a downgrade path should any problems occur:
Its a good idea at this point to [backup the etcd data directory](../v2/admin_guide.md#backing-up-the-datastore) to provide a downgrade path should any problems occur:
```
$ etcdctl backup \
@ -102,7 +102,7 @@ Upgraded members will log warnings like the following until the entire cluster i
#### 5. Finish
When all members are upgraded, the cluster will report upgrading to 3.0 successfully:
When all members are upgraded, the cluster will report upgrading to 3.0 successfully:
```
2016-06-27 15:22:19.873751 N | membership: updated the cluster version from 2.3 to 3.0

View File

@ -0,0 +1,123 @@
## Upgrade etcd from 3.0 to 3.1
In the general case, upgrading from etcd 3.0 to 3.1 can be a zero-downtime, rolling upgrade:
- one by one, stop the etcd v3.0 processes and replace them with etcd v3.1 processes
- after running all v3.1 processes, new features in v3.1 are available to the cluster
Before [starting an upgrade](#upgrade-procedure), read through the rest of this guide to prepare.
### Upgrade checklists
#### Upgrade requirements
To upgrade an existing etcd deployment to 3.1, the running cluster must be 3.0 or greater. If it's before 3.0, please upgrade to [3.0](https://github.com/coreos/etcd/releases/tag/v3.0.16) before upgrading to 3.1.
Also, to ensure a smooth rolling upgrade, the running cluster must be healthy. Check the health of the cluster by using the `etcdctl endpoint health` command before proceeding.
#### Preparation
Before upgrading etcd, always test the services relying on etcd in a staging environment before deploying the upgrade to the production environment.
Before beginning, [backup the etcd data](../op-guide/maintenance.md#snapshot-backup). Should something go wrong with the upgrade, it is possible to use this backup to [downgrade](#downgrade) back to existing etcd version. Please note that the `snapshot` command only backs up the v3 data. For v2 data, see [backing up v2 datastore](../v2/admin_guide.md#backing-up-the-datastore).
#### Mixed versions
While upgrading, an etcd cluster supports mixed versions of etcd members, and operates with the protocol of the lowest common version. The cluster is only considered upgraded once all of its members are upgraded to version 3.1. Internally, etcd members negotiate with each other to determine the overall cluster version, which controls the reported version and the supported features.
#### Limitations
Note: If the cluster only has v3 data and no v2 data, it is not subject to this limitation.
If the cluster is serving a v2 data set larger than 50MB, each newly upgraded member may take up to two minutes to catch up with the existing cluster. Check the size of a recent snapshot to estimate the total data size. In other words, it is safest to wait for 2 minutes between upgrading each member.
For a much larger total data size, 100MB or more , this one-time process might take even more time. Administrators of very large etcd clusters of this magnitude can feel free to contact the [etcd team][etcd-contact] before upgrading, and we'll be happy to provide advice on the procedure.
#### Downgrade
If all members have been upgraded to v3.1, the cluster will be upgraded to v3.1, and downgrade from this completed state is **not possible**. If any single member is still v3.0, however, the cluster and its operations remains "v3.0", and it is possible from this mixed cluster state to return to using a v3.0 etcd binary on all members.
Please [backup the data directory](../op-guide/maintenance.md#snapshot-backup) of all etcd members to make downgrading the cluster possible even after it has been completely upgraded.
### Upgrade procedure
This example shows how to upgrade a 3-member v3.0 ectd cluster running on a local machine.
#### 1. Check upgrade requirements
Is the cluster healthy and running v3.0.x?
```
$ ETCDCTL_API=3 etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
localhost:2379 is healthy: successfully committed proposal: took = 6.600684ms
localhost:22379 is healthy: successfully committed proposal: took = 8.540064ms
localhost:32379 is healthy: successfully committed proposal: took = 8.763432ms
$ curl http://localhost:2379/version
{"etcdserver":"3.0.16","etcdcluster":"3.0.0"}
```
#### 2. Stop the existing etcd process
When each etcd process is stopped, expected errors will be logged by other cluster members. This is normal since a cluster member connection has been (temporarily) broken:
```
2017-01-17 09:34:18.352662 I | raft: raft.node: 1640829d9eea5cfb elected leader 1640829d9eea5cfb at term 5
2017-01-17 09:34:18.359630 W | etcdserver: failed to reach the peerURL(http://localhost:2380) of member fd32987dcd0511e0 (Get http://localhost:2380/version: dial tcp 127.0.0.1:2380: getsockopt: connection refused)
2017-01-17 09:34:18.359679 W | etcdserver: cannot get the version of member fd32987dcd0511e0 (Get http://localhost:2380/version: dial tcp 127.0.0.1:2380: getsockopt: connection refused)
2017-01-17 09:34:18.548116 W | rafthttp: lost the TCP streaming connection with peer fd32987dcd0511e0 (stream Message writer)
2017-01-17 09:34:19.147816 W | rafthttp: lost the TCP streaming connection with peer fd32987dcd0511e0 (stream MsgApp v2 writer)
2017-01-17 09:34:34.364907 W | etcdserver: failed to reach the peerURL(http://localhost:2380) of member fd32987dcd0511e0 (Get http://localhost:2380/version: dial tcp 127.0.0.1:2380: getsockopt: connection refused)
```
It's a good idea at this point to [backup the etcd data](../op-guide/maintenance.md#snapshot-backup) to provide a downgrade path should any problems occur:
```
$ etcdctl snapshot save backup.db
```
#### 3. Drop-in etcd v3.1 binary and start the new etcd process
The new v3.1 etcd will publish its information to the cluster:
```
2017-01-17 09:36:00.996590 I | etcdserver: published {Name:my-etcd-1 ClientURLs:[http://localhost:2379]} to cluster 46bc3ce73049e678
```
Verify that each member, and then the entire cluster, becomes healthy with the new v3.1 etcd binary:
```
$ ETCDCTL_API=3 /etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
localhost:22379 is healthy: successfully committed proposal: took = 5.540129ms
localhost:32379 is healthy: successfully committed proposal: took = 7.321671ms
localhost:2379 is healthy: successfully committed proposal: took = 10.629901ms
```
Upgraded members will log warnings like the following until the entire cluster is upgraded. This is expected and will cease after all etcd cluster members are upgraded to v3.1:
```
2017-01-17 09:36:38.406268 W | etcdserver: the local etcd version 3.0.16 is not up-to-date
2017-01-17 09:36:38.406295 W | etcdserver: member fd32987dcd0511e0 has a higher version 3.1.0
2017-01-17 09:36:42.407695 W | etcdserver: the local etcd version 3.0.16 is not up-to-date
2017-01-17 09:36:42.407730 W | etcdserver: member fd32987dcd0511e0 has a higher version 3.1.0
```
#### 4. Repeat step 2 to step 3 for all other members
#### 5. Finish
When all members are upgraded, the cluster will report upgrading to 3.1 successfully:
```
2017-01-17 09:37:03.100015 I | etcdserver: updating the cluster version from 3.0 to 3.1
2017-01-17 09:37:03.104263 N | etcdserver/membership: updated the cluster version from 3.0 to 3.1
2017-01-17 09:37:03.104374 I | etcdserver/api: enabled capabilities for version 3.1
```
```
$ ETCDCTL_API=3 /etcdctl endpoint health --endpoints=localhost:2379,localhost:22379,localhost:32379
localhost:2379 is healthy: successfully committed proposal: took = 2.312897ms
localhost:22379 is healthy: successfully committed proposal: took = 2.553476ms
localhost:32379 is healthy: successfully committed proposal: took = 2.516902ms
```
[etcd-contact]: https://groups.google.com/forum/#!forum/etcd-dev

View File

@ -67,13 +67,13 @@ You have successfully started an etcd and written a key to the store.
The [official etcd ports][iana-ports] are 2379 for client requests, and 2380 for peer communication. To maintain compatibility, some etcd configuration and documentation continues to refer to the legacy ports 4001 and 7001, but all new etcd use and discussion should adopt the IANA-assigned ports. The legacy ports 4001 and 7001 will be fully deprecated, and support for their use removed, in future etcd releases.
[iana-ports]: https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml?search=etcd
[iana-ports]: http://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.txt
### Running local etcd cluster
First install [goreman](https://github.com/mattn/goreman), which manages Procfile-based applications.
Our [Procfile script](./Procfile) will set up a local example cluster. You can start it with:
Our [Procfile script](../../V2Procfile) will set up a local example cluster. You can start it with:
```sh
goreman start
@ -162,4 +162,4 @@ Currently only the amd64 architecture is officially supported by `etcd`.
### License
etcd is under the Apache 2.0 license. See the [LICENSE](LICENSE) file for details.
etcd is under the Apache 2.0 license. See the [LICENSE](../../LICENSE) file for details.

View File

@ -18,7 +18,7 @@ A keys lifetime spans a generation. Each key may have one or multiple generat
### Physical View
etcd stores the physical data as key-value pairs in a persistent [b+tree][b+tree]. Each revision of the stores state only contains the delta from its previous revision to be efficient. A single revision may correspond to multiple keys in the tree.
etcd stores the physical data as key-value pairs in a persistent [b+tree][b+tree]. Each revision of the stores state only contains the delta from its previous revision to be efficient. A single revision may correspond to multiple keys in the tree.
The key of key-value pair is a 3-tuple (major, sub, type). Major is the store revision holding the key. Sub differentiates among keys within the same revision. Type is an optional suffix for special value (e.g., `t` if the value contains a tombstone). The value of the key-value pair contains the modification from previous revision, thus one delta from previous revision. The b+tree is ordered by key in lexical byte-order. Ranged lookups over revision deltas are fast; this enables quickly finding modifications from one specific revision to another. Compaction removes out-of-date keys-value pairs.
@ -73,7 +73,7 @@ Any completed operations are durable. All accessible data is also durable data.
#### Linearizability
Linearizability (also known as Atomic Consistency or External Consistency) is a consistency level between strict consistency and sequential consistency.
Linearizability (also known as Atomic Consistency or External Consistency) is a consistency level between strict consistency and sequential consistency.
For linearizability, suppose each operation receives a timestamp from a loosely synchronized global clock. Operations are linearized if and only if they always complete as though they were executed in a sequential order and each operation appears to complete in the order specified by the program. Likewise, if an operations timestamp precedes another, that operation must also precede the other operation in the sequence.
@ -83,10 +83,10 @@ etcd does not ensure linearizability for watch operations. Users are expected to
etcd ensures linearizability for all other operations by default. Linearizability comes with a cost, however, because linearized requests must go through the Raft consensus process. To obtain lower latencies and higher throughput for read requests, clients can configure a requests consistency mode to `serializable`, which may access stale data with respect to quorum, but removes the performance penalty of linearized accesses' reliance on live consensus.
[persistent-ds]: [https://en.wikipedia.org/wiki/Persistent_data_structure]
[btree]: [https://en.wikipedia.org/wiki/B-tree]
[b+tree]: [https://en.wikipedia.org/wiki/B%2B_tree]
[seq_consistency]: [https://en.wikipedia.org/wiki/Consistency_model#Sequential_consistency]
[strict_consistency]: [https://en.wikipedia.org/wiki/Consistency_model#Strict_consistency]
[serializable_isolation]: [https://en.wikipedia.org/wiki/Isolation_(database_systems)#Serializable]
[Linearizability]: [#Linearizability]
[persistent-ds]: https://en.wikipedia.org/wiki/Persistent_data_structure
[btree]: https://en.wikipedia.org/wiki/B-tree
[b+tree]: https://en.wikipedia.org/wiki/B%2B_tree
[seq_consistency]: https://en.wikipedia.org/wiki/Consistency_model#Sequential_consistency
[strict_consistency]: https://en.wikipedia.org/wiki/Consistency_model#Strict_consistency
[serializable_isolation]: https://en.wikipedia.org/wiki/Isolation_(database_systems)#Serializable
[Linearizability]: #linearizability

View File

@ -32,7 +32,7 @@ The consistent flag for read operations is removed in etcd 2.0.0. The normal rea
The read consistency guarantees are:
The consistent read guarantees the sequential consistency within one client that talks to one etcd server. Read/Write from one client to one etcd member should be observed in order. If one client write a value to an etcd server successfully, it should be able to get the value out of the server immediately.
The consistent read guarantees the sequential consistency within one client that talks to one etcd server. Read/Write from one client to one etcd member should be observed in order. If one client write a value to an etcd server successfully, it should be able to get the value out of the server immediately.
Each etcd member will proxy the request to leader and only return the result to user after the result is applied on the local member. Thus after the write succeed, the user is guaranteed to see the value on the member it sent the request to.
@ -56,6 +56,7 @@ Proxy mode in 2.0 will provide similar functionality, and with improved control
## Discovery Service
A size key needs to be provided inside a [discovery token][discoverytoken].
[discoverytoken]: clustering.md#custom-etcd-discovery-service
## HTTP Admin API

View File

@ -49,4 +49,4 @@ Bootstrap another machine and use the [boom HTTP benchmark tool][boom] to send r
| 256 | 256 | all servers | 3061 | 119.3 |
[boom]: https://github.com/rakyll/boom
[hack-benchmark]: /hack/benchmark/
[hack-benchmark]: ../../../hack/benchmark/

View File

@ -24,7 +24,7 @@ Go OS/Arch: linux/amd64
## Testing
Bootstrap another machine, outside of the etcd cluster, and run the [`boom` HTTP benchmark tool](https://github.com/rakyll/boom) with a connection reuse patch to send requests to each etcd cluster member. See the [benchmark instructions](../../hack/benchmark/) for the patch and the steps to reproduce our procedures.
Bootstrap another machine, outside of the etcd cluster, and run the [`boom` HTTP benchmark tool][boom] with a connection reuse patch to send requests to each etcd cluster member. See the [benchmark instructions][hack] for the patch and the steps to reproduce our procedures.
The performance is calulated through results of 100 benchmark rounds.
@ -66,4 +66,7 @@ The performance is calulated through results of 100 benchmark rounds.
- Write QPS to cluster leaders seems to be increased by a small margin. This is because the main loop and entry apply loops were decoupled in the etcd raft logic, eliminating several blocks between them.
- Write QPS to all members seems to be increased by a significant margin, because followers now receive the latest commit index sooner, and commit proposals more quickly.
- Write QPS to all members seems to be increased by a significant margin, because followers now receive the latest commit index sooner, and commit proposals more quickly.
[boom]: https://github.com/rakyll/boom
[hack]: ../../../hack/benchmark/

View File

@ -69,4 +69,4 @@ Bootstrap another machine and use the [boom HTTP benchmark tool][boom] to send r
[boom]: https://github.com/rakyll/boom
[c7146bd5]: https://github.com/coreos/etcd/commits/c7146bd5f2c73716091262edc638401bb8229144
[etcd-2.1-benchmark]: etcd-2-1-0-alpha-benchmarks.md
[hack-benchmark]: /hack/benchmark/
[hack-benchmark]: ../../../hack/benchmark/

View File

@ -39,4 +39,4 @@ The performance is nearly the same as the one with empty server handler.
The performance with empty server handler is not affected by one put. So the
performance downgrade should be caused by storage package.
[etcd-v3-benchmark]: /tools/benchmark/
[etcd-v3-benchmark]: ../../../tools/benchmark/

View File

@ -423,7 +423,7 @@ To make understanding this feature easier, we changed the naming of some flags,
|-peers |none |Deprecated. The --initial-cluster flag provides a similar concept with different semantics. Please read this guide on cluster startup.|
|-peers-file |none |Deprecated. The --initial-cluster flag provides a similar concept with different semantics. Please read this guide on cluster startup.|
[client]: /client
[client]: ../../client
[client-discoverer]: https://godoc.org/github.com/coreos/etcd/client#Discoverer
[conf-adv-client]: configuration.md#-advertise-client-urls
[conf-listen-client]: configuration.md#-listen-client-urls

View File

@ -234,7 +234,7 @@ The security flags help to [build a secure etcd cluster][security].
+ env variable: ETCD_DEBUG
### --log-package-levels
+ Set individual etcd subpackages to specific log levels. An example being `etcdserver=WARNING,security=DEBUG`
+ Set individual etcd subpackages to specific log levels. An example being `etcdserver=WARNING,security=DEBUG`
+ default: none (INFO for all packages)
+ env variable: ETCD_LOG_PACKAGE_LEVELS
@ -272,7 +272,7 @@ Follow the instructions when using these flags.
[build-cluster]: clustering.md#static
[reconfig]: runtime-configuration.md
[discovery]: clustering.md#discovery
[iana-ports]: https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml?search=etcd
[iana-ports]: http://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.txt
[proxy]: proxy.md
[reconfig]: runtime-configuration.md
[restore]: admin_guide.md#restoring-a-backup

View File

@ -112,7 +112,6 @@
- [mattn/etcdenv](https://github.com/mattn/etcdenv) - "env" shebang with etcd integration
- [kelseyhightower/confd](https://github.com/kelseyhightower/confd) - Manage local app config files using templates and data from etcd
- [configdb](https://git.autistici.org/ai/configdb/tree/master) - A REST relational abstraction on top of arbitrary database backends, aimed at storing configs and inventories.
- [scrz](https://github.com/scrz/scrz) - Container manager, stores configuration in etcd.
- [fleet](https://github.com/coreos/fleet) - Distributed init system
- [kubernetes/kubernetes](https://github.com/kubernetes/kubernetes) - Container cluster manager introduced by Google.
- [mailgun/vulcand](https://github.com/mailgun/vulcand) - HTTP proxy that uses etcd as a configuration backend.

View File

@ -1,6 +1,6 @@
# Reporting Bugs
If you find bugs or documentation mistakes in the etcd project, please let us know by [opening an issue][issue]. We treat bugs and mistakes very seriously and believe no issue is too small. Before creating a bug report, please check that an issue reporting the same problem does not already exist.
If you find bugs or documentation mistakes in the etcd project, please let us know by [opening an issue][etcd-issue]. We treat bugs and mistakes very seriously and believe no issue is too small. Before creating a bug report, please check that an issue reporting the same problem does not already exist.
To make your bug report accurate and easy to understand, please try to create bug reports that are:

View File

@ -7,25 +7,25 @@ To prove out the design of the v3 API the team has also built [a number of examp
# Design
1. Flatten binary key-value space
2. Keep the event history until compaction
- access to old version of keys
- user controlled history compaction
3. Support range query
- Pagination support with limit argument
- Support consistency guarantee across multiple range queries
4. Replace TTL key with Lease
- more efficient/ low cost keep alive
- a logical group of TTL keys
5. Replace CAS/CAD with multi-object Txn
- MUCH MORE powerful and flexible
6. Support efficient watching with multiple ranges
7. RPC API supports the completed set of APIs.
7. RPC API supports the completed set of APIs.
- more efficient than JSON/HTTP
- additional txn/lease support
@ -56,7 +56,7 @@ the size in the future a little bit or make it configurable.
// A put is always successful
Put( PutRequest { key = foo, value = bar } )
PutResponse {
PutResponse {
cluster_id = 0x1000,
member_id = 0x1,
revision = 1,
@ -119,7 +119,7 @@ RangeResponse {
Txn(TxnRequest {
// mod_revision of foo0 is equal to 1, mod_revision of foo1 is greater than 1
compare = {
{compareType = equal, key = foo0, mod_revision = 1},
{compareType = equal, key = foo0, mod_revision = 1},
{compareType = greater, key = foo1, mod_revision = 1}}
},
// if the comparison succeeds, put foo2 = bar2
@ -156,7 +156,7 @@ Watch( WatchRequest{
end_revision = 10000,
// server decided notification frequency
progress_notification = true,
}
}
… // this can be a watch request stream
)
@ -176,7 +176,7 @@ WatchResponse {
},
}
// a notification at 2000
WatchResponse {
cluster_id = 0x1000,
@ -185,9 +185,9 @@ WatchResponse {
raft_term = 0x1,
// nil event as notification
}
// put (foo0=bar3000) event at 3000
WatchResponse {
cluster_id = 0x1000,
@ -204,8 +204,8 @@ WatchResponse {
},
}
```
[api-protobuf]: https://github.com/coreos/etcd/blob/master/etcdserver/etcdserverpb/rpc.proto
[kv-protobuf]: https://github.com/coreos/etcd/blob/master/storage/storagepb/kv.proto
[api-protobuf]: https://github.com/coreos/etcd/blob/release-2.3/etcdserver/etcdserverpb/rpc.proto
[kv-protobuf]: https://github.com/coreos/etcd/blob/release-2.3/storage/storagepb/kv.proto

View File

@ -188,6 +188,6 @@ Make sure that you sign your certificates with a Subject Name your member's publ
If you need your certificate to be signed for your member's FQDN in its Subject Name then you could use Subject Alternative Names (short IP SANs) to add your IP address. The `etcd-ca` tool provides `--domain=` option for its `new-cert` command, and openssl can make [it][alt-name] too.
[cfssl]: https://github.com/cloudflare/cfssl
[tls-setup]: /hack/tls-setup
[tls-setup]: ../../hack/tls-setup
[tls-guide]: https://github.com/coreos/docs/blob/master/os/generate-self-signed-certificates.md
[alt-name]: http://wiki.cacert.org/FAQ/subjectAltName

2
NEWS
View File

@ -1,4 +1,4 @@
etcd v3.1.0 (2017-01-13)
etcd v3.1.0 (2017-01-20)
- faster linearizable reads (implements Raft read-index)
- automatic leadership transfer when leader steps down
- etcd uses default route IP if advertise URL is not given

View File

@ -37,13 +37,14 @@ See [etcdctl][etcdctl] for a simple command line client.
### Getting etcd
The easiest way to get etcd is to use one of the pre-built release binaries which are available for OSX, Linux, Windows, AppC (ACI), and Docker. Instructions for using these binaries are on the [GitHub releases page][github-release].
The easiest way to get etcd is to use one of the pre-built release binaries which are available for OSX, Linux, Windows, [rkt][rkt], and Docker. Instructions for using these binaries are on the [GitHub releases page][github-release].
For those wanting to try the very latest version, you can [build the latest version of etcd][dl-build] from the `master` branch.
You will first need [*Go*](https://golang.org/) installed on your machine (version 1.6+ is required).
You will first need [*Go*](https://golang.org/) installed on your machine (version 1.7+ is required).
All development occurs on `master`, including new features and bug fixes.
Bug fixes are first targeted at `master` and subsequently ported to release branches, as described in the [branch management][branch-management] guide.
[rkt]: https://github.com/coreos/rkt/releases/
[github-release]: https://github.com/coreos/etcd/releases/
[branch-management]: ./Documentation/branch_management.md
[dl-build]: ./Documentation/dl_build.md#build-the-latest-version
@ -77,7 +78,7 @@ That's it! etcd is now running and serving client requests. For more
The [official etcd ports][iana-ports] are 2379 for client requests, and 2380 for peer communication.
[iana-ports]: https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml?search=etcd
[iana-ports]: http://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.txt
### Running a local etcd cluster
@ -135,5 +136,3 @@ See [reporting bugs](Documentation/reporting_bugs.md) for details about reportin
### License
etcd is under the Apache 2.0 license. See the [LICENSE](LICENSE) file for details.

View File

@ -6,25 +6,17 @@ This document defines a high level roadmap for etcd development.
The dates below should not be considered authoritative, but rather indicative of the projected timeline of the project. The [milestones defined in GitHub](https://github.com/coreos/etcd/milestones) represent the most up-to-date and issue-for-issue plans.
etcd 3.0 is our current stable branch. The roadmap below outlines new features that will be added to etcd, and while subject to change, define what future stable will look like.
etcd 3.1 is our current stable branch. The roadmap below outlines new features that will be added to etcd, and while subject to change, define what future stable will look like.
### etcd 3.1 (2016-Oct)
- Stable L4 gateway
- Experimental support for scalable proxy
- Automatic leadership transfer for the rolling upgrade
- V3 API improvements
- Get previous key-value pair
- Get only keys (ignore values)
- Get only key count
### etcd 3.2 (2017-Apr)
### etcd 3.2 (2017-May)
- Stable scalable proxy
- Proxy-as-client interface passthrough
- Lock service
- Namespacing proxy
- JWT token based authentication
- TLS Command Name and JWT token based authentication
- Read-modify-write V3 Put
- Improved watch performance
- Support non-blocking concurrent read
### etcd 3.3 (?)
- TBD

View File

@ -21,85 +21,92 @@ import (
"crypto/rand"
"math/big"
"strings"
"sync"
"time"
)
const (
letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
defaultSimpleTokenLength = 16
)
// var for testing purposes
var (
simpleTokenTTL = 5 * time.Minute
simpleTokenTTLResolution = 1 * time.Second
)
type simpleTokenTTLKeeper struct {
tokens map[string]time.Time
addSimpleTokenCh chan string
resetSimpleTokenCh chan string
deleteSimpleTokenCh chan string
stopCh chan chan struct{}
deleteTokenFunc func(string)
}
func NewSimpleTokenTTLKeeper(deletefunc func(string)) *simpleTokenTTLKeeper {
stk := &simpleTokenTTLKeeper{
tokens: make(map[string]time.Time),
addSimpleTokenCh: make(chan string, 1),
resetSimpleTokenCh: make(chan string, 1),
deleteSimpleTokenCh: make(chan string, 1),
stopCh: make(chan chan struct{}),
deleteTokenFunc: deletefunc,
}
go stk.run()
return stk
tokens map[string]time.Time
donec chan struct{}
stopc chan struct{}
deleteTokenFunc func(string)
mu *sync.Mutex
}
func (tm *simpleTokenTTLKeeper) stop() {
waitCh := make(chan struct{})
tm.stopCh <- waitCh
<-waitCh
close(tm.stopCh)
select {
case tm.stopc <- struct{}{}:
case <-tm.donec:
}
<-tm.donec
}
func (tm *simpleTokenTTLKeeper) addSimpleToken(token string) {
tm.addSimpleTokenCh <- token
tm.tokens[token] = time.Now().Add(simpleTokenTTL)
}
func (tm *simpleTokenTTLKeeper) resetSimpleToken(token string) {
tm.resetSimpleTokenCh <- token
if _, ok := tm.tokens[token]; ok {
tm.tokens[token] = time.Now().Add(simpleTokenTTL)
}
}
func (tm *simpleTokenTTLKeeper) deleteSimpleToken(token string) {
tm.deleteSimpleTokenCh <- token
delete(tm.tokens, token)
}
func (tm *simpleTokenTTLKeeper) run() {
tokenTicker := time.NewTicker(simpleTokenTTLResolution)
defer tokenTicker.Stop()
defer func() {
tokenTicker.Stop()
close(tm.donec)
}()
for {
select {
case t := <-tm.addSimpleTokenCh:
tm.tokens[t] = time.Now().Add(simpleTokenTTL)
case t := <-tm.resetSimpleTokenCh:
if _, ok := tm.tokens[t]; ok {
tm.tokens[t] = time.Now().Add(simpleTokenTTL)
}
case t := <-tm.deleteSimpleTokenCh:
delete(tm.tokens, t)
case <-tokenTicker.C:
nowtime := time.Now()
tm.mu.Lock()
for t, tokenendtime := range tm.tokens {
if nowtime.After(tokenendtime) {
tm.deleteTokenFunc(t)
delete(tm.tokens, t)
}
}
case waitCh := <-tm.stopCh:
tm.tokens = make(map[string]time.Time)
waitCh <- struct{}{}
tm.mu.Unlock()
case <-tm.stopc:
return
}
}
}
func (as *authStore) enable() {
delf := func(tk string) {
if username, ok := as.simpleTokens[tk]; ok {
plog.Infof("deleting token %s for user %s", tk, username)
delete(as.simpleTokens, tk)
}
}
as.simpleTokenKeeper = &simpleTokenTTLKeeper{
tokens: make(map[string]time.Time),
donec: make(chan struct{}),
stopc: make(chan struct{}),
deleteTokenFunc: delf,
mu: &as.simpleTokensMu,
}
go as.simpleTokenKeeper.run()
}
func (as *authStore) GenSimpleToken() (string, error) {
ret := make([]byte, defaultSimpleTokenLength)
@ -117,7 +124,6 @@ func (as *authStore) GenSimpleToken() (string, error) {
func (as *authStore) assignSimpleTokenToUser(username, token string) {
as.simpleTokensMu.Lock()
_, ok := as.simpleTokens[token]
if ok {
plog.Panicf("token %s is alredy used", token)
@ -129,13 +135,15 @@ func (as *authStore) assignSimpleTokenToUser(username, token string) {
}
func (as *authStore) invalidateUser(username string) {
if as.simpleTokenKeeper == nil {
return
}
as.simpleTokensMu.Lock()
defer as.simpleTokensMu.Unlock()
for token, name := range as.simpleTokens {
if strings.Compare(name, username) == 0 {
delete(as.simpleTokens, token)
as.simpleTokenKeeper.deleteSimpleToken(token)
}
}
as.simpleTokensMu.Unlock()
}

View File

@ -20,6 +20,7 @@ import (
"errors"
"fmt"
"sort"
"strconv"
"strings"
"sync"
@ -29,6 +30,7 @@ import (
"github.com/coreos/pkg/capnslog"
"golang.org/x/crypto/bcrypt"
"golang.org/x/net/context"
"google.golang.org/grpc/metadata"
)
var (
@ -57,6 +59,7 @@ var (
ErrPermissionNotGranted = errors.New("auth: permission is not granted to the role")
ErrAuthNotEnabled = errors.New("auth: authentication is not enabled")
ErrAuthOldRevision = errors.New("auth: revision in header is old")
ErrInvalidAuthToken = errors.New("auth: invalid auth token")
// BcryptCost is the algorithm cost / strength for hashing auth passwords
BcryptCost = bcrypt.DefaultCost
@ -153,6 +156,9 @@ type AuthStore interface {
// Close does cleanup of AuthStore
Close() error
// AuthInfoFromCtx gets AuthInfo from gRPC's context
AuthInfoFromCtx(ctx context.Context) (*AuthInfo, error)
}
type authStore struct {
@ -162,11 +168,24 @@ type authStore struct {
rangePermCache map[string]*unifiedRangePermissions // username -> unifiedRangePermissions
simpleTokensMu sync.RWMutex
simpleTokens map[string]string // token -> username
simpleTokenKeeper *simpleTokenTTLKeeper
revision uint64
// tokenSimple in v3.2+
indexWaiter func(uint64) <-chan struct{}
simpleTokenKeeper *simpleTokenTTLKeeper
simpleTokensMu sync.Mutex
simpleTokens map[string]string // token -> username
}
func newDeleterFunc(as *authStore) func(string) {
return func(t string) {
as.simpleTokensMu.Lock()
defer as.simpleTokensMu.Unlock()
if username, ok := as.simpleTokens[t]; ok {
plog.Infof("deleting token %s for user %s", t, username)
delete(as.simpleTokens, t)
}
}
}
func (as *authStore) AuthEnable() error {
@ -196,16 +215,7 @@ func (as *authStore) AuthEnable() error {
tx.UnsafePut(authBucketName, enableFlagKey, authEnabled)
as.enabled = true
tokenDeleteFunc := func(t string) {
as.simpleTokensMu.Lock()
defer as.simpleTokensMu.Unlock()
if username, ok := as.simpleTokens[t]; ok {
plog.Infof("deleting token %s for user %s", t, username)
delete(as.simpleTokens, t)
}
}
as.simpleTokenKeeper = NewSimpleTokenTTLKeeper(tokenDeleteFunc)
as.enable()
as.rangePermCache = make(map[string]*unifiedRangePermissions)
@ -233,11 +243,12 @@ func (as *authStore) AuthDisable() {
as.enabled = false
as.simpleTokensMu.Lock()
tk := as.simpleTokenKeeper
as.simpleTokenKeeper = nil
as.simpleTokens = make(map[string]string) // invalidate all tokens
as.simpleTokensMu.Unlock()
if as.simpleTokenKeeper != nil {
as.simpleTokenKeeper.stop()
as.simpleTokenKeeper = nil
if tk != nil {
tk.stop()
}
plog.Noticef("Authentication disabled")
@ -635,13 +646,14 @@ func (as *authStore) RoleAdd(r *pb.AuthRoleAddRequest) (*pb.AuthRoleAddResponse,
}
func (as *authStore) AuthInfoFromToken(token string) (*AuthInfo, bool) {
as.simpleTokensMu.RLock()
defer as.simpleTokensMu.RUnlock()
t, ok := as.simpleTokens[token]
if ok {
// same as '(t *tokenSimple) info' in v3.2+
as.simpleTokensMu.Lock()
username, ok := as.simpleTokens[token]
if ok && as.simpleTokenKeeper != nil {
as.simpleTokenKeeper.resetSimpleToken(token)
}
return &AuthInfo{Username: t, Revision: as.revision}, ok
as.simpleTokensMu.Unlock()
return &AuthInfo{Username: username, Revision: as.revision}, ok
}
type permSlice []*authpb.Permission
@ -753,6 +765,9 @@ func (as *authStore) IsAdminPermitted(authInfo *AuthInfo) error {
if !as.isAuthEnabled() {
return nil
}
if authInfo == nil {
return ErrUserEmpty
}
tx := as.be.BatchTx()
tx.Lock()
@ -871,7 +886,7 @@ func (as *authStore) isAuthEnabled() bool {
return as.enabled
}
func NewAuthStore(be backend.Backend) *authStore {
func NewAuthStore(be backend.Backend, indexWaiter func(uint64) <-chan struct{}) *authStore {
tx := be.BatchTx()
tx.Lock()
@ -879,13 +894,30 @@ func NewAuthStore(be backend.Backend) *authStore {
tx.UnsafeCreateBucket(authUsersBucketName)
tx.UnsafeCreateBucket(authRolesBucketName)
as := &authStore{
be: be,
simpleTokens: make(map[string]string),
revision: 0,
enabled := false
_, vs := tx.UnsafeRange(authBucketName, enableFlagKey, nil, 0)
if len(vs) == 1 {
if bytes.Equal(vs[0], authEnabled) {
enabled = true
}
}
as.commitRevision(tx)
as := &authStore{
be: be,
simpleTokens: make(map[string]string),
revision: getRevision(tx),
indexWaiter: indexWaiter,
enabled: enabled,
rangePermCache: make(map[string]*unifiedRangePermissions),
}
if enabled {
as.enable()
}
if as.revision == 0 {
as.commitRevision(tx)
}
tx.Unlock()
be.ForceCommit()
@ -912,7 +944,8 @@ func (as *authStore) commitRevision(tx backend.BatchTx) {
func getRevision(tx backend.BatchTx) uint64 {
_, vs := tx.UnsafeRange(authBucketName, []byte(revisionKey), nil, 0)
if len(vs) != 1 {
plog.Panicf("failed to get the key of auth store revision")
// this can happen in the initialization phase
return 0
}
return binary.BigEndian.Uint64(vs[0])
@ -921,3 +954,46 @@ func getRevision(tx backend.BatchTx) uint64 {
func (as *authStore) Revision() uint64 {
return as.revision
}
func (as *authStore) isValidSimpleToken(token string, ctx context.Context) bool {
splitted := strings.Split(token, ".")
if len(splitted) != 2 {
return false
}
index, err := strconv.Atoi(splitted[1])
if err != nil {
return false
}
select {
case <-as.indexWaiter(uint64(index)):
return true
case <-ctx.Done():
}
return false
}
func (as *authStore) AuthInfoFromCtx(ctx context.Context) (*AuthInfo, error) {
md, ok := metadata.FromContext(ctx)
if !ok {
return nil, nil
}
ts, tok := md["token"]
if !tok {
return nil, nil
}
token := ts[0]
if !as.isValidSimpleToken(token, ctx) {
return nil, ErrInvalidAuthToken
}
authInfo, uok := as.AuthInfoFromToken(token)
if !uok {
plog.Warningf("invalid auth token: %s", token)
return nil, ErrInvalidAuthToken
}
return authInfo, nil
}

View File

@ -26,31 +26,38 @@ import (
func init() { BcryptCost = bcrypt.MinCost }
func TestUserAdd(t *testing.T) {
b, tPath := backend.NewDefaultTmpBackend()
defer func() {
b.Close()
os.Remove(tPath)
func dummyIndexWaiter(index uint64) <-chan struct{} {
ch := make(chan struct{})
go func() {
ch <- struct{}{}
}()
return ch
}
as := NewAuthStore(b)
ua := &pb.AuthUserAddRequest{Name: "foo"}
_, err := as.UserAdd(ua) // add a non-existing user
// TestNewAuthStoreRevision ensures newly auth store
// keeps the old revision when there are no changes.
func TestNewAuthStoreRevision(t *testing.T) {
b, tPath := backend.NewDefaultTmpBackend()
defer os.Remove(tPath)
as := NewAuthStore(b, dummyIndexWaiter)
err := enableAuthAndCreateRoot(as)
if err != nil {
t.Fatal(err)
}
_, err = as.UserAdd(ua) // add an existing user
if err == nil {
t.Fatalf("expected %v, got %v", ErrUserAlreadyExist, err)
}
if err != ErrUserAlreadyExist {
t.Fatalf("expected %v, got %v", ErrUserAlreadyExist, err)
}
old := as.Revision()
b.Close()
as.Close()
ua = &pb.AuthUserAddRequest{Name: ""}
_, err = as.UserAdd(ua) // add a user with empty name
if err != ErrUserEmpty {
t.Fatal(err)
// no changes to commit
b2 := backend.NewDefaultBackend(tPath)
as = NewAuthStore(b2, dummyIndexWaiter)
new := as.Revision()
b2.Close()
as.Close()
if old != new {
t.Fatalf("expected revision %d, got %d", old, new)
}
}
@ -80,7 +87,7 @@ func TestCheckPassword(t *testing.T) {
os.Remove(tPath)
}()
as := NewAuthStore(b)
as := NewAuthStore(b, dummyIndexWaiter)
defer as.Close()
err := enableAuthAndCreateRoot(as)
if err != nil {
@ -125,7 +132,7 @@ func TestUserDelete(t *testing.T) {
os.Remove(tPath)
}()
as := NewAuthStore(b)
as := NewAuthStore(b, dummyIndexWaiter)
defer as.Close()
err := enableAuthAndCreateRoot(as)
if err != nil {
@ -162,7 +169,7 @@ func TestUserChangePassword(t *testing.T) {
os.Remove(tPath)
}()
as := NewAuthStore(b)
as := NewAuthStore(b, dummyIndexWaiter)
defer as.Close()
err := enableAuthAndCreateRoot(as)
if err != nil {
@ -208,7 +215,7 @@ func TestRoleAdd(t *testing.T) {
os.Remove(tPath)
}()
as := NewAuthStore(b)
as := NewAuthStore(b, dummyIndexWaiter)
defer as.Close()
err := enableAuthAndCreateRoot(as)
if err != nil {
@ -229,7 +236,7 @@ func TestUserGrant(t *testing.T) {
os.Remove(tPath)
}()
as := NewAuthStore(b)
as := NewAuthStore(b, dummyIndexWaiter)
defer as.Close()
err := enableAuthAndCreateRoot(as)
if err != nil {
@ -261,4 +268,93 @@ func TestUserGrant(t *testing.T) {
if err != ErrUserNotFound {
t.Fatalf("expected %v, got %v", ErrUserNotFound, err)
}
// non-admin user
err = as.IsAdminPermitted(&AuthInfo{Username: "foo", Revision: 1})
if err != ErrPermissionDenied {
t.Errorf("expected %v, got %v", ErrPermissionDenied, err)
}
// disabled auth should return nil
as.AuthDisable()
err = as.IsAdminPermitted(&AuthInfo{Username: "root", Revision: 1})
if err != nil {
t.Errorf("expected nil, got %v", err)
}
}
func TestRecoverFromSnapshot(t *testing.T) {
as, _ := setupAuthStore(t)
ua := &pb.AuthUserAddRequest{Name: "foo"}
_, err := as.UserAdd(ua) // add an existing user
if err == nil {
t.Fatalf("expected %v, got %v", ErrUserAlreadyExist, err)
}
if err != ErrUserAlreadyExist {
t.Fatalf("expected %v, got %v", ErrUserAlreadyExist, err)
}
ua = &pb.AuthUserAddRequest{Name: ""}
_, err = as.UserAdd(ua) // add a user with empty name
if err != ErrUserEmpty {
t.Fatal(err)
}
as.Close()
as2 := NewAuthStore(as.be, dummyIndexWaiter)
defer func(a *authStore) {
a.Close()
}(as2)
if !as2.isAuthEnabled() {
t.Fatal("recovering authStore from existing backend failed")
}
ul, err := as.UserList(&pb.AuthUserListRequest{})
if err != nil {
t.Fatal(err)
}
if !contains(ul.Users, "root") {
t.Errorf("expected %v in %v", "root", ul.Users)
}
}
func contains(array []string, str string) bool {
for _, s := range array {
if s == str {
return true
}
}
return false
}
func setupAuthStore(t *testing.T) (store *authStore, teardownfunc func(t *testing.T)) {
b, tPath := backend.NewDefaultTmpBackend()
as := NewAuthStore(b, dummyIndexWaiter)
err := enableAuthAndCreateRoot(as)
if err != nil {
t.Fatal(err)
}
// adds a new role
_, err = as.RoleAdd(&pb.AuthRoleAddRequest{Name: "role-test"})
if err != nil {
t.Fatal(err)
}
ua := &pb.AuthUserAddRequest{Name: "foo", Password: "bar"}
_, err = as.UserAdd(ua) // add a non-existing user
if err != nil {
t.Fatal(err)
}
tearDown := func(t *testing.T) {
b.Close()
os.Remove(tPath)
as.Close()
}
return as, tearDown
}

View File

@ -34,7 +34,7 @@ import (
func TestV2NoRetryEOF(t *testing.T) {
defer testutil.AfterTest(t)
// generate an EOF response; specify address so appears first in sorted ep list
lEOF := integration.NewListenerWithAddr(t, fmt.Sprintf("eof:123.%d.sock", os.Getpid()))
lEOF := integration.NewListenerWithAddr(t, fmt.Sprintf("127.0.0.1:%05d", os.Getpid()))
defer lEOF.Close()
tries := uint32(0)
go func() {
@ -65,8 +65,7 @@ func TestV2NoRetryEOF(t *testing.T) {
// TestV2NoRetryNoLeader tests destructive api calls won't retry if given an error code.
func TestV2NoRetryNoLeader(t *testing.T) {
defer testutil.AfterTest(t)
lHttp := integration.NewListenerWithAddr(t, fmt.Sprintf("errHttp:123.%d.sock", os.Getpid()))
lHttp := integration.NewListenerWithAddr(t, fmt.Sprintf("127.0.0.1:%05d", os.Getpid()))
eh := &errHandler{errCode: http.StatusServiceUnavailable}
srv := httptest.NewUnstartedServer(eh)
defer lHttp.Close()

View File

@ -221,7 +221,8 @@ func (c *Client) dialSetupOpts(endpoint string, dopts ...grpc.DialOption) (opts
return nil, c.ctx.Err()
default:
}
return net.DialTimeout(proto, host, t)
dialer := &net.Dialer{Timeout: t}
return dialer.DialContext(c.ctx, proto, host)
}
opts = append(opts, grpc.WithDialer(f))
@ -281,8 +282,16 @@ func (c *Client) dial(endpoint string, dopts ...grpc.DialOption) (*grpc.ClientCo
tokenMu: &sync.RWMutex{},
}
err := c.getToken(context.TODO())
if err != nil {
ctx := c.ctx
if c.cfg.DialTimeout > 0 {
cctx, cancel := context.WithTimeout(ctx, c.cfg.DialTimeout)
defer cancel()
ctx = cctx
}
if err := c.getToken(ctx); err != nil {
if err == ctx.Err() && ctx.Err() != c.ctx.Err() {
err = grpc.ErrClientConnTimeout
}
return nil, err
}
@ -334,6 +343,8 @@ func newClient(cfg *Config) (*Client, error) {
client.balancer = newSimpleBalancer(cfg.Endpoints)
conn, err := client.dial(cfg.Endpoints[0], grpc.WithBalancer(client.balancer))
if err != nil {
client.cancel()
client.balancer.Close()
return nil, err
}
client.conn = conn
@ -352,6 +363,7 @@ func newClient(cfg *Config) (*Client, error) {
}
if !hasConn {
client.cancel()
client.balancer.Close()
conn.Close()
return nil, grpc.ErrClientConnTimeout
}

View File

@ -16,6 +16,7 @@ package clientv3
import (
"fmt"
"net"
"testing"
"time"
@ -25,36 +26,89 @@ import (
"google.golang.org/grpc"
)
func TestDialTimeout(t *testing.T) {
func TestDialCancel(t *testing.T) {
defer testutil.AfterTest(t)
donec := make(chan error)
go func() {
// without timeout, grpc keeps redialing if connection refused
cfg := Config{
Endpoints: []string{"localhost:12345"},
DialTimeout: 2 * time.Second}
c, err := New(cfg)
if c != nil || err == nil {
t.Errorf("new client should fail")
}
donec <- err
}()
time.Sleep(10 * time.Millisecond)
select {
case err := <-donec:
t.Errorf("dial didn't wait (%v)", err)
default:
// accept first connection so client is created with dial timeout
ln, err := net.Listen("unix", "dialcancel:12345")
if err != nil {
t.Fatal(err)
}
defer ln.Close()
ep := "unix://dialcancel:12345"
cfg := Config{
Endpoints: []string{ep},
DialTimeout: 30 * time.Second}
c, err := New(cfg)
if err != nil {
t.Fatal(err)
}
// connect to ipv4 blackhole so dial blocks
c.SetEndpoints("http://254.0.0.1:12345")
// issue Get to force redial attempts
go c.Get(context.TODO(), "abc")
// wait a little bit so client close is after dial starts
time.Sleep(100 * time.Millisecond)
donec := make(chan struct{})
go func() {
defer close(donec)
c.Close()
}()
select {
case <-time.After(5 * time.Second):
t.Errorf("failed to timeout dial on time")
case err := <-donec:
if err != grpc.ErrClientConnTimeout {
t.Errorf("unexpected error %v, want %v", err, grpc.ErrClientConnTimeout)
t.Fatalf("failed to close")
case <-donec:
}
}
func TestDialTimeout(t *testing.T) {
defer testutil.AfterTest(t)
testCfgs := []Config{
{
Endpoints: []string{"http://254.0.0.1:12345"},
DialTimeout: 2 * time.Second,
},
{
Endpoints: []string{"http://254.0.0.1:12345"},
DialTimeout: time.Second,
Username: "abc",
Password: "def",
},
}
for i, cfg := range testCfgs {
donec := make(chan error)
go func() {
// without timeout, dial continues forever on ipv4 blackhole
c, err := New(cfg)
if c != nil || err == nil {
t.Errorf("#%d: new client should fail", i)
}
donec <- err
}()
time.Sleep(10 * time.Millisecond)
select {
case err := <-donec:
t.Errorf("#%d: dial didn't wait (%v)", i, err)
default:
}
select {
case <-time.After(5 * time.Second):
t.Errorf("#%d: failed to timeout dial on time", i)
case err := <-donec:
if err != grpc.ErrClientConnTimeout {
t.Errorf("#%d: unexpected error %v, want %v", i, err, grpc.ErrClientConnTimeout)
}
}
}
}

View File

@ -249,11 +249,10 @@ func (s *stmReadCommitted) commit() *v3.TxnResponse {
}
func isKeyCurrent(k string, r *v3.GetResponse) v3.Cmp {
rev := r.Header.Revision + 1
if len(r.Kvs) != 0 {
rev = r.Kvs[0].ModRevision + 1
return v3.Compare(v3.ModRevision(k), "=", r.Kvs[0].ModRevision)
}
return v3.Compare(v3.ModRevision(k), "<", rev)
return v3.Compare(v3.ModRevision(k), "=", 0)
}
func respToValue(resp *v3.GetResponse) string {

View File

@ -156,6 +156,30 @@ func TestLeaseKeepAlive(t *testing.T) {
}
}
func TestLeaseKeepAliveOneSecond(t *testing.T) {
defer testutil.AfterTest(t)
clus := integration.NewClusterV3(t, &integration.ClusterConfig{Size: 1})
defer clus.Terminate(t)
cli := clus.Client(0)
resp, err := cli.Grant(context.Background(), 1)
if err != nil {
t.Errorf("failed to create lease %v", err)
}
rc, kerr := cli.KeepAlive(context.Background(), resp.ID)
if kerr != nil {
t.Errorf("failed to keepalive lease %v", kerr)
}
for i := 0; i < 3; i++ {
if _, ok := <-rc; !ok {
t.Errorf("chan is closed, want not closed")
}
}
}
// TODO: add a client that can connect to all the members of cluster via unix sock.
// TODO: test handle more complicated failures.
func TestLeaseKeepAliveHandleFailure(t *testing.T) {

View File

@ -347,7 +347,57 @@ func putAndWatch(t *testing.T, wctx *watchctx, key, val string) {
}
}
// TestWatchResumeComapcted checks that the watcher gracefully closes in case
func TestWatchResumeInitRev(t *testing.T) {
defer testutil.AfterTest(t)
clus := integration.NewClusterV3(t, &integration.ClusterConfig{Size: 1})
defer clus.Terminate(t)
cli := clus.Client(0)
if _, err := cli.Put(context.TODO(), "b", "2"); err != nil {
t.Fatal(err)
}
if _, err := cli.Put(context.TODO(), "a", "3"); err != nil {
t.Fatal(err)
}
// if resume is broken, it'll pick up this key first instead of a=3
if _, err := cli.Put(context.TODO(), "a", "4"); err != nil {
t.Fatal(err)
}
wch := clus.Client(0).Watch(context.Background(), "a", clientv3.WithRev(1), clientv3.WithCreatedNotify())
if resp, ok := <-wch; !ok || resp.Header.Revision != 4 {
t.Fatalf("got (%v, %v), expected create notification rev=4", resp, ok)
}
// pause wch
clus.Members[0].DropConnections()
clus.Members[0].PauseConnections()
select {
case resp, ok := <-wch:
t.Skipf("wch should block, got (%+v, %v); drop not fast enough", resp, ok)
case <-time.After(100 * time.Millisecond):
}
// resume wch
clus.Members[0].UnpauseConnections()
select {
case resp, ok := <-wch:
if !ok {
t.Fatal("unexpected watch close")
}
if len(resp.Events) == 0 {
t.Fatal("expected event on watch")
}
if string(resp.Events[0].Kv.Value) != "3" {
t.Fatalf("expected value=3, got event %+v", resp.Events[0])
}
case <-time.After(5 * time.Second):
t.Fatal("watch timed out")
}
}
// TestWatchResumeCompacted checks that the watcher gracefully closes in case
// that it tries to resume to a revision that's been compacted out of the store.
// Since the watcher's server restarts with stale data, the watcher will receive
// either a compaction error or all keys by staying in sync before the compaction

View File

@ -416,7 +416,7 @@ func (l *lessor) recvKeepAlive(resp *pb.LeaseKeepAliveResponse) {
}
// send update to all channels
nextKeepAlive := time.Now().Add(1 + time.Duration(karesp.TTL/3)*time.Second)
nextKeepAlive := time.Now().Add((time.Duration(karesp.TTL) * time.Second) / 3.0)
ka.deadline = time.Now().Add(time.Duration(karesp.TTL) * time.Second)
for _, ch := range ka.chs {
select {

View File

@ -132,6 +132,8 @@ type watchGrpcStream struct {
errc chan error
// closingc gets the watcherStream of closing watchers
closingc chan *watcherStream
// wg is Done when all substream goroutines have exited
wg sync.WaitGroup
// resumec closes to signal that all substreams should begin resuming
resumec chan struct{}
@ -406,7 +408,7 @@ func (w *watchGrpcStream) run() {
for range closing {
w.closeSubstream(<-w.closingc)
}
w.wg.Wait()
w.owner.closeStream(w)
}()
@ -431,6 +433,7 @@ func (w *watchGrpcStream) run() {
}
ws.donec = make(chan struct{})
w.wg.Add(1)
go w.serveSubstream(ws, w.resumec)
// queue up for watcher creation/resume
@ -576,6 +579,7 @@ func (w *watchGrpcStream) serveSubstream(ws *watcherStream, resumec chan struct{
if !resuming {
w.closingc <- ws
}
w.wg.Done()
}()
emptyWr := &WatchResponse{}
@ -612,10 +616,24 @@ func (w *watchGrpcStream) serveSubstream(ws *watcherStream, resumec chan struct{
if ws.initReq.createdNotify {
ws.outc <- *wr
}
// once the watch channel is returned, a current revision
// watch must resume at the store revision. This is necessary
// for the following case to work as expected:
// wch := m1.Watch("a")
// m2.Put("a", "b")
// <-wch
// If the revision is only bound on the first observed event,
// if wch is disconnected before the Put is issued, then reconnects
// after it is committed, it'll miss the Put.
if ws.initReq.rev == 0 {
nextRev = wr.Header.Revision
}
}
} else {
// current progress of watch; <= store revision
nextRev = wr.Header.Revision
}
nextRev = wr.Header.Revision
if len(wr.Events) > 0 {
nextRev = wr.Events[len(wr.Events)-1].Kv.ModRevision + 1
}
@ -674,6 +692,7 @@ func (w *watchGrpcStream) newWatchClient() (pb.Watch_WatchClient, error) {
continue
}
ws.donec = make(chan struct{})
w.wg.Add(1)
go w.serveSubstream(ws, w.resumec)
}
@ -694,6 +713,10 @@ func (w *watchGrpcStream) waitCancelSubstreams(stopc <-chan struct{}) <-chan str
go func(ws *watcherStream) {
defer wg.Done()
if ws.closing {
if ws.initReq.ctx.Err() != nil && ws.outc != nil {
close(ws.outc)
ws.outc = nil
}
return
}
select {

View File

@ -5,3 +5,6 @@ const maxMapSize = 0x7FFFFFFF // 2GB
// maxAllocSize is the size used when creating array pointers.
const maxAllocSize = 0xFFFFFFF
// Are unaligned load/stores broken on this arch?
var brokenUnaligned = false

View File

@ -5,3 +5,6 @@ const maxMapSize = 0xFFFFFFFFFFFF // 256TB
// maxAllocSize is the size used when creating array pointers.
const maxAllocSize = 0x7FFFFFFF
// Are unaligned load/stores broken on this arch?
var brokenUnaligned = false

28
cmd/vendor/github.com/coreos/bbolt/bolt_arm.go generated vendored Normal file
View File

@ -0,0 +1,28 @@
package bolt
import "unsafe"
// maxMapSize represents the largest mmap size supported by Bolt.
const maxMapSize = 0x7FFFFFFF // 2GB
// maxAllocSize is the size used when creating array pointers.
const maxAllocSize = 0xFFFFFFF
// Are unaligned load/stores broken on this arch?
var brokenUnaligned bool
func init() {
// Simple check to see whether this arch handles unaligned load/stores
// correctly.
// ARM9 and older devices require load/stores to be from/to aligned
// addresses. If not, the lower 2 bits are cleared and that address is
// read in a jumbled up order.
// See http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka15414.html
raw := [6]byte{0xfe, 0xef, 0x11, 0x22, 0x22, 0x11}
val := *(*uint32)(unsafe.Pointer(uintptr(unsafe.Pointer(&raw)) + 2))
brokenUnaligned = val != 0x11222211
}

View File

@ -7,3 +7,6 @@ const maxMapSize = 0xFFFFFFFFFFFF // 256TB
// maxAllocSize is the size used when creating array pointers.
const maxAllocSize = 0x7FFFFFFF
// Are unaligned load/stores broken on this arch?
var brokenUnaligned = false

12
cmd/vendor/github.com/coreos/bbolt/bolt_mips64x.go generated vendored Normal file
View File

@ -0,0 +1,12 @@
// +build mips64 mips64le
package bolt
// maxMapSize represents the largest mmap size supported by Bolt.
const maxMapSize = 0x8000000000 // 512GB
// maxAllocSize is the size used when creating array pointers.
const maxAllocSize = 0x7FFFFFFF
// Are unaligned load/stores broken on this arch?
var brokenUnaligned = false

View File

@ -1,7 +1,12 @@
// +build mips mipsle
package bolt
// maxMapSize represents the largest mmap size supported by Bolt.
const maxMapSize = 0x7FFFFFFF // 2GB
const maxMapSize = 0x40000000 // 1GB
// maxAllocSize is the size used when creating array pointers.
const maxAllocSize = 0xFFFFFFF
// Are unaligned load/stores broken on this arch?
var brokenUnaligned = false

View File

@ -7,3 +7,6 @@ const maxMapSize = 0xFFFFFFFFFFFF // 256TB
// maxAllocSize is the size used when creating array pointers.
const maxAllocSize = 0x7FFFFFFF
// Are unaligned load/stores broken on this arch?
var brokenUnaligned = false

View File

@ -7,3 +7,6 @@ const maxMapSize = 0xFFFFFFFFFFFF // 256TB
// maxAllocSize is the size used when creating array pointers.
const maxAllocSize = 0x7FFFFFFF
// Are unaligned load/stores broken on this arch?
var brokenUnaligned = false

View File

@ -7,3 +7,6 @@ const maxMapSize = 0xFFFFFFFFFFFF // 256TB
// maxAllocSize is the size used when creating array pointers.
const maxAllocSize = 0x7FFFFFFF
// Are unaligned load/stores broken on this arch?
var brokenUnaligned = false

View File

@ -13,29 +13,32 @@ import (
// flock acquires an advisory lock on a file descriptor.
func flock(db *DB, mode os.FileMode, exclusive bool, timeout time.Duration) error {
var t time.Time
if timeout != 0 {
t = time.Now()
}
fd := db.file.Fd()
flag := syscall.LOCK_NB
if exclusive {
flag |= syscall.LOCK_EX
} else {
flag |= syscall.LOCK_SH
}
for {
// If we're beyond our timeout then return an error.
// This can only occur after we've attempted a flock once.
if t.IsZero() {
t = time.Now()
} else if timeout > 0 && time.Since(t) > timeout {
return ErrTimeout
}
flag := syscall.LOCK_SH
if exclusive {
flag = syscall.LOCK_EX
}
// Otherwise attempt to obtain an exclusive lock.
err := syscall.Flock(int(db.file.Fd()), flag|syscall.LOCK_NB)
// Attempt to obtain an exclusive lock.
err := syscall.Flock(int(fd), flag)
if err == nil {
return nil
} else if err != syscall.EWOULDBLOCK {
return err
}
// If we timed out then return an error.
if timeout != 0 && time.Since(t) > timeout-flockRetryTimeout {
return ErrTimeout
}
// Wait for a bit and try again.
time.Sleep(50 * time.Millisecond)
time.Sleep(flockRetryTimeout)
}
}

View File

@ -13,34 +13,33 @@ import (
// flock acquires an advisory lock on a file descriptor.
func flock(db *DB, mode os.FileMode, exclusive bool, timeout time.Duration) error {
var t time.Time
if timeout != 0 {
t = time.Now()
}
fd := db.file.Fd()
var lockType int16
if exclusive {
lockType = syscall.F_WRLCK
} else {
lockType = syscall.F_RDLCK
}
for {
// If we're beyond our timeout then return an error.
// This can only occur after we've attempted a flock once.
if t.IsZero() {
t = time.Now()
} else if timeout > 0 && time.Since(t) > timeout {
return ErrTimeout
}
var lock syscall.Flock_t
lock.Start = 0
lock.Len = 0
lock.Pid = 0
lock.Whence = 0
lock.Pid = 0
if exclusive {
lock.Type = syscall.F_WRLCK
} else {
lock.Type = syscall.F_RDLCK
}
err := syscall.FcntlFlock(db.file.Fd(), syscall.F_SETLK, &lock)
// Attempt to obtain an exclusive lock.
lock := syscall.Flock_t{Type: lockType}
err := syscall.FcntlFlock(fd, syscall.F_SETLK, &lock)
if err == nil {
return nil
} else if err != syscall.EAGAIN {
return err
}
// If we timed out then return an error.
if timeout != 0 && time.Since(t) > timeout-flockRetryTimeout {
return ErrTimeout
}
// Wait for a bit and try again.
time.Sleep(50 * time.Millisecond)
time.Sleep(flockRetryTimeout)
}
}

View File

@ -59,29 +59,30 @@ func flock(db *DB, mode os.FileMode, exclusive bool, timeout time.Duration) erro
db.lockfile = f
var t time.Time
if timeout != 0 {
t = time.Now()
}
fd := f.Fd()
var flag uint32 = flagLockFailImmediately
if exclusive {
flag |= flagLockExclusive
}
for {
// If we're beyond our timeout then return an error.
// This can only occur after we've attempted a flock once.
if t.IsZero() {
t = time.Now()
} else if timeout > 0 && time.Since(t) > timeout {
return ErrTimeout
}
var flag uint32 = flagLockFailImmediately
if exclusive {
flag |= flagLockExclusive
}
err := lockFileEx(syscall.Handle(db.lockfile.Fd()), flag, 0, 1, 0, &syscall.Overlapped{})
// Attempt to obtain an exclusive lock.
err := lockFileEx(syscall.Handle(fd), flag, 0, 1, 0, &syscall.Overlapped{})
if err == nil {
return nil
} else if err != errLockViolation {
return err
}
// If we timed oumercit then return an error.
if timeout != 0 && time.Since(t) > timeout-flockRetryTimeout {
return ErrTimeout
}
// Wait for a bit and try again.
time.Sleep(50 * time.Millisecond)
time.Sleep(flockRetryTimeout)
}
}
@ -89,7 +90,7 @@ func flock(db *DB, mode os.FileMode, exclusive bool, timeout time.Duration) erro
func funlock(db *DB) error {
err := unlockFileEx(syscall.Handle(db.lockfile.Fd()), 0, 1, 0, &syscall.Overlapped{})
db.lockfile.Close()
os.Remove(db.path+lockExt)
os.Remove(db.path + lockExt)
return err
}

View File

@ -14,13 +14,6 @@ const (
MaxValueSize = (1 << 31) - 2
)
const (
maxUint = ^uint(0)
minUint = 0
maxInt = int(^uint(0) >> 1)
minInt = -maxInt - 1
)
const bucketHeaderSize = int(unsafe.Sizeof(bucket{}))
const (
@ -130,9 +123,17 @@ func (b *Bucket) Bucket(name []byte) *Bucket {
func (b *Bucket) openBucket(value []byte) *Bucket {
var child = newBucket(b.tx)
// If unaligned load/stores are broken on this arch and value is
// unaligned simply clone to an aligned byte array.
unaligned := brokenUnaligned && uintptr(unsafe.Pointer(&value[0]))&3 != 0
if unaligned {
value = cloneBytes(value)
}
// If this is a writable transaction then we need to copy the bucket entry.
// Read-only transactions can point directly at the mmap entry.
if b.tx.writable {
if b.tx.writable && !unaligned {
child.bucket = &bucket{}
*child.bucket = *(*bucket)(unsafe.Pointer(&value[0]))
} else {
@ -167,9 +168,8 @@ func (b *Bucket) CreateBucket(key []byte) (*Bucket, error) {
if bytes.Equal(key, k) {
if (flags & bucketLeafFlag) != 0 {
return nil, ErrBucketExists
} else {
return nil, ErrIncompatibleValue
}
return nil, ErrIncompatibleValue
}
// Create empty, inline bucket.
@ -316,7 +316,12 @@ func (b *Bucket) Delete(key []byte) error {
// Move cursor to correct position.
c := b.Cursor()
_, _, flags := c.seek(key)
k, _, flags := c.seek(key)
// Return nil if the key doesn't exist.
if !bytes.Equal(key, k) {
return nil
}
// Return an error if there is already existing bucket value.
if (flags & bucketLeafFlag) != 0 {
@ -329,6 +334,28 @@ func (b *Bucket) Delete(key []byte) error {
return nil
}
// Sequence returns the current integer for the bucket without incrementing it.
func (b *Bucket) Sequence() uint64 { return b.bucket.sequence }
// SetSequence updates the sequence number for the bucket.
func (b *Bucket) SetSequence(v uint64) error {
if b.tx.db == nil {
return ErrTxClosed
} else if !b.Writable() {
return ErrTxNotWritable
}
// Materialize the root node if it hasn't been already so that the
// bucket will be saved during commit.
if b.rootNode == nil {
_ = b.node(b.root, nil)
}
// Increment and return the sequence.
b.bucket.sequence = v
return nil
}
// NextSequence returns an autoincrementing integer for the bucket.
func (b *Bucket) NextSequence() (uint64, error) {
if b.tx.db == nil {

View File

@ -7,8 +7,7 @@ import (
"log"
"os"
"runtime"
"runtime/debug"
"strings"
"sort"
"sync"
"time"
"unsafe"
@ -23,6 +22,8 @@ const version = 2
// Represents a marker value to indicate that a file is a Bolt DB.
const magic uint32 = 0xED0CDAED
const pgidNoFreelist pgid = 0xffffffffffffffff
// IgnoreNoSync specifies whether the NoSync field of a DB is ignored when
// syncing changes to a file. This is required as some operating systems,
// such as OpenBSD, do not have a unified buffer cache (UBC) and writes
@ -39,6 +40,9 @@ const (
// default page size for db is set to the OS page size.
var defaultPageSize = os.Getpagesize()
// The time elapsed between consecutive file locking attempts.
const flockRetryTimeout = 50 * time.Millisecond
// DB represents a collection of buckets persisted to a file on disk.
// All data access is performed through transactions which can be obtained through the DB.
// All the functions on DB will return a ErrDatabaseNotOpen if accessed before Open() is called.
@ -61,6 +65,11 @@ type DB struct {
// THIS IS UNSAFE. PLEASE USE WITH CAUTION.
NoSync bool
// When true, skips syncing freelist to disk. This improves the database
// write performance under normal operation, but requires a full database
// re-sync during recovery.
NoFreelistSync bool
// When true, skips the truncate call when growing the database.
// Setting this to true is only safe on non-ext3/ext4 systems.
// Skipping truncation avoids preallocation of hard drive space and
@ -107,9 +116,11 @@ type DB struct {
opened bool
rwtx *Tx
txs []*Tx
freelist *freelist
stats Stats
freelist *freelist
freelistLoad sync.Once
pagePool sync.Pool
batchMu sync.Mutex
@ -148,14 +159,17 @@ func (db *DB) String() string {
// If the file does not exist then it will be created automatically.
// Passing in nil options will cause Bolt to open the database with the default options.
func Open(path string, mode os.FileMode, options *Options) (*DB, error) {
var db = &DB{opened: true}
db := &DB{
opened: true,
}
// Set default options if no options are provided.
if options == nil {
options = DefaultOptions
}
db.NoSync = options.NoSync
db.NoGrowSync = options.NoGrowSync
db.MmapFlags = options.MmapFlags
db.NoFreelistSync = options.NoFreelistSync
// Set default values for later DB operations.
db.MaxBatchSize = DefaultMaxBatchSize
@ -184,6 +198,7 @@ func Open(path string, mode os.FileMode, options *Options) (*DB, error) {
// The database file is locked using the shared lock (more than one process may
// hold a lock at the same time) otherwise (options.ReadOnly is set).
if err := flock(db, mode, !db.readOnly, options.Timeout); err != nil {
db.lockfile = nil // make 'unused' happy. TODO: rework locks
_ = db.close()
return nil, err
}
@ -191,6 +206,11 @@ func Open(path string, mode os.FileMode, options *Options) (*DB, error) {
// Default values for test hooks
db.ops.writeAt = db.file.WriteAt
if db.pageSize = options.PageSize; db.pageSize == 0 {
// Set the default page size to the OS page size.
db.pageSize = defaultPageSize
}
// Initialize the database if it doesn't exist.
if info, err := db.file.Stat(); err != nil {
return nil, err
@ -202,20 +222,21 @@ func Open(path string, mode os.FileMode, options *Options) (*DB, error) {
} else {
// Read the first meta page to determine the page size.
var buf [0x1000]byte
if _, err := db.file.ReadAt(buf[:], 0); err == nil {
m := db.pageInBuffer(buf[:], 0).meta()
if err := m.validate(); err != nil {
// If we can't read the page size, we can assume it's the same
// as the OS -- since that's how the page size was chosen in the
// first place.
//
// If the first page is invalid and this OS uses a different
// page size than what the database was created with then we
// are out of luck and cannot access the database.
db.pageSize = os.Getpagesize()
} else {
// If we can't read the page size, but can read a page, assume
// it's the same as the OS or one given -- since that's how the
// page size was chosen in the first place.
//
// If the first page is invalid and this OS uses a different
// page size than what the database was created with then we
// are out of luck and cannot access the database.
//
// TODO: scan for next page
if bw, err := db.file.ReadAt(buf[:], 0); err == nil && bw == len(buf) {
if m := db.pageInBuffer(buf[:], 0).meta(); m.validate() == nil {
db.pageSize = int(m.pageSize)
}
} else {
return nil, ErrInvalid
}
}
@ -232,14 +253,50 @@ func Open(path string, mode os.FileMode, options *Options) (*DB, error) {
return nil, err
}
// Read in the freelist.
db.freelist = newFreelist()
db.freelist.read(db.page(db.meta().freelist))
if db.readOnly {
return db, nil
}
db.loadFreelist()
// Flush freelist when transitioning from no sync to sync so
// NoFreelistSync unaware boltdb can open the db later.
if !db.NoFreelistSync && !db.hasSyncedFreelist() {
tx, err := db.Begin(true)
if tx != nil {
err = tx.Commit()
}
if err != nil {
_ = db.close()
return nil, err
}
}
// Mark the database as opened and return.
return db, nil
}
// loadFreelist reads the freelist if it is synced, or reconstructs it
// by scanning the DB if it is not synced. It assumes there are no
// concurrent accesses being made to the freelist.
func (db *DB) loadFreelist() {
db.freelistLoad.Do(func() {
db.freelist = newFreelist()
if !db.hasSyncedFreelist() {
// Reconstruct free list by scanning the DB.
db.freelist.readIDs(db.freepages())
} else {
// Read free list from freelist page.
db.freelist.read(db.page(db.meta().freelist))
}
db.stats.FreePageN = len(db.freelist.ids)
})
}
func (db *DB) hasSyncedFreelist() bool {
return db.meta().freelist != pgidNoFreelist
}
// mmap opens the underlying memory-mapped file and initializes the meta references.
// minsz is the minimum size that the new mmap can be.
func (db *DB) mmap(minsz int) error {
@ -341,9 +398,6 @@ func (db *DB) mmapSize(size int) (int, error) {
// init creates a new database file and initializes its meta pages.
func (db *DB) init() error {
// Set the page size to the OS page size.
db.pageSize = os.Getpagesize()
// Create two meta pages on a buffer.
buf := make([]byte, db.pageSize*4)
for i := 0; i < 2; i++ {
@ -526,21 +580,36 @@ func (db *DB) beginRWTx() (*Tx, error) {
t := &Tx{writable: true}
t.init(db)
db.rwtx = t
db.freePages()
return t, nil
}
// Free any pages associated with closed read-only transactions.
var minid txid = 0xFFFFFFFFFFFFFFFF
for _, t := range db.txs {
if t.meta.txid < minid {
minid = t.meta.txid
}
// freePages releases any pages associated with closed read-only transactions.
func (db *DB) freePages() {
// Free all pending pages prior to earliest open transaction.
sort.Sort(txsById(db.txs))
minid := txid(0xFFFFFFFFFFFFFFFF)
if len(db.txs) > 0 {
minid = db.txs[0].meta.txid
}
if minid > 0 {
db.freelist.release(minid - 1)
}
return t, nil
// Release unused txid extents.
for _, t := range db.txs {
db.freelist.releaseRange(minid, t.meta.txid-1)
minid = t.meta.txid + 1
}
db.freelist.releaseRange(minid, txid(0xFFFFFFFFFFFFFFFF))
// Any page both allocated and freed in an extent is safe to release.
}
type txsById []*Tx
func (t txsById) Len() int { return len(t) }
func (t txsById) Swap(i, j int) { t[i], t[j] = t[j], t[i] }
func (t txsById) Less(i, j int) bool { return t[i].meta.txid < t[j].meta.txid }
// removeTx removes a transaction from the database.
func (db *DB) removeTx(tx *Tx) {
// Release the read lock on the mmap.
@ -552,7 +621,10 @@ func (db *DB) removeTx(tx *Tx) {
// Remove the transaction.
for i, t := range db.txs {
if t == tx {
db.txs = append(db.txs[:i], db.txs[i+1:]...)
last := len(db.txs) - 1
db.txs[i] = db.txs[last]
db.txs[last] = nil
db.txs = db.txs[:last]
break
}
}
@ -630,11 +702,7 @@ func (db *DB) View(fn func(*Tx) error) error {
return err
}
if err := t.Rollback(); err != nil {
return err
}
return nil
return t.Rollback()
}
// Batch calls fn as part of a batch. It behaves similar to Update,
@ -823,7 +891,7 @@ func (db *DB) meta() *meta {
}
// allocate returns a contiguous block of memory starting at a given page.
func (db *DB) allocate(count int) (*page, error) {
func (db *DB) allocate(txid txid, count int) (*page, error) {
// Allocate a temporary buffer for the page.
var buf []byte
if count == 1 {
@ -835,7 +903,7 @@ func (db *DB) allocate(count int) (*page, error) {
p.overflow = uint32(count - 1)
// Use pages from the freelist if they are available.
if p.id = db.freelist.allocate(count); p.id != 0 {
if p.id = db.freelist.allocate(txid, count); p.id != 0 {
return p, nil
}
@ -890,6 +958,38 @@ func (db *DB) IsReadOnly() bool {
return db.readOnly
}
func (db *DB) freepages() []pgid {
tx, err := db.beginTx()
defer func() {
err = tx.Rollback()
if err != nil {
panic("freepages: failed to rollback tx")
}
}()
if err != nil {
panic("freepages: failed to open read only tx")
}
reachable := make(map[pgid]*page)
nofreed := make(map[pgid]bool)
ech := make(chan error)
go func() {
for e := range ech {
panic(fmt.Sprintf("freepages: failed to get all reachable pages (%v)", e))
}
}()
tx.checkBucket(&tx.root, reachable, nofreed, ech)
close(ech)
var fids []pgid
for i := pgid(2); i < db.meta().pgid; i++ {
if _, ok := reachable[i]; !ok {
fids = append(fids, i)
}
}
return fids
}
// Options represents the options that can be set when opening a database.
type Options struct {
// Timeout is the amount of time to wait to obtain a file lock.
@ -900,6 +1000,10 @@ type Options struct {
// Sets the DB.NoGrowSync flag before memory mapping the file.
NoGrowSync bool
// Do not sync freelist to disk. This improves the database write performance
// under normal operation, but requires a full database re-sync during recovery.
NoFreelistSync bool
// Open database in read-only mode. Uses flock(..., LOCK_SH |LOCK_NB) to
// grab a shared lock (UNIX).
ReadOnly bool
@ -916,6 +1020,14 @@ type Options struct {
// If initialMmapSize is smaller than the previous database size,
// it takes no effect.
InitialMmapSize int
// PageSize overrides the default OS page size.
PageSize int
// NoSync sets the initial value of DB.NoSync. Normally this can just be
// set directly on the DB itself when returned from Open(), but this option
// is useful in APIs which expose Options but not the underlying DB.
NoSync bool
}
// DefaultOptions represent the options used if nil options are passed into Open().
@ -952,15 +1064,11 @@ func (s *Stats) Sub(other *Stats) Stats {
diff.PendingPageN = s.PendingPageN
diff.FreeAlloc = s.FreeAlloc
diff.FreelistInuse = s.FreelistInuse
diff.TxN = other.TxN - s.TxN
diff.TxN = s.TxN - other.TxN
diff.TxStats = s.TxStats.Sub(&other.TxStats)
return diff
}
func (s *Stats) add(other *Stats) {
s.TxStats.add(&other.TxStats)
}
type Info struct {
Data uintptr
PageSize int
@ -999,7 +1107,8 @@ func (m *meta) copy(dest *meta) {
func (m *meta) write(p *page) {
if m.root.root >= m.pgid {
panic(fmt.Sprintf("root bucket pgid (%d) above high water mark (%d)", m.root.root, m.pgid))
} else if m.freelist >= m.pgid {
} else if m.freelist >= m.pgid && m.freelist != pgidNoFreelist {
// TODO: reject pgidNoFreeList if !NoFreelistSync
panic(fmt.Sprintf("freelist pgid (%d) above high water mark (%d)", m.freelist, m.pgid))
}
@ -1026,11 +1135,3 @@ func _assert(condition bool, msg string, v ...interface{}) {
panic(fmt.Sprintf("assertion failed: "+msg, v...))
}
}
func warn(v ...interface{}) { fmt.Fprintln(os.Stderr, v...) }
func warnf(msg string, v ...interface{}) { fmt.Fprintf(os.Stderr, msg+"\n", v...) }
func printstack() {
stack := strings.Join(strings.Split(string(debug.Stack()), "\n")[2:], "\n")
fmt.Fprintln(os.Stderr, stack)
}

View File

@ -6,25 +6,40 @@ import (
"unsafe"
)
// txPending holds a list of pgids and corresponding allocation txns
// that are pending to be freed.
type txPending struct {
ids []pgid
alloctx []txid // txids allocating the ids
lastReleaseBegin txid // beginning txid of last matching releaseRange
}
// freelist represents a list of all pages that are available for allocation.
// It also tracks pages that have been freed but are still in use by open transactions.
type freelist struct {
ids []pgid // all free and available free page ids.
pending map[txid][]pgid // mapping of soon-to-be free page ids by tx.
cache map[pgid]bool // fast lookup of all free and pending page ids.
ids []pgid // all free and available free page ids.
allocs map[pgid]txid // mapping of txid that allocated a pgid.
pending map[txid]*txPending // mapping of soon-to-be free page ids by tx.
cache map[pgid]bool // fast lookup of all free and pending page ids.
}
// newFreelist returns an empty, initialized freelist.
func newFreelist() *freelist {
return &freelist{
pending: make(map[txid][]pgid),
allocs: make(map[pgid]txid),
pending: make(map[txid]*txPending),
cache: make(map[pgid]bool),
}
}
// size returns the size of the page after serialization.
func (f *freelist) size() int {
return pageHeaderSize + (int(unsafe.Sizeof(pgid(0))) * f.count())
n := f.count()
if n >= 0xFFFF {
// The first element will be used to store the count. See freelist.write.
n++
}
return pageHeaderSize + (int(unsafe.Sizeof(pgid(0))) * n)
}
// count returns count of pages on the freelist
@ -40,27 +55,26 @@ func (f *freelist) free_count() int {
// pending_count returns count of pending pages
func (f *freelist) pending_count() int {
var count int
for _, list := range f.pending {
count += len(list)
for _, txp := range f.pending {
count += len(txp.ids)
}
return count
}
// all returns a list of all free ids and all pending ids in one sorted list.
func (f *freelist) all() []pgid {
m := make(pgids, 0)
for _, list := range f.pending {
m = append(m, list...)
// copyall copies into dst a list of all free ids and all pending ids in one sorted list.
// f.count returns the minimum length required for dst.
func (f *freelist) copyall(dst []pgid) {
m := make(pgids, 0, f.pending_count())
for _, txp := range f.pending {
m = append(m, txp.ids...)
}
sort.Sort(m)
return pgids(f.ids).merge(m)
mergepgids(dst, f.ids, m)
}
// allocate returns the starting page id of a contiguous list of pages of a given size.
// If a contiguous block cannot be found then 0 is returned.
func (f *freelist) allocate(n int) pgid {
func (f *freelist) allocate(txid txid, n int) pgid {
if len(f.ids) == 0 {
return 0
}
@ -93,7 +107,7 @@ func (f *freelist) allocate(n int) pgid {
for i := pgid(0); i < pgid(n); i++ {
delete(f.cache, initial+i)
}
f.allocs[initial] = txid
return initial
}
@ -110,28 +124,73 @@ func (f *freelist) free(txid txid, p *page) {
}
// Free page and all its overflow pages.
var ids = f.pending[txid]
txp := f.pending[txid]
if txp == nil {
txp = &txPending{}
f.pending[txid] = txp
}
allocTxid, ok := f.allocs[p.id]
if ok {
delete(f.allocs, p.id)
} else if (p.flags & freelistPageFlag) != 0 {
// Freelist is always allocated by prior tx.
allocTxid = txid - 1
}
for id := p.id; id <= p.id+pgid(p.overflow); id++ {
// Verify that page is not already free.
if f.cache[id] {
panic(fmt.Sprintf("page %d already freed", id))
}
// Add to the freelist and cache.
ids = append(ids, id)
txp.ids = append(txp.ids, id)
txp.alloctx = append(txp.alloctx, allocTxid)
f.cache[id] = true
}
f.pending[txid] = ids
}
// release moves all page ids for a transaction id (or older) to the freelist.
func (f *freelist) release(txid txid) {
m := make(pgids, 0)
for tid, ids := range f.pending {
for tid, txp := range f.pending {
if tid <= txid {
// Move transaction's pending pages to the available freelist.
// Don't remove from the cache since the page is still free.
m = append(m, ids...)
m = append(m, txp.ids...)
delete(f.pending, tid)
}
}
sort.Sort(m)
f.ids = pgids(f.ids).merge(m)
}
// releaseRange moves pending pages allocated within an extent [begin,end] to the free list.
func (f *freelist) releaseRange(begin, end txid) {
if begin > end {
return
}
var m pgids
for tid, txp := range f.pending {
if tid < begin || tid > end {
continue
}
// Don't recompute freed pages if ranges haven't updated.
if txp.lastReleaseBegin == begin {
continue
}
for i := 0; i < len(txp.ids); i++ {
if atx := txp.alloctx[i]; atx < begin || atx > end {
continue
}
m = append(m, txp.ids[i])
txp.ids[i] = txp.ids[len(txp.ids)-1]
txp.ids = txp.ids[:len(txp.ids)-1]
txp.alloctx[i] = txp.alloctx[len(txp.alloctx)-1]
txp.alloctx = txp.alloctx[:len(txp.alloctx)-1]
i--
}
txp.lastReleaseBegin = begin
if len(txp.ids) == 0 {
delete(f.pending, tid)
}
}
@ -142,12 +201,29 @@ func (f *freelist) release(txid txid) {
// rollback removes the pages from a given pending tx.
func (f *freelist) rollback(txid txid) {
// Remove page ids from cache.
for _, id := range f.pending[txid] {
delete(f.cache, id)
txp := f.pending[txid]
if txp == nil {
return
}
// Remove pages from pending list.
var m pgids
for i, pgid := range txp.ids {
delete(f.cache, pgid)
tx := txp.alloctx[i]
if tx == 0 {
continue
}
if tx != txid {
// Pending free aborted; restore page back to alloc list.
f.allocs[pgid] = tx
} else {
// Freed page was allocated by this txn; OK to throw away.
m = append(m, pgid)
}
}
// Remove pages from pending list and mark as free if allocated by txid.
delete(f.pending, txid)
sort.Sort(m)
f.ids = pgids(f.ids).merge(m)
}
// freed returns whether a given page is in the free list.
@ -157,6 +233,9 @@ func (f *freelist) freed(pgid pgid) bool {
// read initializes the freelist from a freelist page.
func (f *freelist) read(p *page) {
if (p.flags & freelistPageFlag) == 0 {
panic(fmt.Sprintf("invalid freelist page: %d, page type is %s", p.id, p.typ()))
}
// If the page.count is at the max uint16 value (64k) then it's considered
// an overflow and the size of the freelist is stored as the first element.
idx, count := 0, int(p.count)
@ -169,7 +248,7 @@ func (f *freelist) read(p *page) {
if count == 0 {
f.ids = nil
} else {
ids := ((*[maxAllocSize]pgid)(unsafe.Pointer(&p.ptr)))[idx:count]
ids := ((*[maxAllocSize]pgid)(unsafe.Pointer(&p.ptr)))[idx : idx+count]
f.ids = make([]pgid, len(ids))
copy(f.ids, ids)
@ -181,27 +260,33 @@ func (f *freelist) read(p *page) {
f.reindex()
}
// read initializes the freelist from a given list of ids.
func (f *freelist) readIDs(ids []pgid) {
f.ids = ids
f.reindex()
}
// write writes the page ids onto a freelist page. All free and pending ids are
// saved to disk since in the event of a program crash, all pending ids will
// become free.
func (f *freelist) write(p *page) error {
// Combine the old free pgids and pgids waiting on an open transaction.
ids := f.all()
// Update the header flag.
p.flags |= freelistPageFlag
// The page.count can only hold up to 64k elements so if we overflow that
// number then we handle it by putting the size in the first element.
if len(ids) == 0 {
p.count = uint16(len(ids))
} else if len(ids) < 0xFFFF {
p.count = uint16(len(ids))
copy(((*[maxAllocSize]pgid)(unsafe.Pointer(&p.ptr)))[:], ids)
lenids := f.count()
if lenids == 0 {
p.count = uint16(lenids)
} else if lenids < 0xFFFF {
p.count = uint16(lenids)
f.copyall(((*[maxAllocSize]pgid)(unsafe.Pointer(&p.ptr)))[:])
} else {
p.count = 0xFFFF
((*[maxAllocSize]pgid)(unsafe.Pointer(&p.ptr)))[0] = pgid(len(ids))
copy(((*[maxAllocSize]pgid)(unsafe.Pointer(&p.ptr)))[1:], ids)
((*[maxAllocSize]pgid)(unsafe.Pointer(&p.ptr)))[0] = pgid(lenids)
f.copyall(((*[maxAllocSize]pgid)(unsafe.Pointer(&p.ptr)))[1:])
}
return nil
@ -213,8 +298,8 @@ func (f *freelist) reload(p *page) {
// Build a cache of only pending pages.
pcache := make(map[pgid]bool)
for _, pendingIDs := range f.pending {
for _, pendingID := range pendingIDs {
for _, txp := range f.pending {
for _, pendingID := range txp.ids {
pcache[pendingID] = true
}
}
@ -236,12 +321,12 @@ func (f *freelist) reload(p *page) {
// reindex rebuilds the free cache based on available and pending free lists.
func (f *freelist) reindex() {
f.cache = make(map[pgid]bool)
f.cache = make(map[pgid]bool, len(f.ids))
for _, id := range f.ids {
f.cache[id] = true
}
for _, pendingIDs := range f.pending {
for _, pendingID := range pendingIDs {
for _, txp := range f.pending {
for _, pendingID := range txp.ids {
f.cache[pendingID] = true
}
}

View File

@ -365,7 +365,7 @@ func (n *node) spill() error {
}
// Allocate contiguous space for the node.
p, err := tx.allocate((node.size() / tx.db.pageSize) + 1)
p, err := tx.allocate((node.size() + tx.db.pageSize - 1) / tx.db.pageSize)
if err != nil {
return err
}

View File

@ -145,12 +145,33 @@ func (a pgids) merge(b pgids) pgids {
// Return the opposite slice if one is nil.
if len(a) == 0 {
return b
} else if len(b) == 0 {
}
if len(b) == 0 {
return a
}
merged := make(pgids, len(a)+len(b))
mergepgids(merged, a, b)
return merged
}
// Create a list to hold all elements from both lists.
merged := make(pgids, 0, len(a)+len(b))
// mergepgids copies the sorted union of a and b into dst.
// If dst is too small, it panics.
func mergepgids(dst, a, b pgids) {
if len(dst) < len(a)+len(b) {
panic(fmt.Errorf("mergepgids bad len %d < %d + %d", len(dst), len(a), len(b)))
}
// Copy in the opposite slice if one is nil.
if len(a) == 0 {
copy(dst, b)
return
}
if len(b) == 0 {
copy(dst, a)
return
}
// Merged will hold all elements from both lists.
merged := dst[:0]
// Assign lead to the slice with a lower starting value, follow to the higher value.
lead, follow := a, b
@ -172,7 +193,5 @@ func (a pgids) merge(b pgids) pgids {
}
// Append what's left in follow.
merged = append(merged, follow...)
return merged
_ = append(merged, follow...)
}

View File

@ -126,10 +126,7 @@ func (tx *Tx) DeleteBucket(name []byte) error {
// the error is returned to the caller.
func (tx *Tx) ForEach(fn func(name []byte, b *Bucket) error) error {
return tx.root.ForEach(func(k, v []byte) error {
if err := fn(k, tx.root.Bucket(k)); err != nil {
return err
}
return nil
return fn(k, tx.root.Bucket(k))
})
}
@ -169,28 +166,18 @@ func (tx *Tx) Commit() error {
// Free the old root bucket.
tx.meta.root.root = tx.root.root
opgid := tx.meta.pgid
// Free the freelist and allocate new pages for it. This will overestimate
// the size of the freelist but not underestimate the size (which would be bad).
tx.db.freelist.free(tx.meta.txid, tx.db.page(tx.meta.freelist))
p, err := tx.allocate((tx.db.freelist.size() / tx.db.pageSize) + 1)
if err != nil {
tx.rollback()
return err
// Free the old freelist because commit writes out a fresh freelist.
if tx.meta.freelist != pgidNoFreelist {
tx.db.freelist.free(tx.meta.txid, tx.db.page(tx.meta.freelist))
}
if err := tx.db.freelist.write(p); err != nil {
tx.rollback()
return err
}
tx.meta.freelist = p.id
// If the high water mark has moved up then attempt to grow the database.
if tx.meta.pgid > opgid {
if err := tx.db.grow(int(tx.meta.pgid+1) * tx.db.pageSize); err != nil {
tx.rollback()
if !tx.db.NoFreelistSync {
err := tx.commitFreelist()
if err != nil {
return err
}
} else {
tx.meta.freelist = pgidNoFreelist
}
// Write dirty pages to disk.
@ -235,6 +222,31 @@ func (tx *Tx) Commit() error {
return nil
}
func (tx *Tx) commitFreelist() error {
// Allocate new pages for the new free list. This will overestimate
// the size of the freelist but not underestimate the size (which would be bad).
opgid := tx.meta.pgid
p, err := tx.allocate((tx.db.freelist.size() / tx.db.pageSize) + 1)
if err != nil {
tx.rollback()
return err
}
if err := tx.db.freelist.write(p); err != nil {
tx.rollback()
return err
}
tx.meta.freelist = p.id
// If the high water mark has moved up then attempt to grow the database.
if tx.meta.pgid > opgid {
if err := tx.db.grow(int(tx.meta.pgid+1) * tx.db.pageSize); err != nil {
tx.rollback()
return err
}
}
return nil
}
// Rollback closes the transaction and ignores all previous updates. Read-only
// transactions must be rolled back and not committed.
func (tx *Tx) Rollback() error {
@ -305,7 +317,11 @@ func (tx *Tx) WriteTo(w io.Writer) (n int64, err error) {
if err != nil {
return 0, err
}
defer func() { _ = f.Close() }()
defer func() {
if cerr := f.Close(); err == nil {
err = cerr
}
}()
// Generate a meta page. We use the same page data for both meta pages.
buf := make([]byte, tx.db.pageSize)
@ -333,7 +349,7 @@ func (tx *Tx) WriteTo(w io.Writer) (n int64, err error) {
}
// Move past the meta pages in the file.
if _, err := f.Seek(int64(tx.db.pageSize*2), os.SEEK_SET); err != nil {
if _, err := f.Seek(int64(tx.db.pageSize*2), io.SeekStart); err != nil {
return n, fmt.Errorf("seek: %s", err)
}
@ -344,7 +360,7 @@ func (tx *Tx) WriteTo(w io.Writer) (n int64, err error) {
return n, err
}
return n, f.Close()
return n, nil
}
// CopyFile copies the entire database to file at the given path.
@ -379,9 +395,14 @@ func (tx *Tx) Check() <-chan error {
}
func (tx *Tx) check(ch chan error) {
// Force loading free list if opened in ReadOnly mode.
tx.db.loadFreelist()
// Check if any pages are double freed.
freed := make(map[pgid]bool)
for _, id := range tx.db.freelist.all() {
all := make([]pgid, tx.db.freelist.count())
tx.db.freelist.copyall(all)
for _, id := range all {
if freed[id] {
ch <- fmt.Errorf("page %d: already freed", id)
}
@ -392,8 +413,10 @@ func (tx *Tx) check(ch chan error) {
reachable := make(map[pgid]*page)
reachable[0] = tx.page(0) // meta0
reachable[1] = tx.page(1) // meta1
for i := uint32(0); i <= tx.page(tx.meta.freelist).overflow; i++ {
reachable[tx.meta.freelist+pgid(i)] = tx.page(tx.meta.freelist)
if tx.meta.freelist != pgidNoFreelist {
for i := uint32(0); i <= tx.page(tx.meta.freelist).overflow; i++ {
reachable[tx.meta.freelist+pgid(i)] = tx.page(tx.meta.freelist)
}
}
// Recursively check buckets.
@ -451,7 +474,7 @@ func (tx *Tx) checkBucket(b *Bucket, reachable map[pgid]*page, freed map[pgid]bo
// allocate returns a contiguous block of memory starting at a given page.
func (tx *Tx) allocate(count int) (*page, error) {
p, err := tx.db.allocate(count)
p, err := tx.db.allocate(tx.meta.txid, count)
if err != nil {
return nil, err
}

View File

@ -36,12 +36,7 @@
// Contexts.
package context // import "golang.org/x/net/context"
import (
"errors"
"fmt"
"sync"
"time"
)
import "time"
// A Context carries a deadline, a cancelation signal, and other values across
// API boundaries.
@ -66,7 +61,7 @@ type Context interface {
//
// // Stream generates values with DoSomething and sends them to out
// // until DoSomething returns an error or ctx.Done is closed.
// func Stream(ctx context.Context, out <-chan Value) error {
// func Stream(ctx context.Context, out chan<- Value) error {
// for {
// v, err := DoSomething(ctx)
// if err != nil {
@ -138,48 +133,6 @@ type Context interface {
Value(key interface{}) interface{}
}
// Canceled is the error returned by Context.Err when the context is canceled.
var Canceled = errors.New("context canceled")
// DeadlineExceeded is the error returned by Context.Err when the context's
// deadline passes.
var DeadlineExceeded = errors.New("context deadline exceeded")
// An emptyCtx is never canceled, has no values, and has no deadline. It is not
// struct{}, since vars of this type must have distinct addresses.
type emptyCtx int
func (*emptyCtx) Deadline() (deadline time.Time, ok bool) {
return
}
func (*emptyCtx) Done() <-chan struct{} {
return nil
}
func (*emptyCtx) Err() error {
return nil
}
func (*emptyCtx) Value(key interface{}) interface{} {
return nil
}
func (e *emptyCtx) String() string {
switch e {
case background:
return "context.Background"
case todo:
return "context.TODO"
}
return "unknown empty Context"
}
var (
background = new(emptyCtx)
todo = new(emptyCtx)
)
// Background returns a non-nil, empty Context. It is never canceled, has no
// values, and has no deadline. It is typically used by the main function,
// initialization, and tests, and as the top-level Context for incoming
@ -201,247 +154,3 @@ func TODO() Context {
// A CancelFunc does not wait for the work to stop.
// After the first call, subsequent calls to a CancelFunc do nothing.
type CancelFunc func()
// WithCancel returns a copy of parent with a new Done channel. The returned
// context's Done channel is closed when the returned cancel function is called
// or when the parent context's Done channel is closed, whichever happens first.
//
// Canceling this context releases resources associated with it, so code should
// call cancel as soon as the operations running in this Context complete.
func WithCancel(parent Context) (ctx Context, cancel CancelFunc) {
c := newCancelCtx(parent)
propagateCancel(parent, &c)
return &c, func() { c.cancel(true, Canceled) }
}
// newCancelCtx returns an initialized cancelCtx.
func newCancelCtx(parent Context) cancelCtx {
return cancelCtx{
Context: parent,
done: make(chan struct{}),
}
}
// propagateCancel arranges for child to be canceled when parent is.
func propagateCancel(parent Context, child canceler) {
if parent.Done() == nil {
return // parent is never canceled
}
if p, ok := parentCancelCtx(parent); ok {
p.mu.Lock()
if p.err != nil {
// parent has already been canceled
child.cancel(false, p.err)
} else {
if p.children == nil {
p.children = make(map[canceler]bool)
}
p.children[child] = true
}
p.mu.Unlock()
} else {
go func() {
select {
case <-parent.Done():
child.cancel(false, parent.Err())
case <-child.Done():
}
}()
}
}
// parentCancelCtx follows a chain of parent references until it finds a
// *cancelCtx. This function understands how each of the concrete types in this
// package represents its parent.
func parentCancelCtx(parent Context) (*cancelCtx, bool) {
for {
switch c := parent.(type) {
case *cancelCtx:
return c, true
case *timerCtx:
return &c.cancelCtx, true
case *valueCtx:
parent = c.Context
default:
return nil, false
}
}
}
// removeChild removes a context from its parent.
func removeChild(parent Context, child canceler) {
p, ok := parentCancelCtx(parent)
if !ok {
return
}
p.mu.Lock()
if p.children != nil {
delete(p.children, child)
}
p.mu.Unlock()
}
// A canceler is a context type that can be canceled directly. The
// implementations are *cancelCtx and *timerCtx.
type canceler interface {
cancel(removeFromParent bool, err error)
Done() <-chan struct{}
}
// A cancelCtx can be canceled. When canceled, it also cancels any children
// that implement canceler.
type cancelCtx struct {
Context
done chan struct{} // closed by the first cancel call.
mu sync.Mutex
children map[canceler]bool // set to nil by the first cancel call
err error // set to non-nil by the first cancel call
}
func (c *cancelCtx) Done() <-chan struct{} {
return c.done
}
func (c *cancelCtx) Err() error {
c.mu.Lock()
defer c.mu.Unlock()
return c.err
}
func (c *cancelCtx) String() string {
return fmt.Sprintf("%v.WithCancel", c.Context)
}
// cancel closes c.done, cancels each of c's children, and, if
// removeFromParent is true, removes c from its parent's children.
func (c *cancelCtx) cancel(removeFromParent bool, err error) {
if err == nil {
panic("context: internal error: missing cancel error")
}
c.mu.Lock()
if c.err != nil {
c.mu.Unlock()
return // already canceled
}
c.err = err
close(c.done)
for child := range c.children {
// NOTE: acquiring the child's lock while holding parent's lock.
child.cancel(false, err)
}
c.children = nil
c.mu.Unlock()
if removeFromParent {
removeChild(c.Context, c)
}
}
// WithDeadline returns a copy of the parent context with the deadline adjusted
// to be no later than d. If the parent's deadline is already earlier than d,
// WithDeadline(parent, d) is semantically equivalent to parent. The returned
// context's Done channel is closed when the deadline expires, when the returned
// cancel function is called, or when the parent context's Done channel is
// closed, whichever happens first.
//
// Canceling this context releases resources associated with it, so code should
// call cancel as soon as the operations running in this Context complete.
func WithDeadline(parent Context, deadline time.Time) (Context, CancelFunc) {
if cur, ok := parent.Deadline(); ok && cur.Before(deadline) {
// The current deadline is already sooner than the new one.
return WithCancel(parent)
}
c := &timerCtx{
cancelCtx: newCancelCtx(parent),
deadline: deadline,
}
propagateCancel(parent, c)
d := deadline.Sub(time.Now())
if d <= 0 {
c.cancel(true, DeadlineExceeded) // deadline has already passed
return c, func() { c.cancel(true, Canceled) }
}
c.mu.Lock()
defer c.mu.Unlock()
if c.err == nil {
c.timer = time.AfterFunc(d, func() {
c.cancel(true, DeadlineExceeded)
})
}
return c, func() { c.cancel(true, Canceled) }
}
// A timerCtx carries a timer and a deadline. It embeds a cancelCtx to
// implement Done and Err. It implements cancel by stopping its timer then
// delegating to cancelCtx.cancel.
type timerCtx struct {
cancelCtx
timer *time.Timer // Under cancelCtx.mu.
deadline time.Time
}
func (c *timerCtx) Deadline() (deadline time.Time, ok bool) {
return c.deadline, true
}
func (c *timerCtx) String() string {
return fmt.Sprintf("%v.WithDeadline(%s [%s])", c.cancelCtx.Context, c.deadline, c.deadline.Sub(time.Now()))
}
func (c *timerCtx) cancel(removeFromParent bool, err error) {
c.cancelCtx.cancel(false, err)
if removeFromParent {
// Remove this timerCtx from its parent cancelCtx's children.
removeChild(c.cancelCtx.Context, c)
}
c.mu.Lock()
if c.timer != nil {
c.timer.Stop()
c.timer = nil
}
c.mu.Unlock()
}
// WithTimeout returns WithDeadline(parent, time.Now().Add(timeout)).
//
// Canceling this context releases resources associated with it, so code should
// call cancel as soon as the operations running in this Context complete:
//
// func slowOperationWithTimeout(ctx context.Context) (Result, error) {
// ctx, cancel := context.WithTimeout(ctx, 100*time.Millisecond)
// defer cancel() // releases resources if slowOperation completes before timeout elapses
// return slowOperation(ctx)
// }
func WithTimeout(parent Context, timeout time.Duration) (Context, CancelFunc) {
return WithDeadline(parent, time.Now().Add(timeout))
}
// WithValue returns a copy of parent in which the value associated with key is
// val.
//
// Use context Values only for request-scoped data that transits processes and
// APIs, not for passing optional parameters to functions.
func WithValue(parent Context, key interface{}, val interface{}) Context {
return &valueCtx{parent, key, val}
}
// A valueCtx carries a key-value pair. It implements Value for that key and
// delegates all other calls to the embedded Context.
type valueCtx struct {
Context
key, val interface{}
}
func (c *valueCtx) String() string {
return fmt.Sprintf("%v.WithValue(%#v, %#v)", c.Context, c.key, c.val)
}
func (c *valueCtx) Value(key interface{}) interface{} {
if c.key == key {
return c.val
}
return c.Context.Value(key)
}

72
cmd/vendor/golang.org/x/net/context/go17.go generated vendored Normal file
View File

@ -0,0 +1,72 @@
// Copyright 2016 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// +build go1.7
package context
import (
"context" // standard library's context, as of Go 1.7
"time"
)
var (
todo = context.TODO()
background = context.Background()
)
// Canceled is the error returned by Context.Err when the context is canceled.
var Canceled = context.Canceled
// DeadlineExceeded is the error returned by Context.Err when the context's
// deadline passes.
var DeadlineExceeded = context.DeadlineExceeded
// WithCancel returns a copy of parent with a new Done channel. The returned
// context's Done channel is closed when the returned cancel function is called
// or when the parent context's Done channel is closed, whichever happens first.
//
// Canceling this context releases resources associated with it, so code should
// call cancel as soon as the operations running in this Context complete.
func WithCancel(parent Context) (ctx Context, cancel CancelFunc) {
ctx, f := context.WithCancel(parent)
return ctx, CancelFunc(f)
}
// WithDeadline returns a copy of the parent context with the deadline adjusted
// to be no later than d. If the parent's deadline is already earlier than d,
// WithDeadline(parent, d) is semantically equivalent to parent. The returned
// context's Done channel is closed when the deadline expires, when the returned
// cancel function is called, or when the parent context's Done channel is
// closed, whichever happens first.
//
// Canceling this context releases resources associated with it, so code should
// call cancel as soon as the operations running in this Context complete.
func WithDeadline(parent Context, deadline time.Time) (Context, CancelFunc) {
ctx, f := context.WithDeadline(parent, deadline)
return ctx, CancelFunc(f)
}
// WithTimeout returns WithDeadline(parent, time.Now().Add(timeout)).
//
// Canceling this context releases resources associated with it, so code should
// call cancel as soon as the operations running in this Context complete:
//
// func slowOperationWithTimeout(ctx context.Context) (Result, error) {
// ctx, cancel := context.WithTimeout(ctx, 100*time.Millisecond)
// defer cancel() // releases resources if slowOperation completes before timeout elapses
// return slowOperation(ctx)
// }
func WithTimeout(parent Context, timeout time.Duration) (Context, CancelFunc) {
return WithDeadline(parent, time.Now().Add(timeout))
}
// WithValue returns a copy of parent in which the value associated with key is
// val.
//
// Use context Values only for request-scoped data that transits processes and
// APIs, not for passing optional parameters to functions.
func WithValue(parent Context, key interface{}, val interface{}) Context {
return context.WithValue(parent, key, val)
}

300
cmd/vendor/golang.org/x/net/context/pre_go17.go generated vendored Normal file
View File

@ -0,0 +1,300 @@
// Copyright 2014 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// +build !go1.7
package context
import (
"errors"
"fmt"
"sync"
"time"
)
// An emptyCtx is never canceled, has no values, and has no deadline. It is not
// struct{}, since vars of this type must have distinct addresses.
type emptyCtx int
func (*emptyCtx) Deadline() (deadline time.Time, ok bool) {
return
}
func (*emptyCtx) Done() <-chan struct{} {
return nil
}
func (*emptyCtx) Err() error {
return nil
}
func (*emptyCtx) Value(key interface{}) interface{} {
return nil
}
func (e *emptyCtx) String() string {
switch e {
case background:
return "context.Background"
case todo:
return "context.TODO"
}
return "unknown empty Context"
}
var (
background = new(emptyCtx)
todo = new(emptyCtx)
)
// Canceled is the error returned by Context.Err when the context is canceled.
var Canceled = errors.New("context canceled")
// DeadlineExceeded is the error returned by Context.Err when the context's
// deadline passes.
var DeadlineExceeded = errors.New("context deadline exceeded")
// WithCancel returns a copy of parent with a new Done channel. The returned
// context's Done channel is closed when the returned cancel function is called
// or when the parent context's Done channel is closed, whichever happens first.
//
// Canceling this context releases resources associated with it, so code should
// call cancel as soon as the operations running in this Context complete.
func WithCancel(parent Context) (ctx Context, cancel CancelFunc) {
c := newCancelCtx(parent)
propagateCancel(parent, c)
return c, func() { c.cancel(true, Canceled) }
}
// newCancelCtx returns an initialized cancelCtx.
func newCancelCtx(parent Context) *cancelCtx {
return &cancelCtx{
Context: parent,
done: make(chan struct{}),
}
}
// propagateCancel arranges for child to be canceled when parent is.
func propagateCancel(parent Context, child canceler) {
if parent.Done() == nil {
return // parent is never canceled
}
if p, ok := parentCancelCtx(parent); ok {
p.mu.Lock()
if p.err != nil {
// parent has already been canceled
child.cancel(false, p.err)
} else {
if p.children == nil {
p.children = make(map[canceler]bool)
}
p.children[child] = true
}
p.mu.Unlock()
} else {
go func() {
select {
case <-parent.Done():
child.cancel(false, parent.Err())
case <-child.Done():
}
}()
}
}
// parentCancelCtx follows a chain of parent references until it finds a
// *cancelCtx. This function understands how each of the concrete types in this
// package represents its parent.
func parentCancelCtx(parent Context) (*cancelCtx, bool) {
for {
switch c := parent.(type) {
case *cancelCtx:
return c, true
case *timerCtx:
return c.cancelCtx, true
case *valueCtx:
parent = c.Context
default:
return nil, false
}
}
}
// removeChild removes a context from its parent.
func removeChild(parent Context, child canceler) {
p, ok := parentCancelCtx(parent)
if !ok {
return
}
p.mu.Lock()
if p.children != nil {
delete(p.children, child)
}
p.mu.Unlock()
}
// A canceler is a context type that can be canceled directly. The
// implementations are *cancelCtx and *timerCtx.
type canceler interface {
cancel(removeFromParent bool, err error)
Done() <-chan struct{}
}
// A cancelCtx can be canceled. When canceled, it also cancels any children
// that implement canceler.
type cancelCtx struct {
Context
done chan struct{} // closed by the first cancel call.
mu sync.Mutex
children map[canceler]bool // set to nil by the first cancel call
err error // set to non-nil by the first cancel call
}
func (c *cancelCtx) Done() <-chan struct{} {
return c.done
}
func (c *cancelCtx) Err() error {
c.mu.Lock()
defer c.mu.Unlock()
return c.err
}
func (c *cancelCtx) String() string {
return fmt.Sprintf("%v.WithCancel", c.Context)
}
// cancel closes c.done, cancels each of c's children, and, if
// removeFromParent is true, removes c from its parent's children.
func (c *cancelCtx) cancel(removeFromParent bool, err error) {
if err == nil {
panic("context: internal error: missing cancel error")
}
c.mu.Lock()
if c.err != nil {
c.mu.Unlock()
return // already canceled
}
c.err = err
close(c.done)
for child := range c.children {
// NOTE: acquiring the child's lock while holding parent's lock.
child.cancel(false, err)
}
c.children = nil
c.mu.Unlock()
if removeFromParent {
removeChild(c.Context, c)
}
}
// WithDeadline returns a copy of the parent context with the deadline adjusted
// to be no later than d. If the parent's deadline is already earlier than d,
// WithDeadline(parent, d) is semantically equivalent to parent. The returned
// context's Done channel is closed when the deadline expires, when the returned
// cancel function is called, or when the parent context's Done channel is
// closed, whichever happens first.
//
// Canceling this context releases resources associated with it, so code should
// call cancel as soon as the operations running in this Context complete.
func WithDeadline(parent Context, deadline time.Time) (Context, CancelFunc) {
if cur, ok := parent.Deadline(); ok && cur.Before(deadline) {
// The current deadline is already sooner than the new one.
return WithCancel(parent)
}
c := &timerCtx{
cancelCtx: newCancelCtx(parent),
deadline: deadline,
}
propagateCancel(parent, c)
d := deadline.Sub(time.Now())
if d <= 0 {
c.cancel(true, DeadlineExceeded) // deadline has already passed
return c, func() { c.cancel(true, Canceled) }
}
c.mu.Lock()
defer c.mu.Unlock()
if c.err == nil {
c.timer = time.AfterFunc(d, func() {
c.cancel(true, DeadlineExceeded)
})
}
return c, func() { c.cancel(true, Canceled) }
}
// A timerCtx carries a timer and a deadline. It embeds a cancelCtx to
// implement Done and Err. It implements cancel by stopping its timer then
// delegating to cancelCtx.cancel.
type timerCtx struct {
*cancelCtx
timer *time.Timer // Under cancelCtx.mu.
deadline time.Time
}
func (c *timerCtx) Deadline() (deadline time.Time, ok bool) {
return c.deadline, true
}
func (c *timerCtx) String() string {
return fmt.Sprintf("%v.WithDeadline(%s [%s])", c.cancelCtx.Context, c.deadline, c.deadline.Sub(time.Now()))
}
func (c *timerCtx) cancel(removeFromParent bool, err error) {
c.cancelCtx.cancel(false, err)
if removeFromParent {
// Remove this timerCtx from its parent cancelCtx's children.
removeChild(c.cancelCtx.Context, c)
}
c.mu.Lock()
if c.timer != nil {
c.timer.Stop()
c.timer = nil
}
c.mu.Unlock()
}
// WithTimeout returns WithDeadline(parent, time.Now().Add(timeout)).
//
// Canceling this context releases resources associated with it, so code should
// call cancel as soon as the operations running in this Context complete:
//
// func slowOperationWithTimeout(ctx context.Context) (Result, error) {
// ctx, cancel := context.WithTimeout(ctx, 100*time.Millisecond)
// defer cancel() // releases resources if slowOperation completes before timeout elapses
// return slowOperation(ctx)
// }
func WithTimeout(parent Context, timeout time.Duration) (Context, CancelFunc) {
return WithDeadline(parent, time.Now().Add(timeout))
}
// WithValue returns a copy of parent in which the value associated with key is
// val.
//
// Use context Values only for request-scoped data that transits processes and
// APIs, not for passing optional parameters to functions.
func WithValue(parent Context, key interface{}, val interface{}) Context {
return &valueCtx{parent, key, val}
}
// A valueCtx carries a key-value pair. It implements Value for that key and
// delegates all other calls to the embedded Context.
type valueCtx struct {
Context
key, val interface{}
}
func (c *valueCtx) String() string {
return fmt.Sprintf("%v.WithValue(%#v, %#v)", c.Context, c.key, c.val)
}
func (c *valueCtx) Value(key interface{}) interface{} {
if c.key == key {
return c.val
}
return c.Context.Value(key)
}

View File

@ -18,6 +18,18 @@ type ClientConnPool interface {
MarkDead(*ClientConn)
}
// clientConnPoolIdleCloser is the interface implemented by ClientConnPool
// implementations which can close their idle connections.
type clientConnPoolIdleCloser interface {
ClientConnPool
closeIdleConnections()
}
var (
_ clientConnPoolIdleCloser = (*clientConnPool)(nil)
_ clientConnPoolIdleCloser = noDialClientConnPool{}
)
// TODO: use singleflight for dialing and addConnCalls?
type clientConnPool struct {
t *Transport
@ -40,7 +52,16 @@ const (
noDialOnMiss = false
)
func (p *clientConnPool) getClientConn(_ *http.Request, addr string, dialOnMiss bool) (*ClientConn, error) {
func (p *clientConnPool) getClientConn(req *http.Request, addr string, dialOnMiss bool) (*ClientConn, error) {
if isConnectionCloseRequest(req) && dialOnMiss {
// It gets its own connection.
const singleUse = true
cc, err := p.t.dialClientConn(addr, singleUse)
if err != nil {
return nil, err
}
return cc, nil
}
p.mu.Lock()
for _, cc := range p.conns[addr] {
if cc.CanTakeNewRequest() {
@ -83,7 +104,8 @@ func (p *clientConnPool) getStartDialLocked(addr string) *dialCall {
// run in its own goroutine.
func (c *dialCall) dial(addr string) {
c.res, c.err = c.p.t.dialClientConn(addr)
const singleUse = false // shared conn
c.res, c.err = c.p.t.dialClientConn(addr, singleUse)
close(c.done)
c.p.mu.Lock()
@ -223,3 +245,12 @@ func filterOutClientConn(in []*ClientConn, exclude *ClientConn) []*ClientConn {
}
return out
}
// noDialClientConnPool is an implementation of http2.ClientConnPool
// which never dials. We let the HTTP/1.1 client dial and use its TLS
// connection instead.
type noDialClientConnPool struct{ *clientConnPool }
func (p noDialClientConnPool) GetClientConn(req *http.Request, addr string) (*ClientConn, error) {
return p.getClientConn(req, addr, noDialOnMiss)
}

View File

@ -32,7 +32,7 @@ func configureTransport(t1 *http.Transport) (*Transport, error) {
t1.TLSClientConfig.NextProtos = append(t1.TLSClientConfig.NextProtos, "http/1.1")
}
upgradeFn := func(authority string, c *tls.Conn) http.RoundTripper {
addr := authorityAddr(authority)
addr := authorityAddr("https", authority)
if used, err := connPool.addConnIfNeeded(addr, t2, c); err != nil {
go c.Close()
return erringRoundTripper{err}
@ -67,15 +67,6 @@ func registerHTTPSProtocol(t *http.Transport, rt http.RoundTripper) (err error)
return nil
}
// noDialClientConnPool is an implementation of http2.ClientConnPool
// which never dials. We let the HTTP/1.1 client dial and use its TLS
// connection instead.
type noDialClientConnPool struct{ *clientConnPool }
func (p noDialClientConnPool) GetClientConn(req *http.Request, addr string) (*ClientConn, error) {
return p.getClientConn(req, addr, noDialOnMiss)
}
// noDialH2RoundTripper is a RoundTripper which only tries to complete the request
// if there's already has a cached connection to the host.
type noDialH2RoundTripper struct{ t *Transport }

View File

@ -64,9 +64,17 @@ func (e ConnectionError) Error() string { return fmt.Sprintf("connection error:
type StreamError struct {
StreamID uint32
Code ErrCode
Cause error // optional additional detail
}
func streamError(id uint32, code ErrCode) StreamError {
return StreamError{StreamID: id, Code: code}
}
func (e StreamError) Error() string {
if e.Cause != nil {
return fmt.Sprintf("stream error: stream ID %d; %v; %v", e.StreamID, e.Code, e.Cause)
}
return fmt.Sprintf("stream error: stream ID %d; %v", e.StreamID, e.Code)
}

View File

@ -15,6 +15,7 @@ import (
"sync"
"golang.org/x/net/http2/hpack"
"golang.org/x/net/lex/httplex"
)
const frameHeaderLen = 9
@ -316,10 +317,12 @@ type Framer struct {
// non-Continuation or Continuation on a different stream is
// attempted to be written.
logReads bool
logReads, logWrites bool
debugFramer *Framer // only use for logging written writes
debugFramerBuf *bytes.Buffer
debugFramer *Framer // only use for logging written writes
debugFramerBuf *bytes.Buffer
debugReadLoggerf func(string, ...interface{})
debugWriteLoggerf func(string, ...interface{})
}
func (fr *Framer) maxHeaderListSize() uint32 {
@ -354,7 +357,7 @@ func (f *Framer) endWrite() error {
byte(length>>16),
byte(length>>8),
byte(length))
if logFrameWrites {
if f.logWrites {
f.logWrite()
}
@ -377,10 +380,10 @@ func (f *Framer) logWrite() {
f.debugFramerBuf.Write(f.wbuf)
fr, err := f.debugFramer.ReadFrame()
if err != nil {
log.Printf("http2: Framer %p: failed to decode just-written frame", f)
f.debugWriteLoggerf("http2: Framer %p: failed to decode just-written frame", f)
return
}
log.Printf("http2: Framer %p: wrote %v", f, summarizeFrame(fr))
f.debugWriteLoggerf("http2: Framer %p: wrote %v", f, summarizeFrame(fr))
}
func (f *Framer) writeByte(v byte) { f.wbuf = append(f.wbuf, v) }
@ -398,9 +401,12 @@ const (
// NewFramer returns a Framer that writes frames to w and reads them from r.
func NewFramer(w io.Writer, r io.Reader) *Framer {
fr := &Framer{
w: w,
r: r,
logReads: logFrameReads,
w: w,
r: r,
logReads: logFrameReads,
logWrites: logFrameWrites,
debugReadLoggerf: log.Printf,
debugWriteLoggerf: log.Printf,
}
fr.getReadBuf = func(size uint32) []byte {
if cap(fr.readBuf) >= int(size) {
@ -453,7 +459,7 @@ func terminalReadFrameError(err error) bool {
//
// If the frame is larger than previously set with SetMaxReadFrameSize, the
// returned error is ErrFrameTooLarge. Other errors may be of type
// ConnectionError, StreamError, or anything else from from the underlying
// ConnectionError, StreamError, or anything else from the underlying
// reader.
func (fr *Framer) ReadFrame() (Frame, error) {
fr.errDetail = nil
@ -482,7 +488,7 @@ func (fr *Framer) ReadFrame() (Frame, error) {
return nil, err
}
if fr.logReads {
log.Printf("http2: Framer %p: read %v", fr, summarizeFrame(f))
fr.debugReadLoggerf("http2: Framer %p: read %v", fr, summarizeFrame(f))
}
if fh.Type == FrameHeaders && fr.ReadMetaHeaders != nil {
return fr.readMetaFrame(f.(*HeadersFrame))
@ -590,7 +596,15 @@ func parseDataFrame(fh FrameHeader, payload []byte) (Frame, error) {
return f, nil
}
var errStreamID = errors.New("invalid streamid")
var (
errStreamID = errors.New("invalid stream ID")
errDepStreamID = errors.New("invalid dependent stream ID")
errPadLength = errors.New("pad length too large")
)
func validStreamIDOrZero(streamID uint32) bool {
return streamID&(1<<31) == 0
}
func validStreamID(streamID uint32) bool {
return streamID != 0 && streamID&(1<<31) == 0
@ -599,18 +613,40 @@ func validStreamID(streamID uint32) bool {
// WriteData writes a DATA frame.
//
// It will perform exactly one Write to the underlying Writer.
// It is the caller's responsibility to not call other Write methods concurrently.
// It is the caller's responsibility not to violate the maximum frame size
// and to not call other Write methods concurrently.
func (f *Framer) WriteData(streamID uint32, endStream bool, data []byte) error {
// TODO: ignoring padding for now. will add when somebody cares.
return f.WriteDataPadded(streamID, endStream, data, nil)
}
// WriteData writes a DATA frame with optional padding.
//
// If pad is nil, the padding bit is not sent.
// The length of pad must not exceed 255 bytes.
//
// It will perform exactly one Write to the underlying Writer.
// It is the caller's responsibility not to violate the maximum frame size
// and to not call other Write methods concurrently.
func (f *Framer) WriteDataPadded(streamID uint32, endStream bool, data, pad []byte) error {
if !validStreamID(streamID) && !f.AllowIllegalWrites {
return errStreamID
}
if len(pad) > 255 {
return errPadLength
}
var flags Flags
if endStream {
flags |= FlagDataEndStream
}
if pad != nil {
flags |= FlagDataPadded
}
f.startWrite(FrameData, flags, streamID)
if pad != nil {
f.wbuf = append(f.wbuf, byte(len(pad)))
}
f.wbuf = append(f.wbuf, data...)
f.wbuf = append(f.wbuf, pad...)
return f.endWrite()
}
@ -706,7 +742,7 @@ func (f *Framer) WriteSettings(settings ...Setting) error {
return f.endWrite()
}
// WriteSettings writes an empty SETTINGS frame with the ACK bit set.
// WriteSettingsAck writes an empty SETTINGS frame with the ACK bit set.
//
// It will perform exactly one Write to the underlying Writer.
// It is the caller's responsibility to not call other Write methods concurrently.
@ -832,7 +868,7 @@ func parseWindowUpdateFrame(fh FrameHeader, p []byte) (Frame, error) {
if fh.StreamID == 0 {
return nil, ConnectionError(ErrCodeProtocol)
}
return nil, StreamError{fh.StreamID, ErrCodeProtocol}
return nil, streamError(fh.StreamID, ErrCodeProtocol)
}
return &WindowUpdateFrame{
FrameHeader: fh,
@ -913,7 +949,7 @@ func parseHeadersFrame(fh FrameHeader, p []byte) (_ Frame, err error) {
}
}
if len(p)-int(padLength) <= 0 {
return nil, StreamError{fh.StreamID, ErrCodeProtocol}
return nil, streamError(fh.StreamID, ErrCodeProtocol)
}
hf.headerFragBuf = p[:len(p)-int(padLength)]
return hf, nil
@ -977,8 +1013,8 @@ func (f *Framer) WriteHeaders(p HeadersFrameParam) error {
}
if !p.Priority.IsZero() {
v := p.Priority.StreamDep
if !validStreamID(v) && !f.AllowIllegalWrites {
return errors.New("invalid dependent stream id")
if !validStreamIDOrZero(v) && !f.AllowIllegalWrites {
return errDepStreamID
}
if p.Priority.Exclusive {
v |= 1 << 31
@ -1046,6 +1082,9 @@ func (f *Framer) WritePriority(streamID uint32, p PriorityParam) error {
if !validStreamID(streamID) && !f.AllowIllegalWrites {
return errStreamID
}
if !validStreamIDOrZero(p.StreamDep) {
return errDepStreamID
}
f.startWrite(FramePriority, 0, streamID)
v := p.StreamDep
if p.Exclusive {
@ -1385,7 +1424,10 @@ func (fr *Framer) readMetaFrame(hf *HeadersFrame) (*MetaHeadersFrame, error) {
hdec.SetEmitEnabled(true)
hdec.SetMaxStringLength(fr.maxHeaderStringLen())
hdec.SetEmitFunc(func(hf hpack.HeaderField) {
if !validHeaderFieldValue(hf.Value) {
if VerboseLogs && fr.logReads {
fr.debugReadLoggerf("http2: decoded hpack field %+v", hf)
}
if !httplex.ValidHeaderFieldValue(hf.Value) {
invalid = headerFieldValueError(hf.Value)
}
isPseudo := strings.HasPrefix(hf.Name, ":")
@ -1395,7 +1437,7 @@ func (fr *Framer) readMetaFrame(hf *HeadersFrame) (*MetaHeadersFrame, error) {
}
} else {
sawRegular = true
if !validHeaderFieldName(hf.Name) {
if !validWireHeaderFieldName(hf.Name) {
invalid = headerFieldNameError(hf.Name)
}
}
@ -1443,11 +1485,17 @@ func (fr *Framer) readMetaFrame(hf *HeadersFrame) (*MetaHeadersFrame, error) {
}
if invalid != nil {
fr.errDetail = invalid
return nil, StreamError{mh.StreamID, ErrCodeProtocol}
if VerboseLogs {
log.Printf("http2: invalid header: %v", invalid)
}
return nil, StreamError{mh.StreamID, ErrCodeProtocol, invalid}
}
if err := mh.checkPseudos(); err != nil {
fr.errDetail = err
return nil, StreamError{mh.StreamID, ErrCodeProtocol}
if VerboseLogs {
log.Printf("http2: invalid pseudo headers: %v", err)
}
return nil, StreamError{mh.StreamID, ErrCodeProtocol, err}
}
return mh, nil
}

View File

@ -1,11 +0,0 @@
// Copyright 2015 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// +build go1.5
package http2
import "net/http"
func requestCancel(req *http.Request) <-chan struct{} { return req.Cancel }

43
cmd/vendor/golang.org/x/net/http2/go16.go generated vendored Normal file
View File

@ -0,0 +1,43 @@
// Copyright 2016 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// +build go1.6
package http2
import (
"crypto/tls"
"net/http"
"time"
)
func transportExpectContinueTimeout(t1 *http.Transport) time.Duration {
return t1.ExpectContinueTimeout
}
// isBadCipher reports whether the cipher is blacklisted by the HTTP/2 spec.
func isBadCipher(cipher uint16) bool {
switch cipher {
case tls.TLS_RSA_WITH_RC4_128_SHA,
tls.TLS_RSA_WITH_3DES_EDE_CBC_SHA,
tls.TLS_RSA_WITH_AES_128_CBC_SHA,
tls.TLS_RSA_WITH_AES_256_CBC_SHA,
tls.TLS_RSA_WITH_AES_128_GCM_SHA256,
tls.TLS_RSA_WITH_AES_256_GCM_SHA384,
tls.TLS_ECDHE_ECDSA_WITH_RC4_128_SHA,
tls.TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,
tls.TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA,
tls.TLS_ECDHE_RSA_WITH_RC4_128_SHA,
tls.TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA,
tls.TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,
tls.TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA:
// Reject cipher suites from Appendix A.
// "This list includes those cipher suites that do not
// offer an ephemeral key exchange and those that are
// based on the TLS null, stream or block cipher type"
return true
default:
return false
}
}

106
cmd/vendor/golang.org/x/net/http2/go17.go generated vendored Normal file
View File

@ -0,0 +1,106 @@
// Copyright 2016 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// +build go1.7
package http2
import (
"context"
"net"
"net/http"
"net/http/httptrace"
"time"
)
type contextContext interface {
context.Context
}
func serverConnBaseContext(c net.Conn, opts *ServeConnOpts) (ctx contextContext, cancel func()) {
ctx, cancel = context.WithCancel(context.Background())
ctx = context.WithValue(ctx, http.LocalAddrContextKey, c.LocalAddr())
if hs := opts.baseConfig(); hs != nil {
ctx = context.WithValue(ctx, http.ServerContextKey, hs)
}
return
}
func contextWithCancel(ctx contextContext) (_ contextContext, cancel func()) {
return context.WithCancel(ctx)
}
func requestWithContext(req *http.Request, ctx contextContext) *http.Request {
return req.WithContext(ctx)
}
type clientTrace httptrace.ClientTrace
func reqContext(r *http.Request) context.Context { return r.Context() }
func (t *Transport) idleConnTimeout() time.Duration {
if t.t1 != nil {
return t.t1.IdleConnTimeout
}
return 0
}
func setResponseUncompressed(res *http.Response) { res.Uncompressed = true }
func traceGotConn(req *http.Request, cc *ClientConn) {
trace := httptrace.ContextClientTrace(req.Context())
if trace == nil || trace.GotConn == nil {
return
}
ci := httptrace.GotConnInfo{Conn: cc.tconn}
cc.mu.Lock()
ci.Reused = cc.nextStreamID > 1
ci.WasIdle = len(cc.streams) == 0 && ci.Reused
if ci.WasIdle && !cc.lastActive.IsZero() {
ci.IdleTime = time.Now().Sub(cc.lastActive)
}
cc.mu.Unlock()
trace.GotConn(ci)
}
func traceWroteHeaders(trace *clientTrace) {
if trace != nil && trace.WroteHeaders != nil {
trace.WroteHeaders()
}
}
func traceGot100Continue(trace *clientTrace) {
if trace != nil && trace.Got100Continue != nil {
trace.Got100Continue()
}
}
func traceWait100Continue(trace *clientTrace) {
if trace != nil && trace.Wait100Continue != nil {
trace.Wait100Continue()
}
}
func traceWroteRequest(trace *clientTrace, err error) {
if trace != nil && trace.WroteRequest != nil {
trace.WroteRequest(httptrace.WroteRequestInfo{Err: err})
}
}
func traceFirstResponseByte(trace *clientTrace) {
if trace != nil && trace.GotFirstResponseByte != nil {
trace.GotFirstResponseByte()
}
}
func requestTrace(req *http.Request) *clientTrace {
trace := httptrace.ContextClientTrace(req.Context())
return (*clientTrace)(trace)
}
// Ping sends a PING frame to the server and waits for the ack.
func (cc *ClientConn) Ping(ctx context.Context) error {
return cc.ping(ctx)
}

36
cmd/vendor/golang.org/x/net/http2/go17_not18.go generated vendored Normal file
View File

@ -0,0 +1,36 @@
// Copyright 2016 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// +build go1.7,!go1.8
package http2
import "crypto/tls"
// temporary copy of Go 1.7's private tls.Config.clone:
func cloneTLSConfig(c *tls.Config) *tls.Config {
return &tls.Config{
Rand: c.Rand,
Time: c.Time,
Certificates: c.Certificates,
NameToCertificate: c.NameToCertificate,
GetCertificate: c.GetCertificate,
RootCAs: c.RootCAs,
NextProtos: c.NextProtos,
ServerName: c.ServerName,
ClientAuth: c.ClientAuth,
ClientCAs: c.ClientCAs,
InsecureSkipVerify: c.InsecureSkipVerify,
CipherSuites: c.CipherSuites,
PreferServerCipherSuites: c.PreferServerCipherSuites,
SessionTicketsDisabled: c.SessionTicketsDisabled,
SessionTicketKey: c.SessionTicketKey,
ClientSessionCache: c.ClientSessionCache,
MinVersion: c.MinVersion,
MaxVersion: c.MaxVersion,
CurvePreferences: c.CurvePreferences,
DynamicRecordSizingDisabled: c.DynamicRecordSizingDisabled,
Renegotiation: c.Renegotiation,
}
}

50
cmd/vendor/golang.org/x/net/http2/go18.go generated vendored Normal file
View File

@ -0,0 +1,50 @@
// Copyright 2015 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// +build go1.8
package http2
import (
"crypto/tls"
"io"
"net/http"
)
func cloneTLSConfig(c *tls.Config) *tls.Config { return c.Clone() }
var _ http.Pusher = (*responseWriter)(nil)
// Push implements http.Pusher.
func (w *responseWriter) Push(target string, opts *http.PushOptions) error {
internalOpts := pushOptions{}
if opts != nil {
internalOpts.Method = opts.Method
internalOpts.Header = opts.Header
}
return w.push(target, internalOpts)
}
func configureServer18(h1 *http.Server, h2 *Server) error {
if h2.IdleTimeout == 0 {
if h1.IdleTimeout != 0 {
h2.IdleTimeout = h1.IdleTimeout
} else {
h2.IdleTimeout = h1.ReadTimeout
}
}
return nil
}
func shouldLogPanic(panicValue interface{}) bool {
return panicValue != nil && panicValue != http.ErrAbortHandler
}
func reqGetBody(req *http.Request) func() (io.ReadCloser, error) {
return req.GetBody
}
func reqBodyIsNoBody(body io.ReadCloser) bool {
return body == http.NoBody
}

View File

@ -43,7 +43,7 @@ type HeaderField struct {
// IsPseudo reports whether the header field is an http2 pseudo header.
// That is, it reports whether it starts with a colon.
// It is not otherwise guaranteed to be a valid psuedo header field,
// It is not otherwise guaranteed to be a valid pseudo header field,
// though.
func (hf HeaderField) IsPseudo() bool {
return len(hf.Name) != 0 && hf.Name[0] == ':'
@ -57,7 +57,7 @@ func (hf HeaderField) String() string {
return fmt.Sprintf("header field %q = %q%s", hf.Name, hf.Value, suffix)
}
// Size returns the size of an entry per RFC 7540 section 5.2.
// Size returns the size of an entry per RFC 7541 section 4.1.
func (hf HeaderField) Size() uint32 {
// http://http2.github.io/http2-spec/compression.html#rfc.section.4.1
// "The size of the dynamic table is the sum of the size of

View File

@ -48,12 +48,16 @@ var ErrInvalidHuffman = errors.New("hpack: invalid Huffman-encoded data")
// maxLen bytes will return ErrStringLength.
func huffmanDecode(buf *bytes.Buffer, maxLen int, v []byte) error {
n := rootHuffmanNode
cur, nbits := uint(0), uint8(0)
// cur is the bit buffer that has not been fed into n.
// cbits is the number of low order bits in cur that are valid.
// sbits is the number of bits of the symbol prefix being decoded.
cur, cbits, sbits := uint(0), uint8(0), uint8(0)
for _, b := range v {
cur = cur<<8 | uint(b)
nbits += 8
for nbits >= 8 {
idx := byte(cur >> (nbits - 8))
cbits += 8
sbits += 8
for cbits >= 8 {
idx := byte(cur >> (cbits - 8))
n = n.children[idx]
if n == nil {
return ErrInvalidHuffman
@ -63,22 +67,40 @@ func huffmanDecode(buf *bytes.Buffer, maxLen int, v []byte) error {
return ErrStringLength
}
buf.WriteByte(n.sym)
nbits -= n.codeLen
cbits -= n.codeLen
n = rootHuffmanNode
sbits = cbits
} else {
nbits -= 8
cbits -= 8
}
}
}
for nbits > 0 {
n = n.children[byte(cur<<(8-nbits))]
if n.children != nil || n.codeLen > nbits {
for cbits > 0 {
n = n.children[byte(cur<<(8-cbits))]
if n == nil {
return ErrInvalidHuffman
}
if n.children != nil || n.codeLen > cbits {
break
}
if maxLen != 0 && buf.Len() == maxLen {
return ErrStringLength
}
buf.WriteByte(n.sym)
nbits -= n.codeLen
cbits -= n.codeLen
n = rootHuffmanNode
sbits = cbits
}
if sbits > 7 {
// Either there was an incomplete symbol, or overlong padding.
// Both are decoding errors per RFC 7541 section 5.2.
return ErrInvalidHuffman
}
if mask := uint(1<<cbits - 1); cur&mask != mask {
// Trailing bits must be a prefix of EOS per RFC 7541 section 5.2.
return ErrInvalidHuffman
}
return nil
}

View File

@ -13,7 +13,8 @@
// See https://http2.github.io/ for more information on HTTP/2.
//
// See https://http2.golang.org/ for a test server running this code.
package http2
//
package http2 // import "golang.org/x/net/http2"
import (
"bufio"
@ -23,15 +24,19 @@ import (
"io"
"net/http"
"os"
"sort"
"strconv"
"strings"
"sync"
"golang.org/x/net/lex/httplex"
)
var (
VerboseLogs bool
logFrameWrites bool
logFrameReads bool
inTests bool
)
func init() {
@ -73,13 +78,23 @@ var (
type streamState int
// HTTP/2 stream states.
//
// See http://tools.ietf.org/html/rfc7540#section-5.1.
//
// For simplicity, the server code merges "reserved (local)" into
// "half-closed (remote)". This is one less state transition to track.
// The only downside is that we send PUSH_PROMISEs slightly less
// liberally than allowable. More discussion here:
// https://lists.w3.org/Archives/Public/ietf-http-wg/2016JulSep/0599.html
//
// "reserved (remote)" is omitted since the client code does not
// support server push.
const (
stateIdle streamState = iota
stateOpen
stateHalfClosedLocal
stateHalfClosedRemote
stateResvLocal
stateResvRemote
stateClosed
)
@ -88,8 +103,6 @@ var stateName = [...]string{
stateOpen: "Open",
stateHalfClosedLocal: "HalfClosedLocal",
stateHalfClosedRemote: "HalfClosedRemote",
stateResvLocal: "ResvLocal",
stateResvRemote: "ResvRemote",
stateClosed: "Closed",
}
@ -165,57 +178,23 @@ var (
errInvalidHeaderFieldValue = errors.New("http2: invalid header field value")
)
// validHeaderFieldName reports whether v is a valid header field name (key).
// RFC 7230 says:
// header-field = field-name ":" OWS field-value OWS
// field-name = token
// tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." /
// "^" / "_" / "
// validWireHeaderFieldName reports whether v is a valid header field
// name (key). See httplex.ValidHeaderName for the base rules.
//
// Further, http2 says:
// "Just as in HTTP/1.x, header field names are strings of ASCII
// characters that are compared in a case-insensitive
// fashion. However, header field names MUST be converted to
// lowercase prior to their encoding in HTTP/2. "
func validHeaderFieldName(v string) bool {
func validWireHeaderFieldName(v string) bool {
if len(v) == 0 {
return false
}
for _, r := range v {
if int(r) >= len(isTokenTable) || ('A' <= r && r <= 'Z') {
if !httplex.IsTokenRune(r) {
return false
}
if !isTokenTable[byte(r)] {
return false
}
}
return true
}
// validHeaderFieldValue reports whether v is a valid header field value.
//
// RFC 7230 says:
// field-value = *( field-content / obs-fold )
// obj-fold = N/A to http2, and deprecated
// field-content = field-vchar [ 1*( SP / HTAB ) field-vchar ]
// field-vchar = VCHAR / obs-text
// obs-text = %x80-FF
// VCHAR = "any visible [USASCII] character"
//
// http2 further says: "Similarly, HTTP/2 allows header field values
// that are not valid. While most of the values that can be encoded
// will not alter header field parsing, carriage return (CR, ASCII
// 0xd), line feed (LF, ASCII 0xa), and the zero character (NUL, ASCII
// 0x0) might be exploited by an attacker if they are translated
// verbatim. Any request or response that contains a character not
// permitted in a header field value MUST be treated as malformed
// (Section 8.1.2.6). Valid characters are defined by the
// field-content ABNF rule in Section 3.2 of [RFC7230]."
//
// This function does not (yet?) properly handle the rejection of
// strings that begin or end with SP or HTAB.
func validHeaderFieldValue(v string) bool {
for i := 0; i < len(v); i++ {
if b := v[i]; b < ' ' && b != '\t' || b == 0x7f {
if 'A' <= r && r <= 'Z' {
return false
}
}
@ -283,14 +262,27 @@ func newBufferedWriter(w io.Writer) *bufferedWriter {
return &bufferedWriter{w: w}
}
// bufWriterPoolBufferSize is the size of bufio.Writer's
// buffers created using bufWriterPool.
//
// TODO: pick a less arbitrary value? this is a bit under
// (3 x typical 1500 byte MTU) at least. Other than that,
// not much thought went into it.
const bufWriterPoolBufferSize = 4 << 10
var bufWriterPool = sync.Pool{
New: func() interface{} {
// TODO: pick something better? this is a bit under
// (3 x typical 1500 byte MTU) at least.
return bufio.NewWriterSize(nil, 4<<10)
return bufio.NewWriterSize(nil, bufWriterPoolBufferSize)
},
}
func (w *bufferedWriter) Available() int {
if w.bw == nil {
return bufWriterPoolBufferSize
}
return w.bw.Available()
}
func (w *bufferedWriter) Write(p []byte) (n int, err error) {
if w.bw == nil {
bw := bufWriterPool.Get().(*bufio.Writer)
@ -320,7 +312,7 @@ func mustUint31(v int32) uint32 {
}
// bodyAllowedForStatus reports whether a given response status code
// permits a body. See RFC2616, section 4.4.
// permits a body. See RFC 2616, section 4.4.
func bodyAllowedForStatus(status int) bool {
switch {
case status >= 100 && status <= 199:
@ -344,86 +336,52 @@ func (e *httpError) Temporary() bool { return true }
var errTimeout error = &httpError{msg: "http2: timeout awaiting response headers", timeout: true}
var isTokenTable = [127]bool{
'!': true,
'#': true,
'$': true,
'%': true,
'&': true,
'\'': true,
'*': true,
'+': true,
'-': true,
'.': true,
'0': true,
'1': true,
'2': true,
'3': true,
'4': true,
'5': true,
'6': true,
'7': true,
'8': true,
'9': true,
'A': true,
'B': true,
'C': true,
'D': true,
'E': true,
'F': true,
'G': true,
'H': true,
'I': true,
'J': true,
'K': true,
'L': true,
'M': true,
'N': true,
'O': true,
'P': true,
'Q': true,
'R': true,
'S': true,
'T': true,
'U': true,
'W': true,
'V': true,
'X': true,
'Y': true,
'Z': true,
'^': true,
'_': true,
'`': true,
'a': true,
'b': true,
'c': true,
'd': true,
'e': true,
'f': true,
'g': true,
'h': true,
'i': true,
'j': true,
'k': true,
'l': true,
'm': true,
'n': true,
'o': true,
'p': true,
'q': true,
'r': true,
's': true,
't': true,
'u': true,
'v': true,
'w': true,
'x': true,
'y': true,
'z': true,
'|': true,
'~': true,
}
type connectionStater interface {
ConnectionState() tls.ConnectionState
}
var sorterPool = sync.Pool{New: func() interface{} { return new(sorter) }}
type sorter struct {
v []string // owned by sorter
}
func (s *sorter) Len() int { return len(s.v) }
func (s *sorter) Swap(i, j int) { s.v[i], s.v[j] = s.v[j], s.v[i] }
func (s *sorter) Less(i, j int) bool { return s.v[i] < s.v[j] }
// Keys returns the sorted keys of h.
//
// The returned slice is only valid until s used again or returned to
// its pool.
func (s *sorter) Keys(h http.Header) []string {
keys := s.v[:0]
for k := range h {
keys = append(keys, k)
}
s.v = keys
sort.Sort(s)
return keys
}
func (s *sorter) SortStrings(ss []string) {
// Our sorter works on s.v, which sorter owns, so
// stash it away while we sort the user's buffer.
save := s.v
s.v = ss
sort.Sort(s)
s.v = save
}
// validPseudoPath reports whether v is a valid :path pseudo-header
// value. It must be either:
//
// *) a non-empty string starting with '/', but not with with "//",
// *) the string '*', for OPTIONS requests.
//
// For now this is only used a quick check for deciding when to clean
// up Opaque URLs before sending requests from the Transport.
// See golang.org/issue/16847
func validPseudoPath(v string) bool {
return (len(v) > 0 && v[0] == '/' && (len(v) == 1 || v[1] != '/')) || v == "*"
}

View File

@ -1,11 +0,0 @@
// Copyright 2015 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// +build !go1.5
package http2
import "net/http"
func requestCancel(req *http.Request) <-chan struct{} { return nil }

View File

@ -6,8 +6,41 @@
package http2
import "net/http"
import (
"crypto/tls"
"net/http"
"time"
)
func configureTransport(t1 *http.Transport) (*Transport, error) {
return nil, errTransportVersion
}
func transportExpectContinueTimeout(t1 *http.Transport) time.Duration {
return 0
}
// isBadCipher reports whether the cipher is blacklisted by the HTTP/2 spec.
func isBadCipher(cipher uint16) bool {
switch cipher {
case tls.TLS_RSA_WITH_RC4_128_SHA,
tls.TLS_RSA_WITH_3DES_EDE_CBC_SHA,
tls.TLS_RSA_WITH_AES_128_CBC_SHA,
tls.TLS_RSA_WITH_AES_256_CBC_SHA,
tls.TLS_ECDHE_ECDSA_WITH_RC4_128_SHA,
tls.TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,
tls.TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA,
tls.TLS_ECDHE_RSA_WITH_RC4_128_SHA,
tls.TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA,
tls.TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,
tls.TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA:
// Reject cipher suites from Appendix A.
// "This list includes those cipher suites that do not
// offer an ephemeral key exchange and those that are
// based on the TLS null, stream or block cipher type"
return true
default:
return false
}
}

87
cmd/vendor/golang.org/x/net/http2/not_go17.go generated vendored Normal file
View File

@ -0,0 +1,87 @@
// Copyright 2016 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// +build !go1.7
package http2
import (
"crypto/tls"
"net"
"net/http"
"time"
)
type contextContext interface {
Done() <-chan struct{}
Err() error
}
type fakeContext struct{}
func (fakeContext) Done() <-chan struct{} { return nil }
func (fakeContext) Err() error { panic("should not be called") }
func reqContext(r *http.Request) fakeContext {
return fakeContext{}
}
func setResponseUncompressed(res *http.Response) {
// Nothing.
}
type clientTrace struct{}
func requestTrace(*http.Request) *clientTrace { return nil }
func traceGotConn(*http.Request, *ClientConn) {}
func traceFirstResponseByte(*clientTrace) {}
func traceWroteHeaders(*clientTrace) {}
func traceWroteRequest(*clientTrace, error) {}
func traceGot100Continue(trace *clientTrace) {}
func traceWait100Continue(trace *clientTrace) {}
func nop() {}
func serverConnBaseContext(c net.Conn, opts *ServeConnOpts) (ctx contextContext, cancel func()) {
return nil, nop
}
func contextWithCancel(ctx contextContext) (_ contextContext, cancel func()) {
return ctx, nop
}
func requestWithContext(req *http.Request, ctx contextContext) *http.Request {
return req
}
// temporary copy of Go 1.6's private tls.Config.clone:
func cloneTLSConfig(c *tls.Config) *tls.Config {
return &tls.Config{
Rand: c.Rand,
Time: c.Time,
Certificates: c.Certificates,
NameToCertificate: c.NameToCertificate,
GetCertificate: c.GetCertificate,
RootCAs: c.RootCAs,
NextProtos: c.NextProtos,
ServerName: c.ServerName,
ClientAuth: c.ClientAuth,
ClientCAs: c.ClientCAs,
InsecureSkipVerify: c.InsecureSkipVerify,
CipherSuites: c.CipherSuites,
PreferServerCipherSuites: c.PreferServerCipherSuites,
SessionTicketsDisabled: c.SessionTicketsDisabled,
SessionTicketKey: c.SessionTicketKey,
ClientSessionCache: c.ClientSessionCache,
MinVersion: c.MinVersion,
MaxVersion: c.MaxVersion,
CurvePreferences: c.CurvePreferences,
}
}
func (cc *ClientConn) Ping(ctx contextContext) error {
return cc.ping(ctx)
}
func (t *Transport) idleConnTimeout() time.Duration { return 0 }

27
cmd/vendor/golang.org/x/net/http2/not_go18.go generated vendored Normal file
View File

@ -0,0 +1,27 @@
// Copyright 2016 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// +build !go1.8
package http2
import (
"io"
"net/http"
)
func configureServer18(h1 *http.Server, h2 *Server) error {
// No IdleTimeout to sync prior to Go 1.8.
return nil
}
func shouldLogPanic(panicValue interface{}) bool {
return panicValue != nil
}
func reqGetBody(req *http.Request) func() (io.ReadCloser, error) {
return nil
}
func reqBodyIsNoBody(io.ReadCloser) bool { return false }

View File

@ -29,6 +29,12 @@ type pipeBuffer interface {
io.Reader
}
func (p *pipe) Len() int {
p.mu.Lock()
defer p.mu.Unlock()
return p.b.Len()
}
// Read waits until data is available and copies bytes
// from the buffer into p.
func (p *pipe) Read(d []byte) (n int, err error) {

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@ -9,15 +9,21 @@ import (
"fmt"
"log"
"net/http"
"sort"
"net/url"
"time"
"golang.org/x/net/http2/hpack"
"golang.org/x/net/lex/httplex"
)
// writeFramer is implemented by any type that is used to write frames.
type writeFramer interface {
writeFrame(writeContext) error
// staysWithinBuffer reports whether this writer promises that
// it will only write less than or equal to size bytes, and it
// won't Flush the write context.
staysWithinBuffer(size int) bool
}
// writeContext is the interface needed by the various frame writer
@ -39,9 +45,10 @@ type writeContext interface {
HeaderEncoder() (*hpack.Encoder, *bytes.Buffer)
}
// endsStream reports whether the given frame writer w will locally
// close the stream.
func endsStream(w writeFramer) bool {
// writeEndsStream reports whether w writes a frame that will transition
// the stream to a half-closed local state. This returns false for RST_STREAM,
// which closes the entire stream (not just the local half).
func writeEndsStream(w writeFramer) bool {
switch v := w.(type) {
case *writeData:
return v.endStream
@ -51,7 +58,7 @@ func endsStream(w writeFramer) bool {
// This can only happen if the caller reuses w after it's
// been intentionally nil'ed out to prevent use. Keep this
// here to catch future refactoring breaking it.
panic("endsStream called on nil writeFramer")
panic("writeEndsStream called on nil writeFramer")
}
return false
}
@ -62,8 +69,16 @@ func (flushFrameWriter) writeFrame(ctx writeContext) error {
return ctx.Flush()
}
func (flushFrameWriter) staysWithinBuffer(max int) bool { return false }
type writeSettings []Setting
func (s writeSettings) staysWithinBuffer(max int) bool {
const settingSize = 6 // uint16 + uint32
return frameHeaderLen+settingSize*len(s) <= max
}
func (s writeSettings) writeFrame(ctx writeContext) error {
return ctx.Framer().WriteSettings([]Setting(s)...)
}
@ -83,6 +98,8 @@ func (p *writeGoAway) writeFrame(ctx writeContext) error {
return err
}
func (*writeGoAway) staysWithinBuffer(max int) bool { return false } // flushes
type writeData struct {
streamID uint32
p []byte
@ -97,6 +114,10 @@ func (w *writeData) writeFrame(ctx writeContext) error {
return ctx.Framer().WriteData(w.streamID, w.endStream, w.p)
}
func (w *writeData) staysWithinBuffer(max int) bool {
return frameHeaderLen+len(w.p) <= max
}
// handlerPanicRST is the message sent from handler goroutines when
// the handler panics.
type handlerPanicRST struct {
@ -107,22 +128,57 @@ func (hp handlerPanicRST) writeFrame(ctx writeContext) error {
return ctx.Framer().WriteRSTStream(hp.StreamID, ErrCodeInternal)
}
func (hp handlerPanicRST) staysWithinBuffer(max int) bool { return frameHeaderLen+4 <= max }
func (se StreamError) writeFrame(ctx writeContext) error {
return ctx.Framer().WriteRSTStream(se.StreamID, se.Code)
}
func (se StreamError) staysWithinBuffer(max int) bool { return frameHeaderLen+4 <= max }
type writePingAck struct{ pf *PingFrame }
func (w writePingAck) writeFrame(ctx writeContext) error {
return ctx.Framer().WritePing(true, w.pf.Data)
}
func (w writePingAck) staysWithinBuffer(max int) bool { return frameHeaderLen+len(w.pf.Data) <= max }
type writeSettingsAck struct{}
func (writeSettingsAck) writeFrame(ctx writeContext) error {
return ctx.Framer().WriteSettingsAck()
}
func (writeSettingsAck) staysWithinBuffer(max int) bool { return frameHeaderLen <= max }
// splitHeaderBlock splits headerBlock into fragments so that each fragment fits
// in a single frame, then calls fn for each fragment. firstFrag/lastFrag are true
// for the first/last fragment, respectively.
func splitHeaderBlock(ctx writeContext, headerBlock []byte, fn func(ctx writeContext, frag []byte, firstFrag, lastFrag bool) error) error {
// For now we're lazy and just pick the minimum MAX_FRAME_SIZE
// that all peers must support (16KB). Later we could care
// more and send larger frames if the peer advertised it, but
// there's little point. Most headers are small anyway (so we
// generally won't have CONTINUATION frames), and extra frames
// only waste 9 bytes anyway.
const maxFrameSize = 16384
first := true
for len(headerBlock) > 0 {
frag := headerBlock
if len(frag) > maxFrameSize {
frag = frag[:maxFrameSize]
}
headerBlock = headerBlock[len(frag):]
if err := fn(ctx, frag, first, len(headerBlock) == 0); err != nil {
return err
}
first = false
}
return nil
}
// writeResHeaders is a request to write a HEADERS and 0+ CONTINUATION frames
// for HTTP response headers or trailers from a server handler.
type writeResHeaders struct {
@ -144,6 +200,17 @@ func encKV(enc *hpack.Encoder, k, v string) {
enc.WriteField(hpack.HeaderField{Name: k, Value: v})
}
func (w *writeResHeaders) staysWithinBuffer(max int) bool {
// TODO: this is a common one. It'd be nice to return true
// here and get into the fast path if we could be clever and
// calculate the size fast enough, or at least a conservative
// uppper bound that usually fires. (Maybe if w.h and
// w.trailers are nil, so we don't need to enumerate it.)
// Otherwise I'm afraid that just calculating the length to
// answer this question would be slower than the ~2µs benefit.
return false
}
func (w *writeResHeaders) writeFrame(ctx writeContext) error {
enc, buf := ctx.HeaderEncoder()
buf.Reset()
@ -169,39 +236,69 @@ func (w *writeResHeaders) writeFrame(ctx writeContext) error {
panic("unexpected empty hpack")
}
// For now we're lazy and just pick the minimum MAX_FRAME_SIZE
// that all peers must support (16KB). Later we could care
// more and send larger frames if the peer advertised it, but
// there's little point. Most headers are small anyway (so we
// generally won't have CONTINUATION frames), and extra frames
// only waste 9 bytes anyway.
const maxFrameSize = 16384
return splitHeaderBlock(ctx, headerBlock, w.writeHeaderBlock)
}
first := true
for len(headerBlock) > 0 {
frag := headerBlock
if len(frag) > maxFrameSize {
frag = frag[:maxFrameSize]
}
headerBlock = headerBlock[len(frag):]
endHeaders := len(headerBlock) == 0
var err error
if first {
first = false
err = ctx.Framer().WriteHeaders(HeadersFrameParam{
StreamID: w.streamID,
BlockFragment: frag,
EndStream: w.endStream,
EndHeaders: endHeaders,
})
} else {
err = ctx.Framer().WriteContinuation(w.streamID, endHeaders, frag)
}
if err != nil {
return err
}
func (w *writeResHeaders) writeHeaderBlock(ctx writeContext, frag []byte, firstFrag, lastFrag bool) error {
if firstFrag {
return ctx.Framer().WriteHeaders(HeadersFrameParam{
StreamID: w.streamID,
BlockFragment: frag,
EndStream: w.endStream,
EndHeaders: lastFrag,
})
} else {
return ctx.Framer().WriteContinuation(w.streamID, lastFrag, frag)
}
}
// writePushPromise is a request to write a PUSH_PROMISE and 0+ CONTINUATION frames.
type writePushPromise struct {
streamID uint32 // pusher stream
method string // for :method
url *url.URL // for :scheme, :authority, :path
h http.Header
// Creates an ID for a pushed stream. This runs on serveG just before
// the frame is written. The returned ID is copied to promisedID.
allocatePromisedID func() (uint32, error)
promisedID uint32
}
func (w *writePushPromise) staysWithinBuffer(max int) bool {
// TODO: see writeResHeaders.staysWithinBuffer
return false
}
func (w *writePushPromise) writeFrame(ctx writeContext) error {
enc, buf := ctx.HeaderEncoder()
buf.Reset()
encKV(enc, ":method", w.method)
encKV(enc, ":scheme", w.url.Scheme)
encKV(enc, ":authority", w.url.Host)
encKV(enc, ":path", w.url.RequestURI())
encodeHeaders(enc, w.h, nil)
headerBlock := buf.Bytes()
if len(headerBlock) == 0 {
panic("unexpected empty hpack")
}
return splitHeaderBlock(ctx, headerBlock, w.writeHeaderBlock)
}
func (w *writePushPromise) writeHeaderBlock(ctx writeContext, frag []byte, firstFrag, lastFrag bool) error {
if firstFrag {
return ctx.Framer().WritePushPromise(PushPromiseParam{
StreamID: w.streamID,
PromiseID: w.promisedID,
BlockFragment: frag,
EndHeaders: lastFrag,
})
} else {
return ctx.Framer().WriteContinuation(w.streamID, lastFrag, frag)
}
return nil
}
type write100ContinueHeadersFrame struct {
@ -220,35 +317,45 @@ func (w write100ContinueHeadersFrame) writeFrame(ctx writeContext) error {
})
}
func (w write100ContinueHeadersFrame) staysWithinBuffer(max int) bool {
// Sloppy but conservative:
return 9+2*(len(":status")+len("100")) <= max
}
type writeWindowUpdate struct {
streamID uint32 // or 0 for conn-level
n uint32
}
func (wu writeWindowUpdate) staysWithinBuffer(max int) bool { return frameHeaderLen+4 <= max }
func (wu writeWindowUpdate) writeFrame(ctx writeContext) error {
return ctx.Framer().WriteWindowUpdate(wu.streamID, wu.n)
}
// encodeHeaders encodes an http.Header. If keys is not nil, then (k, h[k])
// is encoded only only if k is in keys.
func encodeHeaders(enc *hpack.Encoder, h http.Header, keys []string) {
// TODO: garbage. pool sorters like http1? hot path for 1 key?
if keys == nil {
keys = make([]string, 0, len(h))
for k := range h {
keys = append(keys, k)
}
sort.Strings(keys)
sorter := sorterPool.Get().(*sorter)
// Using defer here, since the returned keys from the
// sorter.Keys method is only valid until the sorter
// is returned:
defer sorterPool.Put(sorter)
keys = sorter.Keys(h)
}
for _, k := range keys {
vv := h[k]
k = lowerHeader(k)
if !validHeaderFieldName(k) {
// TODO: return an error? golang.org/issue/14048
// For now just omit it.
if !validWireHeaderFieldName(k) {
// Skip it as backup paranoia. Per
// golang.org/issue/14048, these should
// already be rejected at a higher level.
continue
}
isTE := k == "transfer-encoding"
for _, v := range vv {
if !validHeaderFieldValue(v) {
if !httplex.ValidHeaderFieldValue(v) {
// TODO: return an error? golang.org/issue/14048
// For now just omit it.
continue

View File

@ -6,14 +6,53 @@ package http2
import "fmt"
// frameWriteMsg is a request to write a frame.
type frameWriteMsg struct {
// WriteScheduler is the interface implemented by HTTP/2 write schedulers.
// Methods are never called concurrently.
type WriteScheduler interface {
// OpenStream opens a new stream in the write scheduler.
// It is illegal to call this with streamID=0 or with a streamID that is
// already open -- the call may panic.
OpenStream(streamID uint32, options OpenStreamOptions)
// CloseStream closes a stream in the write scheduler. Any frames queued on
// this stream should be discarded. It is illegal to call this on a stream
// that is not open -- the call may panic.
CloseStream(streamID uint32)
// AdjustStream adjusts the priority of the given stream. This may be called
// on a stream that has not yet been opened or has been closed. Note that
// RFC 7540 allows PRIORITY frames to be sent on streams in any state. See:
// https://tools.ietf.org/html/rfc7540#section-5.1
AdjustStream(streamID uint32, priority PriorityParam)
// Push queues a frame in the scheduler. In most cases, this will not be
// called with wr.StreamID()!=0 unless that stream is currently open. The one
// exception is RST_STREAM frames, which may be sent on idle or closed streams.
Push(wr FrameWriteRequest)
// Pop dequeues the next frame to write. Returns false if no frames can
// be written. Frames with a given wr.StreamID() are Pop'd in the same
// order they are Push'd.
Pop() (wr FrameWriteRequest, ok bool)
}
// OpenStreamOptions specifies extra options for WriteScheduler.OpenStream.
type OpenStreamOptions struct {
// PusherID is zero if the stream was initiated by the client. Otherwise,
// PusherID names the stream that pushed the newly opened stream.
PusherID uint32
}
// FrameWriteRequest is a request to write a frame.
type FrameWriteRequest struct {
// write is the interface value that does the writing, once the
// writeScheduler (below) has decided to select this frame
// to write. The write functions are all defined in write.go.
// WriteScheduler has selected this frame to write. The write
// functions are all defined in write.go.
write writeFramer
stream *stream // used for prioritization. nil for non-stream frames.
// stream is the stream on which this frame will be written.
// nil for non-stream frames like PING and SETTINGS.
stream *stream
// done, if non-nil, must be a buffered channel with space for
// 1 message and is sent the return value from write (or an
@ -21,263 +60,183 @@ type frameWriteMsg struct {
done chan error
}
// for debugging only:
func (wm frameWriteMsg) String() string {
var streamID uint32
if wm.stream != nil {
streamID = wm.stream.id
}
var des string
if s, ok := wm.write.(fmt.Stringer); ok {
des = s.String()
} else {
des = fmt.Sprintf("%T", wm.write)
}
return fmt.Sprintf("[frameWriteMsg stream=%d, ch=%v, type: %v]", streamID, wm.done != nil, des)
}
// writeScheduler tracks pending frames to write, priorities, and decides
// the next one to use. It is not thread-safe.
type writeScheduler struct {
// zero are frames not associated with a specific stream.
// They're sent before any stream-specific freams.
zero writeQueue
// maxFrameSize is the maximum size of a DATA frame
// we'll write. Must be non-zero and between 16K-16M.
maxFrameSize uint32
// sq contains the stream-specific queues, keyed by stream ID.
// when a stream is idle, it's deleted from the map.
sq map[uint32]*writeQueue
// canSend is a slice of memory that's reused between frame
// scheduling decisions to hold the list of writeQueues (from sq)
// which have enough flow control data to send. After canSend is
// built, the best is selected.
canSend []*writeQueue
// pool of empty queues for reuse.
queuePool []*writeQueue
}
func (ws *writeScheduler) putEmptyQueue(q *writeQueue) {
if len(q.s) != 0 {
panic("queue must be empty")
}
ws.queuePool = append(ws.queuePool, q)
}
func (ws *writeScheduler) getEmptyQueue() *writeQueue {
ln := len(ws.queuePool)
if ln == 0 {
return new(writeQueue)
}
q := ws.queuePool[ln-1]
ws.queuePool = ws.queuePool[:ln-1]
return q
}
func (ws *writeScheduler) empty() bool { return ws.zero.empty() && len(ws.sq) == 0 }
func (ws *writeScheduler) add(wm frameWriteMsg) {
st := wm.stream
if st == nil {
ws.zero.push(wm)
} else {
ws.streamQueue(st.id).push(wm)
}
}
func (ws *writeScheduler) streamQueue(streamID uint32) *writeQueue {
if q, ok := ws.sq[streamID]; ok {
return q
}
if ws.sq == nil {
ws.sq = make(map[uint32]*writeQueue)
}
q := ws.getEmptyQueue()
ws.sq[streamID] = q
return q
}
// take returns the most important frame to write and removes it from the scheduler.
// It is illegal to call this if the scheduler is empty or if there are no connection-level
// flow control bytes available.
func (ws *writeScheduler) take() (wm frameWriteMsg, ok bool) {
if ws.maxFrameSize == 0 {
panic("internal error: ws.maxFrameSize not initialized or invalid")
}
// If there any frames not associated with streams, prefer those first.
// These are usually SETTINGS, etc.
if !ws.zero.empty() {
return ws.zero.shift(), true
}
if len(ws.sq) == 0 {
return
}
// Next, prioritize frames on streams that aren't DATA frames (no cost).
for id, q := range ws.sq {
if q.firstIsNoCost() {
return ws.takeFrom(id, q)
// StreamID returns the id of the stream this frame will be written to.
// 0 is used for non-stream frames such as PING and SETTINGS.
func (wr FrameWriteRequest) StreamID() uint32 {
if wr.stream == nil {
if se, ok := wr.write.(StreamError); ok {
// (*serverConn).resetStream doesn't set
// stream because it doesn't necessarily have
// one. So special case this type of write
// message.
return se.StreamID
}
}
// Now, all that remains are DATA frames with non-zero bytes to
// send. So pick the best one.
if len(ws.canSend) != 0 {
panic("should be empty")
}
for _, q := range ws.sq {
if n := ws.streamWritableBytes(q); n > 0 {
ws.canSend = append(ws.canSend, q)
}
}
if len(ws.canSend) == 0 {
return
}
defer ws.zeroCanSend()
// TODO: find the best queue
q := ws.canSend[0]
return ws.takeFrom(q.streamID(), q)
}
// zeroCanSend is defered from take.
func (ws *writeScheduler) zeroCanSend() {
for i := range ws.canSend {
ws.canSend[i] = nil
}
ws.canSend = ws.canSend[:0]
}
// streamWritableBytes returns the number of DATA bytes we could write
// from the given queue's stream, if this stream/queue were
// selected. It is an error to call this if q's head isn't a
// *writeData.
func (ws *writeScheduler) streamWritableBytes(q *writeQueue) int32 {
wm := q.head()
ret := wm.stream.flow.available() // max we can write
if ret == 0 {
return 0
}
if int32(ws.maxFrameSize) < ret {
ret = int32(ws.maxFrameSize)
}
if ret == 0 {
panic("internal error: ws.maxFrameSize not initialized or invalid")
}
wd := wm.write.(*writeData)
if len(wd.p) < int(ret) {
ret = int32(len(wd.p))
}
return ret
return wr.stream.id
}
func (ws *writeScheduler) takeFrom(id uint32, q *writeQueue) (wm frameWriteMsg, ok bool) {
wm = q.head()
// If the first item in this queue costs flow control tokens
// and we don't have enough, write as much as we can.
if wd, ok := wm.write.(*writeData); ok && len(wd.p) > 0 {
allowed := wm.stream.flow.available() // max we can write
if allowed == 0 {
// No quota available. Caller can try the next stream.
return frameWriteMsg{}, false
}
if int32(ws.maxFrameSize) < allowed {
allowed = int32(ws.maxFrameSize)
}
// TODO: further restrict the allowed size, because even if
// the peer says it's okay to write 16MB data frames, we might
// want to write smaller ones to properly weight competing
// streams' priorities.
if len(wd.p) > int(allowed) {
wm.stream.flow.take(allowed)
chunk := wd.p[:allowed]
wd.p = wd.p[allowed:]
// Make up a new write message of a valid size, rather
// than shifting one off the queue.
return frameWriteMsg{
stream: wm.stream,
write: &writeData{
streamID: wd.streamID,
p: chunk,
// even if the original had endStream set, there
// arebytes remaining because len(wd.p) > allowed,
// so we know endStream is false:
endStream: false,
},
// our caller is blocking on the final DATA frame, not
// these intermediates, so no need to wait:
done: nil,
}, true
}
wm.stream.flow.take(int32(len(wd.p)))
// DataSize returns the number of flow control bytes that must be consumed
// to write this entire frame. This is 0 for non-DATA frames.
func (wr FrameWriteRequest) DataSize() int {
if wd, ok := wr.write.(*writeData); ok {
return len(wd.p)
}
q.shift()
if q.empty() {
ws.putEmptyQueue(q)
delete(ws.sq, id)
}
return wm, true
return 0
}
func (ws *writeScheduler) forgetStream(id uint32) {
q, ok := ws.sq[id]
if !ok {
// Consume consumes min(n, available) bytes from this frame, where available
// is the number of flow control bytes available on the stream. Consume returns
// 0, 1, or 2 frames, where the integer return value gives the number of frames
// returned.
//
// If flow control prevents consuming any bytes, this returns (_, _, 0). If
// the entire frame was consumed, this returns (wr, _, 1). Otherwise, this
// returns (consumed, rest, 2), where 'consumed' contains the consumed bytes and
// 'rest' contains the remaining bytes. The consumed bytes are deducted from the
// underlying stream's flow control budget.
func (wr FrameWriteRequest) Consume(n int32) (FrameWriteRequest, FrameWriteRequest, int) {
var empty FrameWriteRequest
// Non-DATA frames are always consumed whole.
wd, ok := wr.write.(*writeData)
if !ok || len(wd.p) == 0 {
return wr, empty, 1
}
// Might need to split after applying limits.
allowed := wr.stream.flow.available()
if n < allowed {
allowed = n
}
if wr.stream.sc.maxFrameSize < allowed {
allowed = wr.stream.sc.maxFrameSize
}
if allowed <= 0 {
return empty, empty, 0
}
if len(wd.p) > int(allowed) {
wr.stream.flow.take(allowed)
consumed := FrameWriteRequest{
stream: wr.stream,
write: &writeData{
streamID: wd.streamID,
p: wd.p[:allowed],
// Even if the original had endStream set, there
// are bytes remaining because len(wd.p) > allowed,
// so we know endStream is false.
endStream: false,
},
// Our caller is blocking on the final DATA frame, not
// this intermediate frame, so no need to wait.
done: nil,
}
rest := FrameWriteRequest{
stream: wr.stream,
write: &writeData{
streamID: wd.streamID,
p: wd.p[allowed:],
endStream: wd.endStream,
},
done: wr.done,
}
return consumed, rest, 2
}
// The frame is consumed whole.
// NB: This cast cannot overflow because allowed is <= math.MaxInt32.
wr.stream.flow.take(int32(len(wd.p)))
return wr, empty, 1
}
// String is for debugging only.
func (wr FrameWriteRequest) String() string {
var des string
if s, ok := wr.write.(fmt.Stringer); ok {
des = s.String()
} else {
des = fmt.Sprintf("%T", wr.write)
}
return fmt.Sprintf("[FrameWriteRequest stream=%d, ch=%v, writer=%v]", wr.StreamID(), wr.done != nil, des)
}
// replyToWriter sends err to wr.done and panics if the send must block
// This does nothing if wr.done is nil.
func (wr *FrameWriteRequest) replyToWriter(err error) {
if wr.done == nil {
return
}
delete(ws.sq, id)
// But keep it for others later.
for i := range q.s {
q.s[i] = frameWriteMsg{}
select {
case wr.done <- err:
default:
panic(fmt.Sprintf("unbuffered done channel passed in for type %T", wr.write))
}
q.s = q.s[:0]
ws.putEmptyQueue(q)
wr.write = nil // prevent use (assume it's tainted after wr.done send)
}
// writeQueue is used by implementations of WriteScheduler.
type writeQueue struct {
s []frameWriteMsg
s []FrameWriteRequest
}
// streamID returns the stream ID for a non-empty stream-specific queue.
func (q *writeQueue) streamID() uint32 { return q.s[0].stream.id }
func (q *writeQueue) empty() bool { return len(q.s) == 0 }
func (q *writeQueue) push(wm frameWriteMsg) {
q.s = append(q.s, wm)
func (q *writeQueue) push(wr FrameWriteRequest) {
q.s = append(q.s, wr)
}
// head returns the next item that would be removed by shift.
func (q *writeQueue) head() frameWriteMsg {
func (q *writeQueue) shift() FrameWriteRequest {
if len(q.s) == 0 {
panic("invalid use of queue")
}
return q.s[0]
}
func (q *writeQueue) shift() frameWriteMsg {
if len(q.s) == 0 {
panic("invalid use of queue")
}
wm := q.s[0]
wr := q.s[0]
// TODO: less copy-happy queue.
copy(q.s, q.s[1:])
q.s[len(q.s)-1] = frameWriteMsg{}
q.s[len(q.s)-1] = FrameWriteRequest{}
q.s = q.s[:len(q.s)-1]
return wm
return wr
}
func (q *writeQueue) firstIsNoCost() bool {
if df, ok := q.s[0].write.(*writeData); ok {
return len(df.p) == 0
// consume consumes up to n bytes from q.s[0]. If the frame is
// entirely consumed, it is removed from the queue. If the frame
// is partially consumed, the frame is kept with the consumed
// bytes removed. Returns true iff any bytes were consumed.
func (q *writeQueue) consume(n int32) (FrameWriteRequest, bool) {
if len(q.s) == 0 {
return FrameWriteRequest{}, false
}
return true
consumed, rest, numresult := q.s[0].Consume(n)
switch numresult {
case 0:
return FrameWriteRequest{}, false
case 1:
q.shift()
case 2:
q.s[0] = rest
}
return consumed, true
}
type writeQueuePool []*writeQueue
// put inserts an unused writeQueue into the pool.
func (p *writeQueuePool) put(q *writeQueue) {
for i := range q.s {
q.s[i] = FrameWriteRequest{}
}
q.s = q.s[:0]
*p = append(*p, q)
}
// get returns an empty writeQueue.
func (p *writeQueuePool) get() *writeQueue {
ln := len(*p)
if ln == 0 {
return new(writeQueue)
}
x := ln - 1
q := (*p)[x]
(*p)[x] = nil
*p = (*p)[:x]
return q
}

View File

@ -0,0 +1,452 @@
// Copyright 2016 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package http2
import (
"fmt"
"math"
"sort"
)
// RFC 7540, Section 5.3.5: the default weight is 16.
const priorityDefaultWeight = 15 // 16 = 15 + 1
// PriorityWriteSchedulerConfig configures a priorityWriteScheduler.
type PriorityWriteSchedulerConfig struct {
// MaxClosedNodesInTree controls the maximum number of closed streams to
// retain in the priority tree. Setting this to zero saves a small amount
// of memory at the cost of performance.
//
// See RFC 7540, Section 5.3.4:
// "It is possible for a stream to become closed while prioritization
// information ... is in transit. ... This potentially creates suboptimal
// prioritization, since the stream could be given a priority that is
// different from what is intended. To avoid these problems, an endpoint
// SHOULD retain stream prioritization state for a period after streams
// become closed. The longer state is retained, the lower the chance that
// streams are assigned incorrect or default priority values."
MaxClosedNodesInTree int
// MaxIdleNodesInTree controls the maximum number of idle streams to
// retain in the priority tree. Setting this to zero saves a small amount
// of memory at the cost of performance.
//
// See RFC 7540, Section 5.3.4:
// Similarly, streams that are in the "idle" state can be assigned
// priority or become a parent of other streams. This allows for the
// creation of a grouping node in the dependency tree, which enables
// more flexible expressions of priority. Idle streams begin with a
// default priority (Section 5.3.5).
MaxIdleNodesInTree int
// ThrottleOutOfOrderWrites enables write throttling to help ensure that
// data is delivered in priority order. This works around a race where
// stream B depends on stream A and both streams are about to call Write
// to queue DATA frames. If B wins the race, a naive scheduler would eagerly
// write as much data from B as possible, but this is suboptimal because A
// is a higher-priority stream. With throttling enabled, we write a small
// amount of data from B to minimize the amount of bandwidth that B can
// steal from A.
ThrottleOutOfOrderWrites bool
}
// NewPriorityWriteScheduler constructs a WriteScheduler that schedules
// frames by following HTTP/2 priorities as described in RFC 7340 Section 5.3.
// If cfg is nil, default options are used.
func NewPriorityWriteScheduler(cfg *PriorityWriteSchedulerConfig) WriteScheduler {
if cfg == nil {
// For justification of these defaults, see:
// https://docs.google.com/document/d/1oLhNg1skaWD4_DtaoCxdSRN5erEXrH-KnLrMwEpOtFY
cfg = &PriorityWriteSchedulerConfig{
MaxClosedNodesInTree: 10,
MaxIdleNodesInTree: 10,
ThrottleOutOfOrderWrites: false,
}
}
ws := &priorityWriteScheduler{
nodes: make(map[uint32]*priorityNode),
maxClosedNodesInTree: cfg.MaxClosedNodesInTree,
maxIdleNodesInTree: cfg.MaxIdleNodesInTree,
enableWriteThrottle: cfg.ThrottleOutOfOrderWrites,
}
ws.nodes[0] = &ws.root
if cfg.ThrottleOutOfOrderWrites {
ws.writeThrottleLimit = 1024
} else {
ws.writeThrottleLimit = math.MaxInt32
}
return ws
}
type priorityNodeState int
const (
priorityNodeOpen priorityNodeState = iota
priorityNodeClosed
priorityNodeIdle
)
// priorityNode is a node in an HTTP/2 priority tree.
// Each node is associated with a single stream ID.
// See RFC 7540, Section 5.3.
type priorityNode struct {
q writeQueue // queue of pending frames to write
id uint32 // id of the stream, or 0 for the root of the tree
weight uint8 // the actual weight is weight+1, so the value is in [1,256]
state priorityNodeState // open | closed | idle
bytes int64 // number of bytes written by this node, or 0 if closed
subtreeBytes int64 // sum(node.bytes) of all nodes in this subtree
// These links form the priority tree.
parent *priorityNode
kids *priorityNode // start of the kids list
prev, next *priorityNode // doubly-linked list of siblings
}
func (n *priorityNode) setParent(parent *priorityNode) {
if n == parent {
panic("setParent to self")
}
if n.parent == parent {
return
}
// Unlink from current parent.
if parent := n.parent; parent != nil {
if n.prev == nil {
parent.kids = n.next
} else {
n.prev.next = n.next
}
if n.next != nil {
n.next.prev = n.prev
}
}
// Link to new parent.
// If parent=nil, remove n from the tree.
// Always insert at the head of parent.kids (this is assumed by walkReadyInOrder).
n.parent = parent
if parent == nil {
n.next = nil
n.prev = nil
} else {
n.next = parent.kids
n.prev = nil
if n.next != nil {
n.next.prev = n
}
parent.kids = n
}
}
func (n *priorityNode) addBytes(b int64) {
n.bytes += b
for ; n != nil; n = n.parent {
n.subtreeBytes += b
}
}
// walkReadyInOrder iterates over the tree in priority order, calling f for each node
// with a non-empty write queue. When f returns true, this funcion returns true and the
// walk halts. tmp is used as scratch space for sorting.
//
// f(n, openParent) takes two arguments: the node to visit, n, and a bool that is true
// if any ancestor p of n is still open (ignoring the root node).
func (n *priorityNode) walkReadyInOrder(openParent bool, tmp *[]*priorityNode, f func(*priorityNode, bool) bool) bool {
if !n.q.empty() && f(n, openParent) {
return true
}
if n.kids == nil {
return false
}
// Don't consider the root "open" when updating openParent since
// we can't send data frames on the root stream (only control frames).
if n.id != 0 {
openParent = openParent || (n.state == priorityNodeOpen)
}
// Common case: only one kid or all kids have the same weight.
// Some clients don't use weights; other clients (like web browsers)
// use mostly-linear priority trees.
w := n.kids.weight
needSort := false
for k := n.kids.next; k != nil; k = k.next {
if k.weight != w {
needSort = true
break
}
}
if !needSort {
for k := n.kids; k != nil; k = k.next {
if k.walkReadyInOrder(openParent, tmp, f) {
return true
}
}
return false
}
// Uncommon case: sort the child nodes. We remove the kids from the parent,
// then re-insert after sorting so we can reuse tmp for future sort calls.
*tmp = (*tmp)[:0]
for n.kids != nil {
*tmp = append(*tmp, n.kids)
n.kids.setParent(nil)
}
sort.Sort(sortPriorityNodeSiblings(*tmp))
for i := len(*tmp) - 1; i >= 0; i-- {
(*tmp)[i].setParent(n) // setParent inserts at the head of n.kids
}
for k := n.kids; k != nil; k = k.next {
if k.walkReadyInOrder(openParent, tmp, f) {
return true
}
}
return false
}
type sortPriorityNodeSiblings []*priorityNode
func (z sortPriorityNodeSiblings) Len() int { return len(z) }
func (z sortPriorityNodeSiblings) Swap(i, k int) { z[i], z[k] = z[k], z[i] }
func (z sortPriorityNodeSiblings) Less(i, k int) bool {
// Prefer the subtree that has sent fewer bytes relative to its weight.
// See sections 5.3.2 and 5.3.4.
wi, bi := float64(z[i].weight+1), float64(z[i].subtreeBytes)
wk, bk := float64(z[k].weight+1), float64(z[k].subtreeBytes)
if bi == 0 && bk == 0 {
return wi >= wk
}
if bk == 0 {
return false
}
return bi/bk <= wi/wk
}
type priorityWriteScheduler struct {
// root is the root of the priority tree, where root.id = 0.
// The root queues control frames that are not associated with any stream.
root priorityNode
// nodes maps stream ids to priority tree nodes.
nodes map[uint32]*priorityNode
// maxID is the maximum stream id in nodes.
maxID uint32
// lists of nodes that have been closed or are idle, but are kept in
// the tree for improved prioritization. When the lengths exceed either
// maxClosedNodesInTree or maxIdleNodesInTree, old nodes are discarded.
closedNodes, idleNodes []*priorityNode
// From the config.
maxClosedNodesInTree int
maxIdleNodesInTree int
writeThrottleLimit int32
enableWriteThrottle bool
// tmp is scratch space for priorityNode.walkReadyInOrder to reduce allocations.
tmp []*priorityNode
// pool of empty queues for reuse.
queuePool writeQueuePool
}
func (ws *priorityWriteScheduler) OpenStream(streamID uint32, options OpenStreamOptions) {
// The stream may be currently idle but cannot be opened or closed.
if curr := ws.nodes[streamID]; curr != nil {
if curr.state != priorityNodeIdle {
panic(fmt.Sprintf("stream %d already opened", streamID))
}
curr.state = priorityNodeOpen
return
}
// RFC 7540, Section 5.3.5:
// "All streams are initially assigned a non-exclusive dependency on stream 0x0.
// Pushed streams initially depend on their associated stream. In both cases,
// streams are assigned a default weight of 16."
parent := ws.nodes[options.PusherID]
if parent == nil {
parent = &ws.root
}
n := &priorityNode{
q: *ws.queuePool.get(),
id: streamID,
weight: priorityDefaultWeight,
state: priorityNodeOpen,
}
n.setParent(parent)
ws.nodes[streamID] = n
if streamID > ws.maxID {
ws.maxID = streamID
}
}
func (ws *priorityWriteScheduler) CloseStream(streamID uint32) {
if streamID == 0 {
panic("violation of WriteScheduler interface: cannot close stream 0")
}
if ws.nodes[streamID] == nil {
panic(fmt.Sprintf("violation of WriteScheduler interface: unknown stream %d", streamID))
}
if ws.nodes[streamID].state != priorityNodeOpen {
panic(fmt.Sprintf("violation of WriteScheduler interface: stream %d already closed", streamID))
}
n := ws.nodes[streamID]
n.state = priorityNodeClosed
n.addBytes(-n.bytes)
q := n.q
ws.queuePool.put(&q)
n.q.s = nil
if ws.maxClosedNodesInTree > 0 {
ws.addClosedOrIdleNode(&ws.closedNodes, ws.maxClosedNodesInTree, n)
} else {
ws.removeNode(n)
}
}
func (ws *priorityWriteScheduler) AdjustStream(streamID uint32, priority PriorityParam) {
if streamID == 0 {
panic("adjustPriority on root")
}
// If streamID does not exist, there are two cases:
// - A closed stream that has been removed (this will have ID <= maxID)
// - An idle stream that is being used for "grouping" (this will have ID > maxID)
n := ws.nodes[streamID]
if n == nil {
if streamID <= ws.maxID || ws.maxIdleNodesInTree == 0 {
return
}
ws.maxID = streamID
n = &priorityNode{
q: *ws.queuePool.get(),
id: streamID,
weight: priorityDefaultWeight,
state: priorityNodeIdle,
}
n.setParent(&ws.root)
ws.nodes[streamID] = n
ws.addClosedOrIdleNode(&ws.idleNodes, ws.maxIdleNodesInTree, n)
}
// Section 5.3.1: A dependency on a stream that is not currently in the tree
// results in that stream being given a default priority (Section 5.3.5).
parent := ws.nodes[priority.StreamDep]
if parent == nil {
n.setParent(&ws.root)
n.weight = priorityDefaultWeight
return
}
// Ignore if the client tries to make a node its own parent.
if n == parent {
return
}
// Section 5.3.3:
// "If a stream is made dependent on one of its own dependencies, the
// formerly dependent stream is first moved to be dependent on the
// reprioritized stream's previous parent. The moved dependency retains
// its weight."
//
// That is: if parent depends on n, move parent to depend on n.parent.
for x := parent.parent; x != nil; x = x.parent {
if x == n {
parent.setParent(n.parent)
break
}
}
// Section 5.3.3: The exclusive flag causes the stream to become the sole
// dependency of its parent stream, causing other dependencies to become
// dependent on the exclusive stream.
if priority.Exclusive {
k := parent.kids
for k != nil {
next := k.next
if k != n {
k.setParent(n)
}
k = next
}
}
n.setParent(parent)
n.weight = priority.Weight
}
func (ws *priorityWriteScheduler) Push(wr FrameWriteRequest) {
var n *priorityNode
if id := wr.StreamID(); id == 0 {
n = &ws.root
} else {
n = ws.nodes[id]
if n == nil {
// id is an idle or closed stream. wr should not be a HEADERS or
// DATA frame. However, wr can be a RST_STREAM. In this case, we
// push wr onto the root, rather than creating a new priorityNode,
// since RST_STREAM is tiny and the stream's priority is unknown
// anyway. See issue #17919.
if wr.DataSize() > 0 {
panic("add DATA on non-open stream")
}
n = &ws.root
}
}
n.q.push(wr)
}
func (ws *priorityWriteScheduler) Pop() (wr FrameWriteRequest, ok bool) {
ws.root.walkReadyInOrder(false, &ws.tmp, func(n *priorityNode, openParent bool) bool {
limit := int32(math.MaxInt32)
if openParent {
limit = ws.writeThrottleLimit
}
wr, ok = n.q.consume(limit)
if !ok {
return false
}
n.addBytes(int64(wr.DataSize()))
// If B depends on A and B continuously has data available but A
// does not, gradually increase the throttling limit to allow B to
// steal more and more bandwidth from A.
if openParent {
ws.writeThrottleLimit += 1024
if ws.writeThrottleLimit < 0 {
ws.writeThrottleLimit = math.MaxInt32
}
} else if ws.enableWriteThrottle {
ws.writeThrottleLimit = 1024
}
return true
})
return wr, ok
}
func (ws *priorityWriteScheduler) addClosedOrIdleNode(list *[]*priorityNode, maxSize int, n *priorityNode) {
if maxSize == 0 {
return
}
if len(*list) == maxSize {
// Remove the oldest node, then shift left.
ws.removeNode((*list)[0])
x := (*list)[1:]
copy(*list, x)
*list = (*list)[:len(x)]
}
*list = append(*list, n)
}
func (ws *priorityWriteScheduler) removeNode(n *priorityNode) {
for k := n.kids; k != nil; k = k.next {
k.setParent(n.parent)
}
n.setParent(nil)
delete(ws.nodes, n.id)
}

72
cmd/vendor/golang.org/x/net/http2/writesched_random.go generated vendored Normal file
View File

@ -0,0 +1,72 @@
// Copyright 2014 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package http2
import "math"
// NewRandomWriteScheduler constructs a WriteScheduler that ignores HTTP/2
// priorities. Control frames like SETTINGS and PING are written before DATA
// frames, but if no control frames are queued and multiple streams have queued
// HEADERS or DATA frames, Pop selects a ready stream arbitrarily.
func NewRandomWriteScheduler() WriteScheduler {
return &randomWriteScheduler{sq: make(map[uint32]*writeQueue)}
}
type randomWriteScheduler struct {
// zero are frames not associated with a specific stream.
zero writeQueue
// sq contains the stream-specific queues, keyed by stream ID.
// When a stream is idle or closed, it's deleted from the map.
sq map[uint32]*writeQueue
// pool of empty queues for reuse.
queuePool writeQueuePool
}
func (ws *randomWriteScheduler) OpenStream(streamID uint32, options OpenStreamOptions) {
// no-op: idle streams are not tracked
}
func (ws *randomWriteScheduler) CloseStream(streamID uint32) {
q, ok := ws.sq[streamID]
if !ok {
return
}
delete(ws.sq, streamID)
ws.queuePool.put(q)
}
func (ws *randomWriteScheduler) AdjustStream(streamID uint32, priority PriorityParam) {
// no-op: priorities are ignored
}
func (ws *randomWriteScheduler) Push(wr FrameWriteRequest) {
id := wr.StreamID()
if id == 0 {
ws.zero.push(wr)
return
}
q, ok := ws.sq[id]
if !ok {
q = ws.queuePool.get()
ws.sq[id] = q
}
q.push(wr)
}
func (ws *randomWriteScheduler) Pop() (FrameWriteRequest, bool) {
// Control frames first.
if !ws.zero.empty() {
return ws.zero.shift(), true
}
// Iterate over all non-idle streams until finding one that can be consumed.
for _, q := range ws.sq {
if wr, ok := q.consume(math.MaxInt32); ok {
return wr, true
}
}
return FrameWriteRequest{}, false
}

68
cmd/vendor/golang.org/x/net/idna/idna.go generated vendored Normal file
View File

@ -0,0 +1,68 @@
// Copyright 2012 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// Package idna implements IDNA2008 (Internationalized Domain Names for
// Applications), defined in RFC 5890, RFC 5891, RFC 5892, RFC 5893 and
// RFC 5894.
package idna // import "golang.org/x/net/idna"
import (
"strings"
"unicode/utf8"
)
// TODO(nigeltao): specify when errors occur. For example, is ToASCII(".") or
// ToASCII("foo\x00") an error? See also http://www.unicode.org/faq/idn.html#11
// acePrefix is the ASCII Compatible Encoding prefix.
const acePrefix = "xn--"
// ToASCII converts a domain or domain label to its ASCII form. For example,
// ToASCII("bücher.example.com") is "xn--bcher-kva.example.com", and
// ToASCII("golang") is "golang".
func ToASCII(s string) (string, error) {
if ascii(s) {
return s, nil
}
labels := strings.Split(s, ".")
for i, label := range labels {
if !ascii(label) {
a, err := encode(acePrefix, label)
if err != nil {
return "", err
}
labels[i] = a
}
}
return strings.Join(labels, "."), nil
}
// ToUnicode converts a domain or domain label to its Unicode form. For example,
// ToUnicode("xn--bcher-kva.example.com") is "bücher.example.com", and
// ToUnicode("golang") is "golang".
func ToUnicode(s string) (string, error) {
if !strings.Contains(s, acePrefix) {
return s, nil
}
labels := strings.Split(s, ".")
for i, label := range labels {
if strings.HasPrefix(label, acePrefix) {
u, err := decode(label[len(acePrefix):])
if err != nil {
return "", err
}
labels[i] = u
}
}
return strings.Join(labels, "."), nil
}
func ascii(s string) bool {
for i := 0; i < len(s); i++ {
if s[i] >= utf8.RuneSelf {
return false
}
}
return true
}

Some files were not shown because too many files have changed in this diff Show More