Compare commits
39 Commits
| SHA1 |
|---|
| 05b564a394 |
| cb779b2305 |
| 22c3208fb3 |
| e44372e430 |
| 05a90bc1e5 |
| 6751727809 |
| 916106c3a2 |
| e0c7768f94 |
| 0fb2d5d4d3 |
| fc61fc7c7a |
| 09b81bad15 |
| b4bddf685b |
| af1c711270 |
| c269426be8 |
| 20b7df3c12 |
| e342de3cc5 |
| 26cc2111cd |
| 5d6457e658 |
| 53bc644168 |
| ad3bb484ca |
| 15f7b736e4 |
| 4dc835c718 |
| 75f8282eef |
| 45c86af0eb |
| 71e5467807 |
| 0169fec873 |
| 766023b1b0 |
| ca9e63dde2 |
| 7659bbb1b2 |
| f8b98d3925 |
| 9ee3ed777b |
| c9bd125490 |
| ec49496111 |
| baaefd18e2 |
| 72c18eb7ba |
| 2e87d71bc6 |
| 217dccd617 |
| 3ceb5dd270 |
| 49b77a59cf |
.gitignore (vendored, 1 change)

@@ -9,4 +9,3 @@
*.swp
/hack/insta-discovery/.env
*.test
tools/functional-tester/docker/bin
.header (2 changes)

@@ -1,4 +1,4 @@
// Copyright 2016 CoreOS, Inc.
// Copyright 2014 CoreOS, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
.travis.yml (20 changes)

@@ -1,25 +1,11 @@
language: go
sudo: false

go:
- 1.4
- 1.5
- 1.6
- tip

matrix:
  allow_failures:
  - go: tip

addons:
  apt:
    packages:
    - libpcap-dev
    - libaspell-dev
    - libhunspell-dev

before_install:
- go get -v github.com/chzchzchz/goword
install:
- go get github.com/barakmich/go-nyet

script:
- ./test
- INTEGRATION=y ./test
@@ -35,7 +35,7 @@ Thanks for your contributions!

### Code style

The coding style suggested by the Golang community is used in etcd. See the [style doc](https://github.com/golang/go/wiki/CodeReviewComments) for details.
The coding style suggested by the Golang community is used in etcd. See the [style doc](https://code.google.com/p/go-wiki/wiki/CodeReviewComments) for details.

Please follow this style to make etcd easy to review, maintain and develop.
@@ -1,4 +1,4 @@
# Snapshot Migration
## Snapshot Migration

You can migrate a snapshot of your data from a v0.4.9+ cluster into a new etcd 2.2 cluster using a snapshot migration. After snapshot migration, the etcd indexes of your data will change. Many etcd applications rely on these indexes to behave correctly. This operation should only be done while all etcd applications are stopped.

@@ -15,7 +15,7 @@ etcdctl --endpoint new_cluster.example.com import --snap backup.snap
```

If you have a large amount of data, you can specify more concurrent workers to copy data in parallel by using the `-c` flag.
If you have hidden keys to copy, you can use the `--hidden` flag to specify them. For example fleet uses `/_coreos.com/fleet` so to import those keys use `--hidden /_coreos.com`.
If you have hidden keys to copy, you can use the `--hidden` flag to specify them.

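As a sketch combining these flags with the import command shown in the hunk context above (the worker count is illustrative):

```sh
etcdctl --endpoint new_cluster.example.com import --snap backup.snap -c 10 --hidden /_coreos.com
```
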
And the data will quickly copy into the new cluster:

@@ -1,8 +1,8 @@
# Administration
## Administration

## Data Directory
### Data Directory

### Lifecycle
#### Lifecycle

When first started, etcd stores its configuration into a data directory specified by the data-dir configuration parameter.
Configuration is stored in the write ahead log and includes: the local member ID, cluster ID, and initial cluster configuration.
@@ -18,7 +18,9 @@ Using an out-of-date data directory can lead to inconsistency as the member had
For maximum safety, if an etcd member suffers any sort of data corruption or loss, it must be removed from the cluster.
Once removed the member can be re-added with an empty data directory.

### Contents
[remove-a-member]: runtime-configuration.md#remove-a-member

#### Contents

The data directory has two sub-directories in it:

@@ -27,18 +29,21 @@ The data directory has two sub-directories in it:

If the `--wal-dir` flag is set, etcd will write the write ahead log files to the specified directory instead of the data directory.

## Cluster Management
[wal-pkg]: http://godoc.org/github.com/coreos/etcd/wal
[snap-pkg]: http://godoc.org/github.com/coreos/etcd/snap

### Lifecycle
### Cluster Management

#### Lifecycle

If you are spinning up multiple clusters for testing it is recommended that you specify a unique initial-cluster-token for the different clusters.
This can protect you from cluster corruption in case of mis-configuration because two members started with different cluster tokens will refuse members from each other.

### Monitoring
#### Monitoring

It is important to monitor your production etcd cluster for healthy information and runtime metrics.

#### Health Monitoring
##### Health Monitoring

At the lowest level, etcd exposes health information via HTTP at `/health` in JSON format. If it returns `{"health": "true"}`, then the cluster is healthy. Please note the `/health` endpoint is still an experimental one as of etcd 2.2.

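For example, the endpoint can be checked with a plain HTTP client (address illustrative):

```sh
$ curl http://127.0.0.1:2379/health
{"health": "true"}
```
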
@@ -58,16 +63,16 @@ member fd422379fda50e48 is healthy: got healthy result from http://127.0.0.1:323
cluster is healthy
```

#### Runtime Metrics
##### Runtime Metrics

etcd uses [Prometheus][prometheus] for metrics reporting in the server. You can read more through the runtime metrics [doc][metrics].
etcd uses [Prometheus](http://prometheus.io/) for metrics reporting in the server. You can read more through the runtime metrics [doc](metrics.md).

### Debugging
#### Debugging

Debugging a distributed system can be difficult. etcd provides several ways to make debugging
easier.

#### Enabling Debug Logging
##### Enabling Debug Logging

When you want to debug etcd without stopping it, you can enable debug logging at runtime.
etcd exposes logging configuration at `/config/local/log`.

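The hunk below only shows the disabling half of the example; as a sketch, enabling debug logging uses the same endpoint:

```sh
$ curl http://127.0.0.1:2379/config/local/log -XPUT -d '{"Level":"DEBUG"}'
$ # debug logging enabled
```
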
@@ -80,7 +85,7 @@ $ curl http://127.0.0.1:2379/config/local/log -XPUT -d '{"Level":"INFO"}'
$ # debug logging disabled
```

#### Debugging Variables
##### Debugging Variables

Debug variables are exposed for real-time debugging purposes. Developers who are familiar with etcd can utilize these variables to debug unexpected behavior. etcd exposes debug variables via HTTP at `/debug/vars` in JSON format. The debug variables contain
`cmdline`, `file_descriptor_limit`, `memstats` and `raft.status`.

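For example (address illustrative, output elided):

```sh
$ curl http://127.0.0.1:2379/debug/vars
```
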
@@ -89,7 +94,7 @@ Debug variables are exposed for real-time debugging purposes. Developers who are

`file_descriptor_limit` is the max number of file descriptors etcd can utilize.

`memstats` is explained in detail in the [Go runtime documentation][golang-memstats].
`memstats` is well explained [here](http://golang.org/pkg/runtime/#MemStats).

`raft.status` is useful when you want to debug low level raft issues if you are familiar with raft internals. In most cases, you do not need to check `raft.status`.

@@ -102,7 +107,7 @@ Debug variables are exposed for real-time debugging purposes. Developers who are
}
```

### Optimal Cluster Size
#### Optimal Cluster Size

The recommended etcd cluster size is 3, 5 or 7, which is decided by the fault tolerance requirement. A 7-member cluster can provide enough fault tolerance in most cases. While a larger cluster provides better fault tolerance, the write performance reduces since data needs to be replicated to more machines.

@@ -125,7 +130,7 @@ As you can see, adding another member to bring the size of cluster up to an odd

#### Changing Cluster Size

After your cluster is up and running, adding or removing members is done via [runtime reconfiguration][runtime-reconfig], which allows the cluster to be modified without downtime. The `etcdctl` tool has `member list`, `member add` and `member remove` commands to complete this process.
After your cluster is up and running, adding or removing members is done via [runtime reconfiguration](runtime-configuration.md#cluster-reconfiguration-operations), which allows the cluster to be modified without downtime. The `etcdctl` tool has a `member list`, `member add` and `member remove` commands to complete this process.

### Member Migration

@@ -133,10 +138,10 @@ When there is a scheduled machine maintenance or retirement, you might want to m

The data directory contains all the data to recover a member to its point-in-time state. To migrate a member:

* Stop the member process.
* Copy the data directory of the now-idle member to the new machine.
* Update the peer URLs for the replaced member to reflect the new machine according to the [runtime reconfiguration instructions][update-a-member].
* Start etcd on the new machine, using the same configuration and the copy of the data directory.
* Stop the member process
* Copy the data directory of the now-idle member to the new machine
* Update the peer URLs for that member to reflect the new machine according to the [runtime configuration][change peer url]
* Start etcd on the new machine, using the same configuration and the copy of the data directory

This example will walk you through the process of migrating the infra1 member to a new machine:

@@ -147,7 +152,7 @@ This example will walk you through the process of migrating the infra1 member to
|infra2|10.0.1.12:2380|

```sh
$ export ETCDCTL_ENDPOINT=http://10.0.1.10:2379,http://10.0.1.11:2379,http://10.0.1.12:2379
$ export ETCDCTL_PEERS=http://10.0.1.10:2379,http://10.0.1.11:2379,http://10.0.1.12:2379
```

```sh
@@ -207,6 +212,8 @@ etcd -name infra1 \
-advertise-client-urls http://10.0.1.13:2379,http://127.0.0.1:2379
```

[change peer url]: runtime-configuration.md#update-a-member

### Disaster Recovery

etcd is designed to be resilient to machine failures. An etcd cluster can automatically recover from any number of temporary failures (for example, machine reboots), and a cluster of N members can tolerate up to _(N-1)/2_ permanent failures (where a member can no longer access the cluster, due to hardware failure or disk corruption). However, in extreme circumstances, a cluster might permanently lose enough members such that quorum is irrevocably lost. For example, if a three-node cluster suffered two simultaneous and unrecoverable machine failures, it would be normally impossible for the cluster to restore quorum and continue functioning.

@@ -253,9 +260,9 @@ Once you have verified that etcd has started successfully, shut it down and move

#### Restoring the cluster

Now that the node is running successfully, [change its advertised peer URLs][update-a-member], as the `--force-new-cluster` option has set the peer URL to the default listening on localhost.
Now that if the node is running successfully, you should [change its advertised peer URLs](runtime-configuration.md#update-a-member), as the `--force-new-cluster` has set the peer URL to the default (listening on localhost).

You can then add more nodes to the cluster and restore resiliency. See the [add a new member][add-a-member] guide for more details. **NB:** If you are trying to restore your cluster using old failed etcd nodes, please make sure you have stopped old etcd instances and removed their old data directories specified by the data-dir configuration parameter.
You can then add more nodes to the cluster and restore resiliency. See the [add a new member](runtime-configuration.md#add-a-new-member) guide for more details. **NB:** If you are trying to restore your cluster using old failed etcd nodes, please make sure you have stopped old etcd instances and removed their old data directories specified by the data-dir configuration parameter.

### Client Request Timeout

@@ -286,18 +293,6 @@ If timeout happens several times continuously, administrators should check statu

#### Maximum OS threads

By default, etcd uses the default configuration of the Go 1.4 runtime, which means that at most one operating system thread will be used to execute code simultaneously. (Note that this default behavior [has changed in Go 1.5][golang1.5-runtime]).
By default, etcd uses the default configuration of the Go 1.4 runtime, which means that at most one operating system thread will be used to execute code simultaneously. (Note that this default behavior [may change in Go 1.5](https://docs.google.com/document/d/1At2Ls5_fhJQ59kDK2DFVhFu3g5mATSXqqV5QrxinasI/edit)).

When using etcd in heavy-load scenarios on machines with multiple cores it will usually be desirable to increase the number of threads that etcd can utilize. To do this, simply set the environment variable GOMAXPROCS to the desired number when starting etcd. For more information on this variable, see the [Go runtime documentation][golang-runtime].

[add-a-member]: runtime-configuration.md#add-a-new-member
[golang1.5-runtime]: https://golang.org/doc/go1.5#runtime
[golang-memstats]: https://golang.org/pkg/runtime/#MemStats
[golang-runtime]: https://golang.org/pkg/runtime
[metrics]: metrics.md
[prometheus]: http://prometheus.io/
[remove-a-member]: runtime-configuration.md#remove-a-member
[runtime-reconfig]: runtime-configuration.md#cluster-reconfiguration-operations
[snap-pkg]: http://godoc.org/github.com/coreos/etcd/snap
[update-a-member]: runtime-configuration.md#update-a-member
[wal-pkg]: http://godoc.org/github.com/coreos/etcd/wal

When using etcd in heavy-load scenarios on machines with multiple cores it will usually be desirable to increase the number of threads that etcd can utilize. To do this, simply set the environment variable `GOMAXPROCS` to the desired number when starting etcd. For more information on this variable, see the Go [runtime](https://golang.org/pkg/runtime) documentation.

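As a minimal sketch (the thread count is illustrative):

```sh
# Let the Go runtime use up to 4 OS threads for this etcd process.
$ GOMAXPROCS=4 etcd
```
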
@@ -78,9 +78,12 @@ X-Raft-Index: 5398
X-Raft-Term: 1
```

* `X-Etcd-Index` is the current etcd index as explained above. When the request is a watch on the key space, `X-Etcd-Index` is the current etcd index when the watch starts, which means that the watched event may happen after `X-Etcd-Index`.
* `X-Raft-Index` is similar to the etcd index but is for the underlying raft protocol.
* `X-Raft-Term` is an integer that will increase whenever an etcd master election happens in the cluster. If this number is increasing rapidly, you may need to tune the election timeout. See the [tuning][tuning] section for details.
- `X-Etcd-Index` is the current etcd index as explained above. When the request is a watch on the key space, `X-Etcd-Index` is the current etcd index when the watch starts, which means that the watched event may happen after `X-Etcd-Index`.
- `X-Raft-Index` is similar to the etcd index but is for the underlying raft protocol
- `X-Raft-Term` is an integer that will increase whenever an etcd master election happens in the cluster. If this number is increasing rapidly, you may need to tune the election timeout. See the [tuning][tuning] section for details.

[tuning]: tuning.md


### Get the value of a key

@@ -231,50 +234,6 @@ curl http://127.0.0.1:2379/v2/keys/foo -XPUT -d value=bar -d ttl= -d prevExist=t
}
```

### Refreshing key TTL

Keys in etcd can be refreshed without notifying watchers; this can be achieved by setting the refresh to true when updating a TTL.

You cannot update the value of a key when refreshing it.

```sh
curl http://127.0.0.1:2379/v2/keys/foo -XPUT -d value=bar -d ttl=5
curl http://127.0.0.1:2379/v2/keys/foo -XPUT -d ttl=5 -d refresh=true -d prevExist=true
```

```json
{
    "action": "set",
    "node": {
        "createdIndex": 5,
        "expiration": "2013-12-04T12:01:21.874888581-08:00",
        "key": "/foo",
        "modifiedIndex": 5,
        "ttl": 5,
        "value": "bar"
    }
}
{
    "action":"update",
    "node":{
        "key":"/foo",
        "value":"bar",
        "expiration": "2013-12-04T12:01:26.874888581-08:00",
        "ttl":5,
        "modifiedIndex":6,
        "createdIndex":5
    },
    "prevNode":{
        "key":"/foo",
        "value":"bar",
        "expiration":"2013-12-04T12:01:21.874888581-08:00",
        "ttl":3,
        "modifiedIndex":5,
        "createdIndex":5
    }
}
```

### Waiting for a change

@@ -400,7 +359,7 @@ curl 'http://127.0.0.1:2379/v2/keys/foo?wait=true&waitIndex=2008'
#### Connection being closed prematurely

The server may close a long polling connection before emitting any events.
This can happen due to a timeout or the server being shutdown.
This can happend due to a timeout or the server being shutdown.
Since the HTTP header is sent immediately upon accepting the connection, the response will be seen as empty: `200 OK` and empty body.
The clients should be prepared to deal with this scenario and retry the watch.

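A minimal retry loop for this case might look like the following sketch (key and index are illustrative):

```sh
# An empty 200 response means the long poll was closed; re-issue the watch.
while true; do
  resp=$(curl -s 'http://127.0.0.1:2379/v2/keys/foo?wait=true&waitIndex=2008')
  [ -n "$resp" ] && break
done
echo "$resp"
```
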
@@ -541,15 +500,13 @@ etcd can be used as a centralized coordination service in a cluster, and `Compar

This command will set the value of a key only if the client-provided conditions are equal to the current conditions.

*Note that `CompareAndSwap` does not work with [directories][directories]. If an attempt is made to `CompareAndSwap` a directory, a 102 "Not a file" error will be returned.*

The current comparable conditions are:

1. `prevValue` - checks the previous value of the key.

2. `prevIndex` - checks the previous modifiedIndex of the key.

3. `prevExist` - checks existence of the key: if `prevExist` is true, it is an `update` request; if `prevExist` is `false`, it is a `create` request.
3. `prevExist` - checks existence of the key: if `prevExist` is true, it is an `update` request; if prevExist is `false`, it is a `create` request.

Here is a simple example.
Let's create a key-value pair first: `foo=one`.

@@ -628,8 +585,6 @@ We successfully changed the value from "one" to "two" since we gave the correct

This command will delete a key only if the client-provided conditions are equal to the current conditions.

*Note that `CompareAndDelete` does not work with [directories]. If an attempt is made to `CompareAndDelete` a directory, a 102 "Not a file" error will be returned.*

The current comparable conditions are:

1. `prevValue` - checks the previous value of the key.

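As a hedged sketch of the call shape (key and value are illustrative), a conditional delete passes the condition as a query parameter:

```sh
curl 'http://127.0.0.1:2379/v2/keys/foo?prevValue=one' -XDELETE
```
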
@@ -1093,7 +1048,6 @@ curl http://127.0.0.1:2379/v2/stats/self
### Store Statistics

The store statistics include information about the operations that this node has handled.
Note that v2 `store Statistics` is stored in-memory. When a member stops, store statistics will reset on restart.

Operations that modify the store's state like create, delete, set and update are seen by the entire cluster and the number will increase on all nodes.
Operations like get and watch are node local and will only be seen on this node.

@@ -1123,8 +1077,6 @@ curl http://127.0.0.1:2379/v2/stats/store

## Cluster Config

See the [members API][members-api] for details on the cluster management.
See the [other etcd APIs][other-apis] for details on the cluster management.

[directories]: #listing-a-directory
[members-api]: members_api.md
[tuning]: tuning.md
[other-apis]: other_apis.md

@@ -1,92 +0,0 @@
# etcd3 API

TODO: API doc

## Data Model

etcd is designed to reliably store infrequently updated data and provide reliable watch queries. etcd exposes previous versions of key-value pairs to support inexpensive snapshots and watch history events (“time travel queries”). A persistent, multi-version, concurrency-control data model is a good fit for these use cases.

etcd stores data in a multiversion [persistent][persistent-ds] key-value store. The persistent key-value store preserves the previous version of a key-value pair when its value is superseded with new data. The key-value store is effectively immutable; its operations do not update the structure in-place, but instead always generate a new updated structure. All past versions of keys are still accessible and watchable after modification. To prevent the data store from growing indefinitely over time from maintaining old versions, the store may be compacted to shed the oldest versions of superseded data.

### Logical View

The store’s logical view is a flat binary key space. The key space has a lexically sorted index on byte string keys so range queries are inexpensive.

The key space maintains multiple revisions. Each atomic mutative operation (e.g., a transaction operation may contain multiple operations) creates a new revision on the key space. All data held by previous revisions remains unchanged. Old versions of a key can still be accessed through previous revisions. Likewise, revisions are indexed as well; ranging over revisions with watchers is efficient. If the store is compacted to recover space, revisions before the compact revision will be removed.

A key’s lifetime spans a generation. Each key may have one or multiple generations. Creating a key increments the generation of that key, starting at 1 if the key never existed. Deleting a key generates a key tombstone, concluding the key’s current generation. Each modification of a key creates a new version of the key. Once a compaction happens, any generation ended before the given revision will be removed, and values set before the compaction revision, except the latest one, will be removed.

### Physical View

etcd stores the physical data as key-value pairs in a persistent [b+tree][b+tree]. Each revision of the store’s state only contains the delta from its previous revision to be efficient. A single revision may correspond to multiple keys in the tree.

The key of a key-value pair is a 3-tuple (major, sub, type). Major is the store revision holding the key. Sub differentiates among keys within the same revision. Type is an optional suffix for special values (e.g., `t` if the value contains a tombstone). The value of the key-value pair contains the modification from the previous revision, thus one delta from the previous revision. The b+tree is ordered by key in lexical byte-order. Ranged lookups over revision deltas are fast; this enables quickly finding modifications from one specific revision to another. Compaction removes out-of-date key-value pairs.

etcd also keeps a secondary in-memory [btree][btree] index to speed up range queries over keys. The keys in the btree index are the keys of the store exposed to the user. The value is a pointer to the modification of the persistent b+tree. Compaction removes dead pointers.

## KV API Guarantees

etcd is a consistent and durable key value store with mini-transaction (TODO: link to txn doc when we have it) support. The key value store is exposed through the KV APIs. etcd tries to ensure the strongest consistency and durability guarantees for a distributed system. This specification enumerates the KV API guarantees made by etcd.

### APIs to consider

* Read APIs
  * range
  * watch
* Write APIs
  * put
  * delete
* Combination (read-modify-write) APIs
  * txn

### etcd Specific Definitions

#### operation completed

An etcd operation is considered complete when it is committed through consensus, and therefore “executed” -- permanently stored -- by the etcd storage engine. The client knows an operation is completed when it receives a response from the etcd server. Note that the client may be uncertain about the status of an operation if it times out, or there is a network disruption between the client and the etcd member. etcd may also abort operations when there is a leader election. etcd does not send `abort` responses to clients’ outstanding requests in this event.

#### revision

An etcd operation that modifies the key value store is assigned a single increasing revision. A transaction operation might modify the key value store multiple times, but only one revision is assigned. The revision attribute of a key value pair that is modified by the operation has the same value as the revision of the operation. The revision can be used as a logical clock for the key value store. A key value pair that has a larger revision is modified after a key value pair with a smaller revision. Two key value pairs that have the same revision are modified by an operation "concurrently".

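For illustration only — this document predates a stable v3 CLI, so the following assumes a later v3-capable `etcdctl`:

```sh
# Each mutation creates one new store revision; a txn also counts as one.
ETCDCTL_API=3 etcdctl put foo bar       # say this lands at revision 2
ETCDCTL_API=3 etcdctl put foo baz       # revision 3
ETCDCTL_API=3 etcdctl get foo --rev=2   # "time travel" read returns "bar"
```
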
### Guarantees Provided

#### Atomicity

All API requests are atomic; an operation either completes entirely or not at all. For watch requests, all events generated by one operation will be in one watch response. Watch never observes partial events for a single operation.

#### Consistency

All API calls ensure [sequential consistency][seq_consistency], the strongest consistency guarantee available from distributed systems. No matter which etcd member server a client makes requests to, a client reads the same events in the same order. If two members complete the same number of operations, the state of the two members is consistent.

For watch operations, etcd guarantees to return the same value for the same key across all members for the same revision. For range operations, etcd has a similar guarantee for [linearized][Linearizability] access; serialized access may be behind the quorum state, so that the later revision is not yet available.

As with all distributed systems, it is impossible for etcd to ensure [strict consistency][strict_consistency]. etcd does not guarantee that it will return to a read the “most recent” value (as measured by a wall clock when a request is completed) available on any cluster member.

#### Isolation

etcd ensures [serializable isolation][serializable_isolation], which is the highest isolation level available in distributed systems. Read operations will never observe any intermediate data.

#### Durability

Any completed operations are durable. All accessible data is also durable data. A read will never return data that has not been made durable.

#### Linearizability

Linearizability (also known as Atomic Consistency or External Consistency) is a consistency level between strict consistency and sequential consistency.

For linearizability, suppose each operation receives a timestamp from a loosely synchronized global clock. Operations are linearized if and only if they always complete as though they were executed in a sequential order and each operation appears to complete in the order specified by the program. Likewise, if an operation’s timestamp precedes another, that operation must also precede the other operation in the sequence.

For example, consider a client completing a write at time point 1 (*t1*). A client issuing a read at *t2* (for *t2* > *t1*) should receive a value at least as recent as the previous write, completed at *t1*. However, the read might actually complete only by *t3*, and the returned value, current at *t2* when the read began, might be "stale" by *t3*.

etcd does not ensure linearizability for watch operations. Users are expected to verify the revision of watch responses to ensure correct ordering.

etcd ensures linearizability for all other operations by default. Linearizability comes with a cost, however, because linearized requests must go through the Raft consensus process. To obtain lower latencies and higher throughput for read requests, clients can configure a request’s consistency mode to `serializable`, which may access stale data with respect to quorum, but removes the performance penalty of linearized accesses' reliance on live consensus.

[persistent-ds]: https://en.wikipedia.org/wiki/Persistent_data_structure
[btree]: https://en.wikipedia.org/wiki/B-tree
[b+tree]: https://en.wikipedia.org/wiki/B%2B_tree
[seq_consistency]: https://en.wikipedia.org/wiki/Consistency_model#Sequential_consistency
[strict_consistency]: https://en.wikipedia.org/wiki/Consistency_model#Strict_consistency
[serializable_isolation]: https://en.wikipedia.org/wiki/Isolation_(database_systems)#Serializable
[Linearizability]: #linearizability

@@ -19,7 +19,7 @@ Each role has exact one associated Permission List. An permission list exists fo

The special static ROOT (named `root`) role has full permissions on all key-value resources, the permission to manage user resources, and the permission to modify settings resources. Only the ROOT role has the permission to manage user resources and modify settings resources. The ROOT role is built-in and does not need to be created.

There is also a special GUEST role, named 'guest'. These are the permissions given to unauthenticated requests to etcd. This role will be created automatically, and by default allows access to the full keyspace due to backward compatibility. (etcd did not previously authenticate any actions.) This role can be modified by a ROOT role holder at any time, to reduce the capabilities of unauthenticated users.
There is also a special GUEST role, named 'guest'. These are the permissions given to unauthenticated requests to etcd. This role will be created automatically, and by default allows access to the full keyspace due to backward compatability. (etcd did not previously authenticate any actions.) This role can be modified by a ROOT role holder at any time, to reduce the capabilities of unauthenticated users.

#### Permissions

@@ -40,7 +40,7 @@ Specific settings for the cluster as a whole. This can include adding and removi

## v2 Auth

### Basic Auth
We only support [Basic Auth][basic-auth] for the first version. Clients need to attach the basic auth to the HTTP Authorization Header.
We only support [Basic Auth](http://en.wikipedia.org/wiki/Basic_access_authentication) for the first version. Clients need to attach the basic auth to the HTTP Authorization Header.

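For instance (credentials illustrative), `curl -u` attaches the required Authorization header:

```sh
$ curl -u root:rootpw http://127.0.0.1:2379/v2/keys/foo
```
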
### Authorization field for operations
Added to requests to /v2/keys, /v2/auth
@@ -124,7 +124,7 @@ The User JSON object is formed as follows:

Password is only passed when necessary.

**Get a List of Users**
**Get a list of users**

GET/HEAD  /v2/auth/users

@@ -137,36 +137,7 @@ GET/HEAD  /v2/auth/users
        Content-type: application/json
    200 Body:
        {
          "users": [
            {
              "user": "alice",
              "roles": [
                {
                  "role": "root",
                  "permissions": {
                    "kv": {
                      "read": ["*"],
                      "write": ["*"]
                    }
                  }
                }
              ]
            },
            {
              "user": "bob",
              "roles": [
                {
                  "role": "guest",
                  "permissions": {
                    "kv": {
                      "read": ["*"],
                      "write": ["*"]
                    }
                  }
                }
              ]
            }
          ]
          "users": ["alice", "bob", "eve"]
        }

**Get User Details**
@@ -184,26 +155,7 @@ GET/HEAD  /v2/auth/users/alice
    200 Body:
        {
          "user" : "alice",
          "roles" : [
            {
              "role": "fleet",
              "permissions" : {
                "kv" : {
                  "read": [ "/fleet/" ],
                  "write": [ "/fleet/" ]
                }
              }
            },
            {
              "role": "etcd",
              "permissions" : {
                "kv" : {
                  "read": [ "*" ],
                  "write": [ "*" ]
                }
              }
            }
          ]
          "roles" : ["fleet", "etcd"]
        }

**Create Or Update A User**
@@ -261,6 +213,22 @@ A full role structure may look like this. A Permission List structure is used fo
}
```

**Get a list of Roles**

GET/HEAD  /v2/auth/roles

    Sent Headers:
        Authorization: Basic <BasicAuthString>
    Possible Status Codes:
        200 OK
        401 Unauthorized
    200 Headers:
        Content-type: application/json
    200 Body:
        {
          "roles": ["fleet", "etcd", "quay"]
        }

**Get Role Details**

GET/HEAD  /v2/auth/roles/fleet
@@ -284,50 +252,6 @@ GET/HEAD  /v2/auth/roles/fleet
          }
        }

**Get a list of Roles**

GET/HEAD  /v2/auth/roles

    Sent Headers:
        Authorization: Basic <BasicAuthString>
    Possible Status Codes:
        200 OK
        401 Unauthorized
    200 Headers:
        Content-type: application/json
    200 Body:
        {
          "roles": [
            {
              "role": "fleet",
              "permissions": {
                "kv": {
                  "read": ["/fleet/"],
                  "write": ["/fleet/"]
                }
              }
            },
            {
              "role": "etcd",
              "permissions": {
                "kv": {
                  "read": ["*"],
                  "write": ["*"]
                }
              }
            },
            {
              "role": "quay",
              "permissions": {
                "kv": {
                  "read": ["*"],
                  "write": ["*"]
                }
              }
            }
          ]
        }

**Create Or Update A Role**

PUT /v2/auth/roles/rkt
@@ -508,4 +432,3 @@ PUT /v2/keys/rkt/RktData

Reads and writes outside the prefixes granted will fail with a 401 Unauthorized.

[basic-auth]: https://en.wikipedia.org/wiki/Basic_access_authentication

@@ -1,12 +1,14 @@
# Authentication Guide

**NOTE: The authentication feature is considered experimental. We may change workflow without warning in future releases.**

## Overview

Authentication -- having users and roles in etcd -- was added in etcd 2.1. This guide will help you set up basic authentication in etcd.

etcd before 2.1 was a completely open system; anyone with access to the API could change keys. In order to preserve backward compatibility and upgradability, this feature is off by default.

For a full discussion of the RESTful API, see [the authentication API documentation][auth-api]
For a full discussion of the RESTful API, see [the authentication API documentation](auth_api.md)

## Special Users and Roles

@@ -92,7 +94,6 @@ Roles are granted access to various parts of the keyspace, a single path at a ti

Reading a path is simple; if the path ends in `*`, that key **and all keys prefixed with it** are granted to holders of this role. If it does not end in `*`, only that key and that key alone is granted.

Access can be granted as either read, write, or both, as in the following examples:

```
# Give read access to keys under the /foo directory
$ etcdctl role grant myrolename -path '/foo/*' -read
@@ -133,7 +134,7 @@ $ etcdctl role remove myrolename

## Enabling authentication

The minimal steps to enabling auth are as follows. The administrator can set up users and roles before or after enabling authentication, as a matter of preference.
The minimal steps to enabling auth follow. The administrator can set up users and roles before or after enabling authentication, as a matter of preference.

Make sure the root user is created:

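The commands themselves are elided by the hunk below; as a sketch, creating the root user and then enabling auth looks like:

```sh
$ etcdctl user add root
New password:
$ etcdctl auth enable
```
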
@@ -176,5 +177,3 @@ $ etcdctl -u user get foo
```

Otherwise, all `etcdctl` commands remain the same. Users and roles can still be created and modified, but require authentication by a user with the root role.

[auth-api]: auth_api.md

@@ -32,7 +32,7 @@ The consistent flag for read operations is removed in etcd 2.0.0. The normal rea

The read consistency guarantees are:

The consistent read guarantees the sequential consistency within one client that talks to one etcd server. Read/Write from one client to one etcd member should be observed in order. If one client writes a value to an etcd server successfully, it should be able to get the value out of the server immediately.
The consistent read guarantees the sequential consistency within one client that talks to one etcd server. Read/Write from one client to one etcd member should be observed in order. If one client writes a value to a etcd server successfully, it should be able to get the value out of the server immediately.

Each etcd member will proxy the request to the leader and only return the result to the user after the result is applied on the local member. Thus after the write succeeds, the user is guaranteed to see the value on the member it sent the request to.

@@ -60,9 +60,9 @@ A size key needs to be provided inside a [discovery token][discoverytoken].

## HTTP Admin API

`v2/admin` on peer url and `v2/keys/_etcd` are unified under the new [v2/members API][members-api] to better explain which machines are part of an etcd cluster, and to simplify the keyspace for all your use cases.
`v2/admin` on peer url and `v2/keys/_etcd` are unified under the new [v2/member API][memberapi] to better explain which machines are part of an etcd cluster, and to simplify the keyspace for all your use cases.

[members-api]: members_api.md
[memberapi]: other_apis.md

## HTTP Key Value API
- The follower can now transparently proxy write requests to the leader. Clients will no longer see 307 redirections to the leader from etcd.

@@ -2,17 +2,12 @@

etcd benchmarks will be published regularly and tracked for each release below:

- [etcd v2.1.0-alpha][2.1]
- [etcd v2.2.0-rc][2.2]
- [etcd v3 demo][3.0]
- [etcd v2.1.0-alpha](./etcd-2-1-0-alpha-benchmarks.md)
- [etcd v2.2.0-rc](./etcd-2-2-0-rc-benchmarks.md)
- [etcd v3 demo](./etcd-3-demo-benchmarks.md)

# Memory Usage Benchmarks

It records expected memory usage in different scenarios.

- [etcd v2.2.0-rc][2.2-mem]

[2.1]: etcd-2-1-0-alpha-benchmarks.md
[2.2]: etcd-2-2-0-rc-benchmarks.md
[2.2-mem]: etcd-2-2-0-rc-memory-benchmarks.md
[3.0]: etcd-3-demo-benchmarks.md
- [etcd v2.2.0-rc](./etcd-2-2-0-rc-memory-benchmarks.md)

@@ -14,7 +14,7 @@ GCE n1-highcpu-2 machine type

## Testing

Bootstrap another machine and use the [boom HTTP benchmark tool][boom] to send requests to each etcd member. Check the [benchmark hacking guide][hack-benchmark] for detailed instructions.
Bootstrap another machine and use the benchmark tool [boom](https://github.com/rakyll/boom) to send requests to each etcd member.

## Performance

@@ -47,6 +47,3 @@ Bootstrap another machine and use the [boom HTTP benchmark tool][boom] to send r
| 64 | 256 | all servers | 3260 | 123.8 |
| 256 | 64 | all servers | 1033 | 121.5 |
| 256 | 256 | all servers | 3061 | 119.3 |

[boom]: https://github.com/rakyll/boom
[hack-benchmark]: /hack/benchmark/

@@ -1,69 +0,0 @@
# Benchmarking etcd v2.2.0

## Physical Machines

GCE n1-highcpu-2 machine type

- 1x dedicated local SSD mounted as etcd data directory
- 1x dedicated slow disk for the OS
- 1.8 GB memory
- 2x CPUs

## etcd Cluster

3 etcd 2.2.0 members, each runs on a single machine.

Detailed versions:

```
etcd Version: 2.2.0
Git SHA: e4561dd
Go Version: go1.5
Go OS/Arch: linux/amd64
```

## Testing

Bootstrap another machine, outside of the etcd cluster, and run the [`boom` HTTP benchmark tool](https://github.com/rakyll/boom) with a connection reuse patch to send requests to each etcd cluster member. See the [benchmark instructions](../../hack/benchmark/) for the patch and the steps to reproduce our procedures.

The performance is calculated through results of 100 benchmark rounds.

## Performance

### Single Key Read Performance

| key size in bytes | number of clients | target etcd server | average read QPS | read QPS stddev | average 90th Percentile Latency (ms) | latency stddev |
|-------------------|-------------------|--------------------|------------------|-----------------|--------------------------------------|----------------|
| 64 | 1 | leader only | 2303 | 200 | 0.49 | 0.06 |
| 64 | 64 | leader only | 15048 | 685 | 7.60 | 0.46 |
| 64 | 256 | leader only | 14508 | 434 | 29.76 | 1.05 |
| 256 | 1 | leader only | 2162 | 214 | 0.52 | 0.06 |
| 256 | 64 | leader only | 14789 | 792 | 7.69 | 0.48 |
| 256 | 256 | leader only | 14424 | 512 | 29.92 | 1.42 |
| 64 | 64 | all servers | 45752 | 2048 | 2.47 | 0.14 |
| 64 | 256 | all servers | 46592 | 1273 | 10.14 | 0.59 |
| 256 | 64 | all servers | 45332 | 1847 | 2.48 | 0.12 |
| 256 | 256 | all servers | 46485 | 1340 | 10.18 | 0.74 |

### Single Key Write Performance

| key size in bytes | number of clients | target etcd server | average write QPS | write QPS stddev | average 90th Percentile Latency (ms) | latency stddev |
|-------------------|-------------------|--------------------|-------------------|------------------|--------------------------------------|----------------|
| 64 | 1 | leader only | 55 | 4 | 24.51 | 13.26 |
| 64 | 64 | leader only | 2139 | 125 | 35.23 | 3.40 |
| 64 | 256 | leader only | 4581 | 581 | 70.53 | 10.22 |
| 256 | 1 | leader only | 56 | 4 | 22.37 | 4.33 |
| 256 | 64 | leader only | 2052 | 151 | 36.83 | 4.20 |
| 256 | 256 | leader only | 4442 | 560 | 71.59 | 10.03 |
| 64 | 64 | all servers | 1625 | 85 | 58.51 | 5.14 |
| 64 | 256 | all servers | 4461 | 298 | 89.47 | 36.48 |
| 256 | 64 | all servers | 1599 | 94 | 60.11 | 6.43 |
| 256 | 256 | all servers | 4315 | 193 | 88.98 | 7.01 |

## Performance Changes

- Because etcd now records metrics for each API call, read QPS performance seems to see a minor decrease in most scenarios. This minimal performance impact was judged a reasonable investment for the breadth of monitoring and debugging information returned.

- Write QPS to cluster leaders seems to be increased by a small margin. This is because the main loop and entry apply loops were decoupled in the etcd raft logic, eliminating several blocks between them.

- Write QPS to all members seems to be increased by a significant margin, because followers now receive the latest commit index sooner, and commit proposals more quickly.

@@ -20,11 +20,11 @@ Go Version: go1.4.2
Go OS/Arch: linux/amd64
```

Also, we use 3 etcd 2.1.0 alpha-stage members to form a cluster to get base performance. etcd's commit head is at [c7146bd5][c7146bd5], which is the same as the one that we use in the [etcd 2.1 benchmark][etcd-2.1-benchmark].
Also, we use 3 etcd 2.1.0 alpha-stage members to form a cluster to get base performance. etcd's commit head is at [c7146bd5](https://github.com/coreos/etcd/commits/c7146bd5f2c73716091262edc638401bb8229144), which is the same as the one that we use in the [etcd 2.1 benchmark](./etcd-2-1-0-benchmarks.md).

## Testing

Bootstrap another machine and use the [boom HTTP benchmark tool][boom] to send requests to each etcd member. Check the [benchmark hacking guide][hack-benchmark] for detailed instructions.
Bootstrap another machine and use the benchmark tool [boom](https://github.com/rakyll/boom) to send requests to each etcd member. Check [here](../../hack/benchmark/) for instructions.

## Performance

@@ -65,8 +65,3 @@ Bootstrap another machine and use the [boom HTTP benchmark tool][boom] to send r

- write QPS to leader is increased by 20~30%. This is because we decouple raft main loop and entry apply loop, which avoids them blocking each other.

- write QPS to all servers is increased by 30~80% because followers could receive the latest commit index earlier and commit proposals faster.

[boom]: https://github.com/rakyll/boom
[c7146bd5]: https://github.com/coreos/etcd/commits/c7146bd5f2c73716091262edc638401bb8229144
[etcd-2.1-benchmark]: etcd-2-1-0-alpha-benchmarks.md
[hack-benchmark]: /hack/benchmark/

@@ -14,7 +14,7 @@ GCE n1-highcpu-2 machine type

## Testing

Use the [etcd v3 benchmark tool][etcd-v3-benchmark].
Use the [etcd v3 benchmark tool](../../hack/v3benchmark/).

## Performance

@@ -38,5 +38,3 @@ The performance is nearly the same as the one with empty server handler.

The performance with an empty server handler is not affected by one put. So the
performance downgrade should be caused by the storage package.

[etcd-v3-benchmark]: /tools/benchmark/

@@ -1,77 +0,0 @@
# Watch Memory Usage Benchmark

*NOTE*: The watch features are under active development, and their memory usage may change as that development progresses. We do not expect it to significantly increase beyond the figures stated below.

A primary goal of etcd is supporting a very large number of watchers doing a massively large amount of watching. etcd aims to support O(10k) clients, O(100K) watch streams (O(10) streams per client) and O(10M) total watchings (O(100) watchings per stream). The memory consumed by each individual watching accounts for the largest portion of etcd's overall usage, and is therefore the focus of current and future optimizations.

Three related components of etcd watch consume physical memory: each `grpc.Conn`, each watch stream, and each instance of the watching activity. `grpc.Conn` maintains the actual TCP connection and other gRPC connection state. Each `grpc.Conn` consumes O(10kb) of memory, and might have multiple watch streams attached.

Each watch stream is an independent HTTP2 connection which consumes another O(10kb) of memory.
Multiple watchings might share one watch stream.

Watching is the actual struct that tracks the changes on the key-value store. Each watching should only consume < O(1kb).

```
                                 +-------+
                                 | watch |
                   +---------->  | foo   |
                   |             +-------+
            +------+-----+
            |   stream   |
  +-------> |            |
  |         +------+-----+       +-------+
  |                |             | watch |
  |                +---------->  | bar   |
+-+----+-----+                   +-------+
|            |     +------------+
|    conn    +---> |   stream   |
|            |     |            |
+-----+------+     +------------+
      |
      |
      |
      |            +------------+
      +----------> |   stream   |
                   |            |
                   +------------+
```

The theoretical memory consumption of watch can be approximated with the formula:
`memory = c1 * number_of_conn + c2 * avg_number_of_stream_per_conn + c3 * avg_number_of_watch_stream`

## Testing Environment

etcd version
- git head https://github.com/coreos/etcd/commit/185097ffaa627b909007e772c175e8fefac17af3

GCE n1-standard-2 machine type
- 7.5 GB memory
- 2x CPUs

## Overall memory usage

The overall memory usage captures how much [RSS][rss] etcd consumes with the client watchers. While the result may vary by as much as 10%, it is still meaningful, since the goal is to learn about the rough memory usage and the pattern of allocations.

With the benchmark result, we can calculate roughly that `c1 = 17kb`, `c2 = 18kb` and `c3 = 350bytes`. So each additional client connection consumes 17kb of memory, each additional stream consumes 18kb of memory, and each additional watching only causes 350 bytes. A single etcd server can maintain millions of watchings with a few GB of memory in the normal case.

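As a rough sanity check, reading the formula over totals (connections, streams, watchings): for the 5k-client, 50-streams-per-client, 10-watchings-per-stream row below, `5k * 17kb + 250k * 18kb + 2.5M * 350bytes ≈ 85MB + 4500MB + 875MB ≈ 5.5GB`, which is close to the measured 5710MB.
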
| clients | streams per client | watchings per stream | total watchings | memory usage |
|---------|--------------------|----------------------|-----------------|--------------|
| 1k | 1 | 1 | 1k | 50MB |
| 2k | 1 | 1 | 2k | 90MB |
| 5k | 1 | 1 | 5k | 200MB |
| 1k | 10 | 1 | 10k | 217MB |
| 2k | 10 | 1 | 20k | 417MB |
| 5k | 10 | 1 | 50k | 980MB |
| 1k | 50 | 1 | 50k | 1001MB |
| 2k | 50 | 1 | 100k | 1960MB |
| 5k | 50 | 1 | 250k | 4700MB |
| 1k | 50 | 10 | 500k | 1171MB |
| 2k | 50 | 10 | 1M | 2371MB |
| 5k | 50 | 10 | 2.5M | 5710MB |
| 1k | 50 | 100 | 5M | 2380MB |
| 2k | 50 | 100 | 10M | 4672MB |
| 5k | 50 | 100 | 50M | *OOM* |

[rss]: https://en.wikipedia.org/wiki/Resident_set_size

@@ -1,98 +0,0 @@
# Storage Memory Usage Benchmark

<!---todo: link storage to storage design doc-->
Two components of etcd storage consume physical memory. The etcd process allocates an *in-memory index* to speed key lookup. The process's *page cache*, managed by the operating system, stores recently-accessed data from disk for quick re-use.

The in-memory index holds all the keys in a [B-tree][btree] data structure, along with pointers to the on-disk data (the values). Each key in the B-tree may contain multiple pointers, pointing to different versions of its values. The theoretical memory consumption of the in-memory index can hence be approximated with the formula:

`N * (c1 + avg_key_size) + N * (avg_versions_of_key) * (c2 + size_of_pointer)`

where `c1` is the key metadata overhead and `c2` is the version metadata overhead.

The graph shows the detailed structure of the in-memory index B-tree.

```
                        In mem index

                       +------------+
                       | key || ... |
+--------------+       |     ||     |
|              |       +------------+
|              |       | v1  || ... |
|     disk     <-------|     ||     |  Tree Node
|              |       +------------+
|              |       | v2  || ... |
|              <-------+     ||     |
|              |       +------------+
+--------------+       +-----+      |
                       |     |      |
                       |     +------+
                       |
                       |
                       ^
                 +-----+
                 | ... |
                 |     |
                 +-----+
                 | ... |  Tree Node
                 |     |
                 +-----+
                 | ... |
                 |     |
                 +-----+
```

[Page cache memory][pagecache] is managed by the operating system and is not covered in detail in this document.

## Testing Environment

etcd version
- git head https://github.com/coreos/etcd/commit/776e9fb7be7eee5e6b58ab977c8887b4fe4d48db

GCE n1-standard-2 machine type

- 7.5 GB memory
- 2x CPUs

## In-memory index memory usage

In this test, we only benchmark the memory usage of the in-memory index. The goal is to find `c1` and `c2` mentioned above and to understand the hard limit of memory consumption of the storage.

We calculate the memory usage consumption via the Go runtime.ReadMemStats. We calculate the total allocated bytes difference before creating the index and after creating the index. It cannot perfectly reflect the memory usage of the in-memory index itself but can show the rough consumption pattern.

| N | versions | key size | memory usage |
|------|----------|----------|--------------|
| 100K | 1 | 64bytes | 22MB |
| 100K | 5 | 64bytes | 39MB |
| 1M | 1 | 64bytes | 218MB |
| 1M | 5 | 64bytes | 432MB |
| 100K | 1 | 256bytes | 41MB |
| 100K | 5 | 256bytes | 65MB |
| 1M | 1 | 256bytes | 409MB |
| 1M | 5 | 256bytes | 506MB |

Based on the result, we can calculate `c1=120bytes` and `c2=30bytes`. We only need two sets of data to calculate `c1` and `c2`, since they are the only unknown variables in the formula. The `c1=120bytes` and `c2=30bytes` are the average values of the 4 sets of `c1` and `c2` we calculated. The key metadata overhead is still relatively nontrivial (50%) for small key-value pairs. However, this is a significant improvement over the old store, which had at least 1000% overhead.

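As a hedged sanity check of the formula (assuming an 8-byte pointer): for `N=1M`, 1 version, and 64-byte keys, `1M * (120 + 64) + 1M * 1 * (30 + 8) = 184MB + 38MB ≈ 222MB`, close to the measured 218MB.
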
## Overall memory usage

The overall memory usage captures how much RSS etcd consumes with the storage. The value size should have very little impact on the overall memory usage of etcd, since we keep values on disk and only retain hot values in memory, managed by the OS page cache.

| N | versions | key size | value size | memory usage |
|------|----------|----------|------------|--------------|
| 100K | 1 | 64bytes | 256bytes | 40MB |
| 100K | 5 | 64bytes | 256bytes | 89MB |
| 1M | 1 | 64bytes | 256bytes | 470MB |
| 1M | 5 | 64bytes | 256bytes | 880MB |
| 100K | 1 | 64bytes | 1KB | 102MB |
| 100K | 5 | 64bytes | 1KB | 164MB |
| 1M | 1 | 64bytes | 1KB | 587MB |
| 1M | 5 | 64bytes | 1KB | 836MB |

Based on the result, we know the value size does not significantly impact the memory consumption. There is some minor increase due to more data held in the OS page cache.

[btree]: https://en.wikipedia.org/wiki/B-tree
[pagecache]: https://en.wikipedia.org/wiki/Page_cache

@@ -1,13 +1,13 @@
# Branch Management
## Branch Management

## Guide
### Guide

* New development occurs on the [master branch][master].
* Master branch should always have a green build!
* Backwards-compatible bug fixes should target the master branch and subsequently be ported to stable branches.
* Once the master branch is ready for release, it will be tagged and become the new stable branch.
- New development occurs on the [master branch](https://github.com/coreos/etcd/tree/master)
- Master branch should always have a green build!
- Backwards-compatible bug fixes should target the master branch and subsequently be ported to stable branches
- Once the master branch is ready for release, it will be tagged and become the new stable branch.

The etcd team has adopted a *rolling release model* and supports one stable version of etcd.
The etcd team has adopted a _rolling release model_ and supports one stable version of etcd.

### Master branch

@@ -22,5 +22,3 @@ Before the release of the next stable version, feature PRs will be frozen. We wi

All branches with prefix `release-` are considered _stable_ branches.

After every minor release (http://semver.org/), we will have a new stable branch for that release. We will keep fixing the backwards-compatible bugs for the latest stable release, but not previous releases. The _patch_ release, incorporating any bug fixes, will be once every two weeks, given any patches.

[master]: https://github.com/coreos/etcd/tree/master

@ -4,7 +4,7 @@

Starting an etcd cluster statically requires that each member knows another in the cluster. In a number of cases, you might not know the IPs of your cluster members ahead of time. In these cases, you can bootstrap an etcd cluster with the help of a discovery service.

Once an etcd cluster is up and running, adding or removing members is done via [runtime reconfiguration][runtime-conf]. To better understand the design behind runtime reconfiguration, we suggest you read [the runtime configuration design document][runtime-reconf-design].

This guide will cover the following mechanisms for bootstrapping an etcd cluster:

@ -30,59 +30,59 @@ ETCD_INITIAL_CLUSTER_STATE=new
```

```
--initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
--initial-cluster-state new
```

Note that the URLs specified in `initial-cluster` are the _advertised peer URLs_, i.e. they should match the value of `initial-advertise-peer-urls` on the respective nodes.

If you are spinning up multiple clusters (or creating and destroying a single cluster) with the same configuration for testing purposes, it is highly recommended that you specify a unique `initial-cluster-token` for each cluster. By doing this, etcd can generate unique cluster IDs and member IDs even for clusters that otherwise share the exact same configuration, protecting you from cross-cluster interaction that might corrupt the clusters.
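
As a sketch of that scenario (a single-member test cluster; the addresses and token values are purely illustrative), only the token differs between two otherwise-identical bring-ups:

```
# test cluster "a"; rerun with --initial-cluster-token etcd-cluster-b for an independent twin
etcd --name infra0 --initial-advertise-peer-urls http://10.0.1.10:2380 \
  --listen-peer-urls http://10.0.1.10:2380 \
  --initial-cluster infra0=http://10.0.1.10:2380 \
  --initial-cluster-state new \
  --initial-cluster-token etcd-cluster-a
```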

etcd listens on [`listen-client-urls`][conf-listen-client] to accept client traffic. Each etcd member advertises the URLs specified in [`advertise-client-urls`][conf-adv-client] to other members, proxies, and clients. Please make sure the `advertise-client-urls` are reachable from the intended clients. A common mistake is setting `advertise-client-urls` to localhost, or leaving it as the default, when you want remote clients to reach etcd.
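
One quick way to sanity-check reachability from a client machine is to query the version endpoint on the advertised URL (the address below is from the example cluster):

```
# should print the etcd server version if the advertised client URL is reachable
curl -L http://10.0.1.10:2379/version
```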

On each machine you would start etcd with these flags:

```
$ etcd --name infra0 --initial-advertise-peer-urls http://10.0.1.10:2380 \
  --listen-peer-urls http://10.0.1.10:2380 \
  --listen-client-urls http://10.0.1.10:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://10.0.1.10:2379 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
  --initial-cluster-state new
```

```
$ etcd --name infra1 --initial-advertise-peer-urls http://10.0.1.11:2380 \
  --listen-peer-urls http://10.0.1.11:2380 \
  --listen-client-urls http://10.0.1.11:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://10.0.1.11:2379 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
  --initial-cluster-state new
```

```
$ etcd --name infra2 --initial-advertise-peer-urls http://10.0.1.12:2380 \
  --listen-peer-urls http://10.0.1.12:2380 \
  --listen-client-urls http://10.0.1.12:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://10.0.1.12:2379 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
  --initial-cluster-state new
```

The command line parameters starting with `--initial-cluster` will be ignored on subsequent runs of etcd. You are free to remove the environment variables or command line flags after the initial bootstrap process. If you need to make changes to the configuration later (for example, adding or removing members to/from the cluster), see the [runtime configuration][runtime-conf] guide.

### Error Cases

In the following example, we have not included our new host in the list of enumerated nodes. If this is a new cluster, the node _must_ be added to the list of initial cluster members.

```
$ etcd --name infra1 --initial-advertise-peer-urls http://10.0.1.11:2380 \
  --listen-peer-urls https://10.0.1.11:2380 \
  --listen-client-urls http://10.0.1.11:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://10.0.1.11:2379 \
  --initial-cluster infra0=http://10.0.1.10:2380 \
  --initial-cluster-state new
etcd: infra1 not listed in the initial cluster config
exit 1
```

@ -90,12 +90,12 @@ exit 1

In this example, we are attempting to map a node (infra0) on a different address (127.0.0.1:2380) than its enumerated address in the cluster list (10.0.1.10:2380). If this node is to listen on multiple addresses, all addresses _must_ be reflected in the "initial-cluster" configuration directive.

```
$ etcd --name infra0 --initial-advertise-peer-urls http://127.0.0.1:2380 \
  --listen-peer-urls http://10.0.1.10:2380 \
  --listen-client-urls http://10.0.1.10:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://10.0.1.10:2379 \
  --initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
  --initial-cluster-state=new
etcd: error setting up initial cluster: infra0 has different advertised URLs in the cluster and advertised peer URLs list
exit 1
```

@ -103,12 +103,12 @@ exit 1

If you configure a peer with a different set of configuration options and attempt to join this cluster, you will get a cluster ID mismatch and etcd will exit.

```
$ etcd --name infra3 --initial-advertise-peer-urls http://10.0.1.13:2380 \
  --listen-peer-urls http://10.0.1.13:2380 \
  --listen-client-urls http://10.0.1.13:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://10.0.1.13:2379 \
  --initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra3=http://10.0.1.13:2380 \
  --initial-cluster-state=new
etcd: conflicting cluster ID to the target cluster (c6ab534d07e8fcc4 != bc25ea2a74fb18b0). Exiting.
exit 1
```

@ -124,13 +124,15 @@ There are two methods that can be used for discovery:

### etcd Discovery

To better understand the design of the discovery service protocol, we suggest reading [the discovery protocol documentation][discovery-proto].

#### Lifetime of a Discovery URL

A discovery URL identifies a unique etcd cluster. Instead of reusing a discovery URL, you should always create discovery URLs for new clusters.

Moreover, discovery URLs should ONLY be used for the initial bootstrapping of a cluster. To change cluster membership after the cluster is already running, see the [runtime reconfiguration][runtime-conf] guide.

#### Custom etcd Discovery Service

@ -146,30 +148,30 @@ If you bootstrap an etcd cluster using discovery service with more than the expe

The URL you will use in this case will be `https://myetcd.local/v2/keys/discovery/6c007a14875d53d9bf0ef5a6fc0257c817f0fb83` and the etcd members will use the `https://myetcd.local/v2/keys/discovery/6c007a14875d53d9bf0ef5a6fc0257c817f0fb83` directory for registration as they start.
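
For context, a token like this is typically seeded ahead of time by writing the expected cluster size under its `_config/size` key, following the discovery protocol; a hedged sketch against this (hypothetical) self-hosted service:

```
# tell the discovery service how many members to expect for this token
curl -X PUT https://myetcd.local/v2/keys/discovery/6c007a14875d53d9bf0ef5a6fc0257c817f0fb83/_config/size -d value=3
```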

**Each member must have a different name flag specified. `Hostname` or `machine-id` can be a good choice. Otherwise discovery will fail due to duplicated names.**

Now we start etcd with those relevant flags for each member:

```
$ etcd --name infra0 --initial-advertise-peer-urls http://10.0.1.10:2380 \
  --listen-peer-urls http://10.0.1.10:2380 \
  --listen-client-urls http://10.0.1.10:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://10.0.1.10:2379 \
  --discovery https://myetcd.local/v2/keys/discovery/6c007a14875d53d9bf0ef5a6fc0257c817f0fb83
```

```
$ etcd --name infra1 --initial-advertise-peer-urls http://10.0.1.11:2380 \
  --listen-peer-urls http://10.0.1.11:2380 \
  --listen-client-urls http://10.0.1.11:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://10.0.1.11:2379 \
  --discovery https://myetcd.local/v2/keys/discovery/6c007a14875d53d9bf0ef5a6fc0257c817f0fb83
```

```
$ etcd --name infra2 --initial-advertise-peer-urls http://10.0.1.12:2380 \
  --listen-peer-urls http://10.0.1.12:2380 \
  --listen-client-urls http://10.0.1.12:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://10.0.1.12:2379 \
  --discovery https://myetcd.local/v2/keys/discovery/6c007a14875d53d9bf0ef5a6fc0257c817f0fb83
```

This will cause each member to register itself with the custom etcd discovery service and begin the cluster once all machines have been registered.

@ -187,6 +189,9 @@ This will create the cluster with an initial expected size of 3 members. If you

If you bootstrap an etcd cluster using discovery service with more than the expected number of etcd members, the extra etcd processes will [fall back][fall-back] to being [proxies][proxy] by default.

```
ETCD_DISCOVERY=https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de
```

@ -195,30 +200,30 @@ ETCD_DISCOVERY=https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573d

```
--discovery https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de
```
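
If you still need a token, the public discovery service can mint one; a quick sketch, where `size` is the expected cluster size:

```
curl https://discovery.etcd.io/new?size=3
```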

**Each member must have a different name flag specified. `Hostname` or `machine-id` can be a good choice. Otherwise discovery will fail due to duplicated names.**

Now we start etcd with those relevant flags for each member:

```
$ etcd --name infra0 --initial-advertise-peer-urls http://10.0.1.10:2380 \
  --listen-peer-urls http://10.0.1.10:2380 \
  --listen-client-urls http://10.0.1.10:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://10.0.1.10:2379 \
  --discovery https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de
```

```
$ etcd --name infra1 --initial-advertise-peer-urls http://10.0.1.11:2380 \
  --listen-peer-urls http://10.0.1.11:2380 \
  --listen-client-urls http://10.0.1.11:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://10.0.1.11:2379 \
  --discovery https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de
```

```
$ etcd --name infra2 --initial-advertise-peer-urls http://10.0.1.12:2380 \
  --listen-peer-urls http://10.0.1.12:2380 \
  --listen-client-urls http://10.0.1.12:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://10.0.1.12:2379 \
  --discovery https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de
```

This will cause each member to register itself with the discovery service and begin the cluster once all members have been registered.

@ -231,11 +236,11 @@ You can use the environment variable `ETCD_DISCOVERY_PROXY` to cause etcd to use

```
$ etcd --name infra0 --initial-advertise-peer-urls http://10.0.1.10:2380 \
  --listen-peer-urls http://10.0.1.10:2380 \
  --listen-client-urls http://10.0.1.10:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://10.0.1.10:2379 \
  --discovery https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de
etcd: error: the cluster doesn't have a size configuration value in https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de/_config
exit 1
```

@ -245,12 +250,12 @@ exit 1

This error will occur if the discovery cluster already has the configured number of members, and `discovery-fallback` is explicitly disabled.

```
$ etcd --name infra0 --initial-advertise-peer-urls http://10.0.1.10:2380 \
  --listen-peer-urls http://10.0.1.10:2380 \
  --listen-client-urls http://10.0.1.10:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://10.0.1.10:2379 \
  --discovery https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de \
  --discovery-fallback exit
etcd: discovery: cluster is full
exit 1
```

@ -261,17 +266,17 @@ This is a harmless warning notifying you that the discovery URL will be
ignored on this machine.

```
$ etcd --name infra0 --initial-advertise-peer-urls http://10.0.1.10:2380 \
  --listen-peer-urls http://10.0.1.10:2380 \
  --listen-client-urls http://10.0.1.10:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://10.0.1.10:2379 \
  --discovery https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de
etcdserver: discovery token ignored since a cluster has already been initialized. Valid log found at /var/lib/etcd
```

### DNS Discovery

DNS [SRV records][rfc-srv] can be used as a discovery mechanism.
The `--discovery-srv` flag can be used to set the DNS domain name where the discovery SRV records can be found.
The following DNS SRV records are looked up in the listed order:

@ -280,107 +285,93 @@ The following DNS SRV records are looked up in the listed order:

If `_etcd-server-ssl._tcp.example.com` is found then etcd will attempt the bootstrapping process over SSL.

To help clients discover the etcd cluster, the following DNS SRV records are looked up in the listed order:

* _etcd-client._tcp.example.com
* _etcd-client-ssl._tcp.example.com

If `_etcd-client-ssl._tcp.example.com` is found, clients will attempt to communicate with the etcd cluster over SSL.

#### Create DNS SRV records

```
$ dig +noall +answer SRV _etcd-server._tcp.example.com
_etcd-server._tcp.example.com. 300 IN SRV 0 0 2380 infra0.example.com.
_etcd-server._tcp.example.com. 300 IN SRV 0 0 2380 infra1.example.com.
_etcd-server._tcp.example.com. 300 IN SRV 0 0 2380 infra2.example.com.
```

```
$ dig +noall +answer SRV _etcd-client._tcp.example.com
_etcd-client._tcp.example.com. 300 IN SRV 0 0 2379 infra0.example.com.
_etcd-client._tcp.example.com. 300 IN SRV 0 0 2379 infra1.example.com.
_etcd-client._tcp.example.com. 300 IN SRV 0 0 2379 infra2.example.com.
```

```
$ dig +noall +answer infra0.example.com infra1.example.com infra2.example.com
infra0.example.com.  300  IN  A  10.0.1.10
infra1.example.com.  300  IN  A  10.0.1.11
infra2.example.com.  300  IN  A  10.0.1.12
```

#### Bootstrap the etcd cluster using DNS

etcd cluster members can listen on domain names or IP addresses; the bootstrap process will resolve DNS A records.

The resolved address in `--initial-advertise-peer-urls` *must match* one of the resolved addresses in the SRV targets. The etcd member reads the resolved address to find out if it belongs to the cluster defined in the SRV records.

```
$ etcd --name infra0 \
  --discovery-srv example.com \
  --initial-advertise-peer-urls http://infra0.example.com:2380 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster-state new \
  --advertise-client-urls http://infra0.example.com:2379 \
  --listen-client-urls http://infra0.example.com:2379 \
  --listen-peer-urls http://infra0.example.com:2380
```

```
$ etcd --name infra1 \
  --discovery-srv example.com \
  --initial-advertise-peer-urls http://infra1.example.com:2380 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster-state new \
  --advertise-client-urls http://infra1.example.com:2379 \
  --listen-client-urls http://infra1.example.com:2379 \
  --listen-peer-urls http://infra1.example.com:2380
```

```
$ etcd --name infra2 \
  --discovery-srv example.com \
  --initial-advertise-peer-urls http://infra2.example.com:2380 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster-state new \
  --advertise-client-urls http://infra2.example.com:2379 \
  --listen-client-urls http://infra2.example.com:2379 \
  --listen-peer-urls http://infra2.example.com:2380
```

You can also bootstrap the cluster using IP addresses instead of domain names:

```
$ etcd --name infra0 \
  --discovery-srv example.com \
  --initial-advertise-peer-urls http://10.0.1.10:2380 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster-state new \
  --advertise-client-urls http://10.0.1.10:2379 \
  --listen-client-urls http://10.0.1.10:2379 \
  --listen-peer-urls http://10.0.1.10:2380
```

```
$ etcd --name infra1 \
  --discovery-srv example.com \
  --initial-advertise-peer-urls http://10.0.1.11:2380 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster-state new \
  --advertise-client-urls http://10.0.1.11:2379 \
  --listen-client-urls http://10.0.1.11:2379 \
  --listen-peer-urls http://10.0.1.11:2380
```

```
$ etcd --name infra2 \
  --discovery-srv example.com \
  --initial-advertise-peer-urls http://10.0.1.12:2380 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster-state new \
  --advertise-client-urls http://10.0.1.12:2379 \
  --listen-client-urls http://10.0.1.12:2379 \
  --listen-peer-urls http://10.0.1.12:2380
```

#### etcd proxy configuration

@ -388,24 +379,12 @@ $ etcd --name infra2 \

DNS SRV records can also be used to configure the list of peers for an etcd server running in proxy mode:

```
$ etcd --proxy on --discovery-srv example.com
```

#### etcd client configuration

DNS SRV records can also be used to help clients discover the etcd cluster.

The official [etcd/client][client] supports [DNS Discovery][client-discoverer].

`etcdctl` also supports DNS Discovery by specifying the `--discovery-srv` option.

```
$ etcdctl --discovery-srv example.com set foo bar
```

#### Error Cases

You might see an error like `cannot find local etcd $name from SRV records.`. That means the etcd member failed to find itself in the cluster defined by the SRV records. The resolved address in `--initial-advertise-peer-urls` *must match* one of the resolved addresses in the SRV targets.

# 0.4 to 2.0+ Migration Guide

@ -413,22 +392,11 @@ In etcd 2.0 we introduced the ability to listen on more than one address and to

To make understanding this feature easier, we changed the naming of some flags, but we support the old flags to make the migration from the old to new version easier.

|Old Flag |New Flag |Migration Behavior |
|-----------------------|-----------------------|---------------------------------------------------------------------------------------|
|-peer-addr |--initial-advertise-peer-urls |If specified, peer-addr will be used as the only peer URL. Error if both flags specified.|
|-addr |--advertise-client-urls |If specified, addr will be used as the only client URL. Error if both flags specified.|
|-peer-bind-addr |--listen-peer-urls |If specified, peer-bind-addr will be used as the only peer bind URL. Error if both flags specified.|
|-bind-addr |--listen-client-urls |If specified, bind-addr will be used as the only client bind URL. Error if both flags specified.|
|-peers |none |Deprecated. The --initial-cluster flag provides a similar concept with different semantics. Please read this guide on cluster startup.|
|-peers-file |none |Deprecated. The --initial-cluster flag provides a similar concept with different semantics. Please read this guide on cluster startup.|

[client]: /client
[client-discoverer]: https://godoc.org/github.com/coreos/etcd/client#Discoverer
[conf-adv-client]: configuration.md#-advertise-client-urls
[conf-listen-client]: configuration.md#-listen-client-urls
[discovery-proto]: discovery_protocol.md
[fall-back]: proxy.md#fallback-to-proxy-mode-with-discovery-service
[proxy]: proxy.md
[rfc-srv]: http://www.ietf.org/rfc/rfc2052.txt
[runtime-conf]: runtime-configuration.md
[runtime-reconf-design]: runtime-reconf-design.md

@ -1,282 +1,264 @@

# Configuration Flags

etcd is configurable through command-line flags and environment variables. Options set on the command line take precedence over those from the environment.

The format of environment variable for flag `--my-flag` is `ETCD_MY_FLAG`. It applies to all flags.
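
For example, the following two invocations are equivalent ways to override a default (the flag name is real; the value is illustrative):

```
# set via command-line flag...
etcd --snapshot-count 5000

# ...or via the corresponding environment variable
ETCD_SNAPSHOT_COUNT=5000 etcd
```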

The [official etcd ports][iana-ports] are 2379 for client requests, and 2380 for peer communication. Some legacy code and documentation still references ports 4001 and 7001, but all new etcd use and discussion should adopt the assigned ports.

To start etcd automatically using custom settings at startup in Linux, using a [systemd][systemd-intro] unit is highly recommended.
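
A minimal sketch of such a unit, assuming the binary lives at `/usr/bin/etcd` (the paths, names, and values here are illustrative, not a recommended production setup):

```
# /etc/systemd/system/etcd.service
[Unit]
Description=etcd key-value store
After=network.target

[Service]
Environment=ETCD_NAME=infra0
Environment=ETCD_DATA_DIR=/var/lib/etcd
ExecStart=/usr/bin/etcd
Restart=always

[Install]
WantedBy=multi-user.target
```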

## Member Flags

### --name
+ Human-readable name for this member.
+ default: "default"
+ env variable: ETCD_NAME
+ This value is referenced as this node's own entries listed in the `--initial-cluster` flag (Ex: `default=http://localhost:2380` or `default=http://localhost:2380,default=http://localhost:7001`). This needs to match the key used in the flag if you're using [static bootstrapping][build-cluster]. When using discovery, each member must have a unique name. `Hostname` or `machine-id` can be a good choice.

### --data-dir
+ Path to the data directory.
+ default: "${name}.etcd"
+ env variable: ETCD_DATA_DIR

### --wal-dir
+ Path to the dedicated wal directory. If this flag is set, etcd will write the WAL files to the walDir rather than the dataDir. This allows a dedicated disk to be used, and helps avoid I/O competition between logging and other I/O operations.
+ default: ""
+ env variable: ETCD_WAL_DIR

### --snapshot-count
+ Number of committed transactions to trigger a snapshot to disk.
+ default: "10000"
+ env variable: ETCD_SNAPSHOT_COUNT

### --heartbeat-interval
+ Time (in milliseconds) of a heartbeat interval.
+ default: "100"
+ env variable: ETCD_HEARTBEAT_INTERVAL

### --election-timeout
+ Time (in milliseconds) for an election to timeout. See the [tuning documentation][tuning] for details.
+ default: "1000"
+ env variable: ETCD_ELECTION_TIMEOUT
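
As a sketch of tuning these two flags together for a higher-latency network (the values are illustrative; note the defaults keep the election timeout at 10x the heartbeat interval):

```
# keep the election timeout roughly an order of magnitude above the heartbeat interval
etcd --heartbeat-interval=300 --election-timeout=3000
```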

### --listen-peer-urls
+ List of URLs to listen on for peer traffic. This flag tells etcd to accept incoming requests from its peers on the specified scheme://IP:port combinations. Scheme can be either http or https. If 0.0.0.0 is specified as the IP, etcd listens to the given port on all interfaces. If an IP address is given as well as a port, etcd will listen on the given port and interface. Multiple URLs may be used to specify a number of addresses and ports to listen on. etcd will respond to requests from any of the listed addresses and ports.
+ default: "http://localhost:2380,http://localhost:7001"
+ env variable: ETCD_LISTEN_PEER_URLS
+ example: "http://10.0.0.1:2380"
+ invalid example: "http://example.com:2380" (domain name is invalid for binding)

### --listen-client-urls
+ List of URLs to listen on for client traffic. This flag tells etcd to accept incoming requests from the clients on the specified scheme://IP:port combinations. Scheme can be either http or https. If 0.0.0.0 is specified as the IP, etcd listens to the given port on all interfaces. If an IP address is given as well as a port, etcd will listen on the given port and interface. Multiple URLs may be used to specify a number of addresses and ports to listen on. etcd will respond to requests from any of the listed addresses and ports.
+ default: "http://localhost:2379,http://localhost:4001"
+ env variable: ETCD_LISTEN_CLIENT_URLS
+ example: "http://10.0.0.1:2379"
+ invalid example: "http://example.com:2379" (domain name is invalid for binding)

### --max-snapshots
+ Maximum number of snapshot files to retain (0 is unlimited).
+ default: 5
+ env variable: ETCD_MAX_SNAPSHOTS
+ The default for users on Windows is unlimited, and manual purging down to 5 (or your preference for safety) is recommended.

### --max-wals
+ Maximum number of wal files to retain (0 is unlimited).
+ default: 5
+ env variable: ETCD_MAX_WALS
+ The default for users on Windows is unlimited, and manual purging down to 5 (or your preference for safety) is recommended.

### --cors
+ Comma-separated white list of origins for CORS (cross-origin resource sharing).
+ default: none
+ env variable: ETCD_CORS

## Clustering Flags

`--initial` prefix flags are used in bootstrapping ([static bootstrap][build-cluster], [discovery-service bootstrap][discovery] or [runtime reconfiguration][reconfig]) a new member, and ignored when restarting an existing member.

`--discovery` prefix flags need to be set when using [discovery service][discovery].

### --initial-advertise-peer-urls
+ List of this member's peer URLs to advertise to the rest of the cluster. These addresses are used for communicating etcd data around the cluster. At least one must be routable to all cluster members. These URLs can contain domain names.
+ default: "http://localhost:2380,http://localhost:7001"
+ env variable: ETCD_INITIAL_ADVERTISE_PEER_URLS
+ example: "http://example.com:2380, http://10.0.0.1:2380"

### --initial-cluster
+ Initial cluster configuration for bootstrapping.
+ default: "default=http://localhost:2380,default=http://localhost:7001"
+ env variable: ETCD_INITIAL_CLUSTER
+ The key is the value of the `--name` flag for each node provided. The default uses `default` for the key because this is the default for the `--name` flag.

### --initial-cluster-state
+ Initial cluster state ("new" or "existing"). Set to `new` for all members present during initial static or DNS bootstrapping. If this option is set to `existing`, etcd will attempt to join the existing cluster. If the wrong value is set, etcd will attempt to start but fail safely.
+ default: "new"
+ env variable: ETCD_INITIAL_CLUSTER_STATE

### --initial-cluster-token
+ Initial cluster token for the etcd cluster during bootstrap.
+ default: "etcd-cluster"
+ env variable: ETCD_INITIAL_CLUSTER_TOKEN

### --advertise-client-urls
+ List of this member's client URLs to advertise to the rest of the cluster. These URLs can contain domain names.
+ default: "http://localhost:2379,http://localhost:4001"
+ env variable: ETCD_ADVERTISE_CLIENT_URLS
+ example: "http://example.com:2379, http://10.0.0.1:2379"
+ Be careful if you are advertising URLs such as http://localhost:2379 from a cluster member and are using the proxy feature of etcd. This will cause loops, because the proxy will be forwarding requests to itself until its resources (memory, file descriptors) are eventually depleted.

### --discovery
+ Discovery URL used to bootstrap the cluster.
+ default: none
+ env variable: ETCD_DISCOVERY

### --discovery-srv
+ DNS srv domain used to bootstrap the cluster.
+ default: none
+ env variable: ETCD_DISCOVERY_SRV

### --discovery-fallback
+ Expected behavior ("exit" or "proxy") when the discovery service fails.
+ default: "proxy"
+ env variable: ETCD_DISCOVERY_FALLBACK

### --discovery-proxy
+ HTTP proxy to use for traffic to the discovery service.
+ default: none
+ env variable: ETCD_DISCOVERY_PROXY

### --strict-reconfig-check
+ Reject reconfiguration requests that would cause quorum loss.
+ default: false
+ env variable: ETCD_STRICT_RECONFIG_CHECK

## Proxy Flags

`--proxy` prefix flags configure etcd to run in [proxy mode][proxy].

### --proxy
+ Proxy mode setting ("off", "readonly" or "on").
+ default: "off"
+ env variable: ETCD_PROXY

### --proxy-failure-wait
+ Time (in milliseconds) an endpoint will be held in a failed state before being reconsidered for proxied requests.
+ default: 5000
+ env variable: ETCD_PROXY_FAILURE_WAIT

### --proxy-refresh-interval
+ Time (in milliseconds) of the endpoints refresh interval.
+ default: 30000
+ env variable: ETCD_PROXY_REFRESH_INTERVAL

### --proxy-dial-timeout
+ Time (in milliseconds) for a dial to timeout, or 0 to disable the timeout.
+ default: 1000
+ env variable: ETCD_PROXY_DIAL_TIMEOUT

### --proxy-write-timeout
+ Time (in milliseconds) for a write to timeout, or 0 to disable the timeout.
+ default: 5000
+ env variable: ETCD_PROXY_WRITE_TIMEOUT

### --proxy-read-timeout
+ Time (in milliseconds) for a read to timeout, or 0 to disable the timeout.
+ Don't change this value if you use watches, because watches are long polling requests.
+ default: 0
+ env variable: ETCD_PROXY_READ_TIMEOUT

## Security Flags

The security flags help to [build a secure etcd cluster][security].
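
As a sketch of serving the client API over TLS with client-certificate authentication (the certificate paths are illustrative and assumed to exist):

```
etcd --name infra0 \
  --cert-file=/etc/ssl/etcd/server.crt \
  --key-file=/etc/ssl/etcd/server.key \
  --client-cert-auth --trusted-ca-file=/etc/ssl/etcd/ca.crt \
  --advertise-client-urls https://10.0.1.10:2379 \
  --listen-client-urls https://10.0.1.10:2379
```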

### --ca-file [DEPRECATED]
+ Path to the client server TLS CA file. `--ca-file ca.crt` could be replaced by `--trusted-ca-file ca.crt --client-cert-auth` and etcd will perform the same.
+ default: none
+ env variable: ETCD_CA_FILE

### --cert-file
+ Path to the client server TLS cert file.
+ default: none
+ env variable: ETCD_CERT_FILE

### --key-file
+ Path to the client server TLS key file.
+ default: none
+ env variable: ETCD_KEY_FILE

### --client-cert-auth
+ Enable client cert authentication.
+ default: false
+ env variable: ETCD_CLIENT_CERT_AUTH

### --trusted-ca-file
+ Path to the client server TLS trusted CA key file.
+ default: none
+ env variable: ETCD_TRUSTED_CA_FILE

### --peer-ca-file [DEPRECATED]
+ Path to the peer server TLS CA file. `--peer-ca-file ca.crt` could be replaced by `--peer-trusted-ca-file ca.crt --peer-client-cert-auth` and etcd will perform the same.
+ default: none
+ env variable: ETCD_PEER_CA_FILE

### --peer-cert-file
+ Path to the peer server TLS cert file.
+ default: none
+ env variable: ETCD_PEER_CERT_FILE

### --peer-key-file
+ Path to the peer server TLS key file.
+ default: none
+ env variable: ETCD_PEER_KEY_FILE

### --peer-client-cert-auth
+ Enable peer client cert authentication.
+ default: false
+ env variable: ETCD_PEER_CLIENT_CERT_AUTH

### --peer-trusted-ca-file
+ Path to the peer server TLS trusted CA file.
+ default: none
+ env variable: ETCD_PEER_TRUSTED_CA_FILE

## Logging Flags

### --debug
+ Drop the default log level to DEBUG for all subpackages.
+ default: false (INFO for all packages)
+ env variable: ETCD_DEBUG

### --log-package-levels
+ Set individual etcd subpackages to specific log levels. An example being `etcdserver=WARNING,security=DEBUG`.
+ default: none (INFO for all packages)
+ env variable: ETCD_LOG_PACKAGE_LEVELS

## Unsafe Flags

Please be CAUTIOUS when using unsafe flags because they will break the guarantees given by the consensus protocol.
For example, etcd may panic if other members in the cluster are still alive.
Follow the instructions carefully when using these flags.

### --force-new-cluster
+ Force to create a new one-member cluster. It commits configuration changes that force the removal of all existing members in the cluster and add itself. It needs to be set to [restore a backup][restore].
+ default: false
+ env variable: ETCD_FORCE_NEW_CLUSTER

## Experimental Flags

### --experimental-v3demo
+ Enable experimental [v3 demo API][rfc-v3].
+ default: false
+ env variable: ETCD_EXPERIMENTAL_V3DEMO

## Miscellaneous Flags

### --version
+ Print the version and exit.
+ default: false

## Profiling flags

### --enable-pprof
+ Enable runtime profiling data via HTTP server. Address is at client URL + "/debug/pprof".
+ default: false
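
With profiling enabled, the standard Go pprof index becomes reachable under the client URL; a quick sketch against a locally listening member:

```
# list the available runtime profiles
curl http://127.0.0.1:2379/debug/pprof/
```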

[build-cluster]: clustering.md#static
[discovery]: clustering.md#discovery
[iana-ports]: https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml?search=etcd
[proxy]: proxy.md
[reconfig]: runtime-configuration.md
[restore]: admin_guide.md#restoring-a-backup
[rfc-v3]: rfc/v3api.md
[security]: security.md
[systemd-intro]: http://freedesktop.org/wiki/Software/systemd/
[tuning]: tuning.md#time-parameters

@ -6,11 +6,11 @@ The procedure includes some manual steps for sanity checking but it can probably

## Prepare Release

Set desired version as environment variable for following steps. Here is an example to release 2.3.0:

```
export VERSION=v2.3.0
export PREV_VERSION=v2.2.5
```

All release version numbers follow the format of [semantic versioning 2.0.0](http://semver.org/).

@ -30,6 +30,7 @@ All releases version numbers follow the format of [semantic versioning 2.0.0](ht

## Write Release Note

- Write an introduction for the new release. For example, what major bugs we fixed, what new features we introduced, and what performance improvements we made.
- Write a changelog for the last release. The changelog should be straightforward and easy to understand for the end user.
- Put `[GH XXXX]` at the head of each change line to reference the Pull Request that introduces the change, and add a link on it to jump to the Pull Request.

@ -60,14 +61,16 @@ It generates all release binaries and images under directory ./release.

## Sign Binaries and Images

The etcd project key must be used to sign the generated binaries and images. `$SUBKEYID` is the key ID of the etcd project Yubikey. Connect the key and run `gpg2 --card-status` to get the ID.

The following commands are used to sign a public release:

```
cd release
for i in etcd-*{.zip,.tar.gz}; do gpg2 --default-key $SUBKEYID --output ${i}.asc --detach-sign ${i}; done
for i in etcd-*{.zip,.tar.gz}; do gpg2 --verify ${i}.asc ${i}; done
# use `CoreOS ACI Builder <release@coreos.com>` secret key
gpg -u 88182190 -a --output etcd-${VERSION}-linux-amd64.aci.asc --detach-sig etcd-${VERSION}-linux-amd64.aci
```

## Publish Release Page in GitHub
@ -4,7 +4,7 @@ Discovery service protocol helps new etcd member to discover all other members i

Discovery service protocol is _only_ used in the cluster bootstrap phase, and cannot be used for runtime reconfiguration or cluster monitoring.

The protocol uses a new discovery token to bootstrap one _unique_ etcd cluster. Remember that one discovery token can represent only one etcd cluster. As long as discovery protocol on this token starts, even if it fails halfway, it must not be used to bootstrap another etcd cluster.

The rest of this article will walk through the discovery process with examples that correspond to a self-hosted discovery cluster. The public discovery service, discovery.etcd.io, functions the same way, but with a layer of polish to abstract away ugly URLs, generate UUIDs automatically, and provide some protections against excessive requests. At its core, the public discovery service still uses an etcd cluster as the data store as described in this document.

@ -14,7 +14,7 @@ The idea of discovery protocol is to use an internal etcd cluster to coordinate

In the following example workflow, we will list each step of protocol in curl format for ease of understanding.

By convention the etcd discovery protocol uses the key prefix `_etcd/registry`. If `http://example.com` hosts an etcd cluster for discovery service, a full URL to discovery keyspace will be `http://example.com/v2/keys/_etcd/registry`. We will use this as the URL prefix in the example.

### Creating a New Discovery Token

@ -32,7 +32,7 @@ You need to specify the expected cluster size for this discovery token. The size

```
curl -X PUT http://example.com/v2/keys/_etcd/registry/${UUID}/_config/size -d value=${cluster_size}
```

Usually the cluster size is 3, 5 or 7. Check [optimal cluster size][cluster-size] for more details.

### Bringing up etcd Processes

@ -64,7 +64,7 @@ In etcd implementation, the member may check the cluster status even before regi

### Waiting for All Members

The wait process is described in detail in the [etcd API documentation][api].

```
curl -X GET http://example.com/v2/keys/_etcd/registry/${UUID}?wait=true&waitIndex=${current_etcd_index}
```

@ -94,7 +94,7 @@ Possible status codes:

generated discovery url
```

The generation process in the service follows the steps from [Creating a New Discovery Token][new-discovery-token] to [Specifying the Expected Cluster Size][expected-cluster-size].

### Check Discovery Status

@ -107,8 +107,3 @@ You can check the status for this discovery token, including the machines that h

### Open-source repository

The repository is located at https://github.com/coreos/discovery.etcd.io. You can use it to build your own public discovery service.

[api]: api.md#waiting-for-a-change
[cluster-size]: admin_guide.md#optimal-cluster-size
[expected-cluster-size]: #specifying-the-expected-cluster-size
[new-discovery-token]: #creating-a-new-discovery-token

@ -12,11 +12,9 @@ export HostIP="192.168.12.50"
|
||||
|
||||
The following `docker run` command will expose the etcd client API over ports 4001 and 2379, and expose the peer port over 2380.
|
||||
|
||||
This will run the latest release version of etcd. You can specify version if needed (e.g. `quay.io/coreos/etcd:v2.2.0`).
|
||||
|
||||
```
|
||||
docker run -d -v /usr/share/ca-certificates/:/etc/ssl/certs -p 4001:4001 -p 2380:2380 -p 2379:2379 \
|
||||
--name etcd quay.io/coreos/etcd \
|
||||
--name etcd quay.io/coreos/etcd:v2.0.8 \
|
||||
-name etcd0 \
|
||||
-advertise-client-urls http://${HostIP}:2379,http://${HostIP}:4001 \
|
||||
-listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 \
|
||||
@ -46,7 +44,7 @@ The main difference being the value used for the `-initial-cluster` flag, which
|
||||
|
||||
```
|
||||
docker run -d -v /usr/share/ca-certificates/:/etc/ssl/certs -p 4001:4001 -p 2380:2380 -p 2379:2379 \
|
||||
--name etcd quay.io/coreos/etcd \
|
||||
--name etcd quay.io/coreos/etcd:v2.0.8 \
|
||||
-name etcd0 \
|
||||
-advertise-client-urls http://192.168.12.50:2379,http://192.168.12.50:4001 \
|
||||
-listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 \
|
||||
@ -61,7 +59,7 @@ docker run -d -v /usr/share/ca-certificates/:/etc/ssl/certs -p 4001:4001 -p 2380
|
||||
|
||||
```
|
||||
docker run -d -v /usr/share/ca-certificates/:/etc/ssl/certs -p 4001:4001 -p 2380:2380 -p 2379:2379 \
|
||||
--name etcd quay.io/coreos/etcd \
|
||||
--name etcd quay.io/coreos/etcd:v2.0.8 \
|
||||
-name etcd1 \
|
||||
-advertise-client-urls http://192.168.12.51:2379,http://192.168.12.51:4001 \
|
||||
-listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 \
|
||||
@ -76,7 +74,7 @@ docker run -d -v /usr/share/ca-certificates/:/etc/ssl/certs -p 4001:4001 -p 2380
|
||||
|
||||
```
|
||||
docker run -d -v /usr/share/ca-certificates/:/etc/ssl/certs -p 4001:4001 -p 2380:2380 -p 2379:2379 \
|
||||
--name etcd quay.io/coreos/etcd \
|
||||
--name etcd quay.io/coreos/etcd:v2.0.8 \
|
||||
-name etcd2 \
|
||||
-advertise-client-urls http://192.168.12.52:2379,http://192.168.12.52:4001 \
|
||||
-listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001 \
|
||||
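Once all three members are running, the cluster can be sanity-checked from any host via the members endpoint (a sketch using the IPs above):

```sh
curl http://192.168.12.50:2379/v2/members
```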
|
@ -1,4 +1,4 @@
|
||||
# Error Code
|
||||
Error Code
|
||||
======
|
||||
|
||||
This document describes the error codes used in the key space '/v2/keys'. Feel free to import 'github.com/coreos/etcd/error' to use them.
|
||||
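As a quick illustration (a sketch; the address and key are illustrative), fetching a key that does not exist returns one of these error codes in the response body:

```sh
curl http://127.0.0.1:2379/v2/keys/nonexistent
# => {"errorCode":100,"message":"Key not found","cause":"/nonexistent","index":...}
```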
|
@ -37,7 +37,7 @@ timeout.
|
||||
|
||||
A proxy is a redirection server to the etcd cluster. The proxy handles the
|
||||
redirection of a client to the current configuration of the etcd cluster. A
|
||||
typical use case is to start a proxy on a machine, and on first boot up of the
|
||||
typical usecase is to start a proxy on a machine, and on first boot up of the
|
||||
proxy specify both the `--proxy` flag and the `--initial-cluster` flag.
|
||||
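A minimal sketch of such a local proxy (URLs illustrative):

```
etcd --proxy on \
  --listen-client-urls http://127.0.0.1:2379 \
  --initial-cluster infra0=http://10.0.1.10:2380
```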
|
||||
From there, any etcdctl client that starts up automatically speaks to the local
|
||||
@ -57,27 +57,24 @@ and their integration with the reconfiguration API.
|
||||
Thus, a member that is down, even infinitely, will never be automatically
|
||||
removed from the etcd cluster member list.
|
||||
|
||||
This makes sense because it's usually an application level / administrative
|
||||
This makes sense because its usually an application level / administrative
|
||||
action to determine whether a reconfiguration should happen based on health.
|
||||
|
||||
For more information, refer to the [runtime reconfiguration design document][runtime-reconf-design].
|
||||
For more information, refer to [Documentation/runtime-reconfiguration.md].
|
||||
|
||||
## 6) how does --endpoint work with etcdctl?
|
||||
## 6) how does --peers work with etcdctl?
|
||||
|
||||
The `--endpoint` flag can specify any number of etcd cluster members in a comma
|
||||
The `--peers` flag can specify any number of etcd cluster members in a comma
|
||||
separated list. This list might be a subset, equal to, or more than the actual
|
||||
etcd cluster member list itself.
|
||||
|
||||
If only one peer is specified via the `--endpoint` flag, the etcdctl discovers the
|
||||
If only one peer is specified via the `--peers` flag, the etcdctl discovers the
|
||||
rest of the cluster via the member list of that one peer, and then it randomly
|
||||
chooses a member to use. Again, the client can use the `quorum=true` flag on
|
||||
reads, which will always fail when using a member in the minority.
|
||||
|
||||
If peers from multiple clusters are specified via the `--endpoint` flag, etcdctl
|
||||
If peers from multiple clusters are specified via the `--peers` flag, etcdctl
|
||||
will randomly choose a peer, and the request will simply get routed to one of
|
||||
the clusters. This is probably not what you want.
|
||||
|
||||
Note: the `--peers` flag is now deprecated and `--endpoint` should be used instead,
|
||||
since the old name might confuse users into giving etcdctl a peer URL.
|
||||
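For example (a sketch; endpoints illustrative), a single endpoint is enough for etcdctl to discover the rest of the cluster:

```sh
etcdctl --endpoint http://10.0.1.10:2379 member list
# or list several members explicitly:
etcdctl --endpoint http://10.0.1.10:2379,http://10.0.1.11:2379 get /foo
```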
|
||||
[runtime-reconf-design]: runtime-reconf-design.md
|
||||
|
@ -1,35 +1,35 @@
|
||||
# Glossary
|
||||
## Glossary
|
||||
|
||||
This document defines the various terms used in etcd documentation, command line and source code.
|
||||
|
||||
## Node
|
||||
### Node
|
||||
|
||||
Node is an instance of the raft state machine.
|
||||
|
||||
It has a unique identifier, and records other nodes' progress internally when it is the leader.
|
||||
|
||||
## Member
|
||||
### Member
|
||||
|
||||
Member is an instance of etcd. It hosts a node, and provides service to clients.
|
||||
|
||||
## Cluster
|
||||
### Cluster
|
||||
|
||||
Cluster consists of several members.
|
||||
|
||||
The node in each member follows the raft consensus protocol to replicate logs. The cluster receives proposals from members, commits them and applies them to the local store.
|
||||
|
||||
## Peer
|
||||
### Peer
|
||||
|
||||
Peer is another member of the same cluster.
|
||||
|
||||
## Proposal
|
||||
### Proposal
|
||||
|
||||
A proposal is a request (for example a write request, a configuration change request) that needs to go through raft protocol.
|
||||
|
||||
## Client
|
||||
### Client
|
||||
|
||||
Client is a caller of the cluster's HTTP API.
|
||||
|
||||
## Machine (deprecated)
|
||||
### Machine (deprecated)
|
||||
|
||||
The term used for Member in etcd before 2.0.
|
||||
|
@ -46,7 +46,7 @@ ExecStart=/usr/bin/etcd
|
||||
|
||||
There are several error cases:
|
||||
|
||||
0) Init has already run and the data directory is already configured
|
||||
0) Init has already ran and the data directory is already configured
|
||||
1) Discovery fails because of network timeout, etc
|
||||
2) Discovery fails because the cluster is already full and etcd needs to fall back to proxy
|
||||
3) Static cluster configuration fails because of conflict, misconfiguration or timeout
|
||||
|
@ -1,24 +1,19 @@
|
||||
# Libraries and Tools
|
||||
## Libraries and Tools
|
||||
|
||||
**Tools**
|
||||
|
||||
- [etcdctl](https://github.com/coreos/etcd/tree/master/etcdctl) - A command line client for etcd
|
||||
- [etcdctl](https://github.com/coreos/etcdctl) - A command line client for etcd
|
||||
- [etcd-backup](https://github.com/fanhattan/etcd-backup) - A powerful command line utility for dumping/restoring etcd - Supports v2
|
||||
- [etcd-dump](https://npmjs.org/package/etcd-dump) - Command line utility for dumping/restoring etcd.
|
||||
- [etcd-fs](https://github.com/xetorthio/etcd-fs) - FUSE filesystem for etcd
|
||||
- [etcddir](https://github.com/rekby/etcddir) - Realtime sync between etcd and a local directory. Works with Windows and Linux.
|
||||
- [etcd-browser](https://github.com/henszey/etcd-browser) - A web-based key/value editor for etcd using AngularJS
|
||||
- [etcd-lock](https://github.com/datawisesystems/etcd-lock) - Master election & distributed r/w lock implementation using etcd - Supports v2
|
||||
- [etcd-console](https://github.com/matishsiao/etcd-console) - A web-based key/value editor for etcd using PHP
|
||||
- [etcd-viewer](https://github.com/nikfoundas/etcd-viewer) - An etcd key-value store editor/viewer written in Java
|
||||
- [etcdtool](https://github.com/mickep76/etcdtool) - Export/Import/Edit etcd directory as JSON/YAML/TOML and Validate directory using JSON schema
|
||||
- [etcd-rest](https://github.com/mickep76/etcd-rest) - Create generic REST API in Go using etcd as a backend with validation using JSON schema
|
||||
- [etcdsh](https://github.com/kamilhark/etcdsh) - A command line client with support of command history and tab completion. Supports v2
|
||||
|
||||
**Go libraries**
|
||||
|
||||
- [etcd/client](https://github.com/coreos/etcd/blob/master/client) - the officially maintained Go client
|
||||
- [go-etcd](https://github.com/coreos/go-etcd) - the deprecated official client. May be useful for older (<2.0.0) versions of etcd.
|
||||
- [go-etcd](https://github.com/coreos/go-etcd) - Supports v2
|
||||
|
||||
**Java libraries**
|
||||
|
||||
@ -54,7 +49,6 @@
|
||||
|
||||
**C++ libraries**
|
||||
- [edwardcapriolo/etcdcpp](https://github.com/edwardcapriolo/etcdcpp) - Supports v2
|
||||
- [suryanathan/etcdcpp](https://github.com/suryanathan/etcdcpp) - Supports v2 (with waits)
|
||||
|
||||
**Clojure libraries**
|
||||
|
||||
@ -68,7 +62,6 @@
|
||||
|
||||
**.Net Libraries**
|
||||
|
||||
- [wangjia184/etcdnet](https://github.com/wangjia184/etcdnet) - Supports v2
|
||||
- [drusellers/etcetera](https://github.com/drusellers/etcetera)
|
||||
|
||||
**PHP Libraries**
|
||||
@ -87,6 +80,10 @@
|
||||
|
||||
- [efrecon/etcd-tcl](https://github.com/efrecon/etcd-tcl) - Supports v2, except wait.
|
||||
|
||||
A detailed recap of client functionalities can be found in the [clients compatibility matrix][clients-matrix.md].
|
||||
|
||||
[clients-matrix.md]: https://github.com/coreos/etcd/blob/master/Documentation/clients-matrix.md
|
||||
|
||||
**Chef Integration**
|
||||
|
||||
- [coderanger/etcd-chef](https://github.com/coderanger/etcd-chef)
|
||||
@ -114,7 +111,7 @@
|
||||
- [configdb](https://git.autistici.org/ai/configdb/tree/master) - A REST relational abstraction on top of arbitrary database backends, aimed at storing configs and inventories.
|
||||
- [scrz](https://github.com/scrz/scrz) - Container manager, stores configuration in etcd.
|
||||
- [fleet](https://github.com/coreos/fleet) - Distributed init system
|
||||
- [kubernetes/kubernetes](https://github.com/kubernetes/kubernetes) - Container cluster manager introduced by Google.
|
||||
- [GoogleCloudPlatform/kubernetes](https://github.com/GoogleCloudPlatform/kubernetes) - Container cluster manager.
|
||||
- [mailgun/vulcand](https://github.com/mailgun/vulcand) - HTTP proxy that uses etcd as a configuration backend.
|
||||
- [duedil-ltd/discodns](https://github.com/duedil-ltd/discodns) - Simple DNS nameserver using etcd as a database for names and records.
|
||||
- [skynetservices/skydns](https://github.com/skynetservices/skydns) - RFC compliant DNS server
|
||||
|
@ -1,120 +0,0 @@
|
||||
# Members API
|
||||
|
||||
* [List members](#list-members)
|
||||
* [Add a member](#add-a-member)
|
||||
* [Delete a member](#delete-a-member)
|
||||
* [Change the peer urls of a member](#change-the-peer-urls-of-a-member)
|
||||
|
||||
## List members
|
||||
|
||||
Return an HTTP 200 OK response code and a representation of all members in the etcd cluster.
|
||||
|
||||
### Request
|
||||
|
||||
```
|
||||
GET /v2/members HTTP/1.1
|
||||
```
|
||||
|
||||
### Example
|
||||
|
||||
```sh
|
||||
curl http://10.0.0.10:2379/v2/members
|
||||
```
|
||||
|
||||
```json
|
||||
{
|
||||
"members": [
|
||||
{
|
||||
"id": "272e204152",
|
||||
"name": "infra1",
|
||||
"peerURLs": [
|
||||
"http://10.0.0.10:2380"
|
||||
],
|
||||
"clientURLs": [
|
||||
"http://10.0.0.10:2379"
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": "2225373f43",
|
||||
"name": "infra2",
|
||||
"peerURLs": [
|
||||
"http://10.0.0.11:2380"
|
||||
],
|
||||
"clientURLs": [
|
||||
"http://10.0.0.11:2379"
|
||||
]
|
||||
},
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## Add a member
|
||||
|
||||
Returns an HTTP 201 response code and the representation of the added member, with a newly generated memberID, when successful. Returns a string describing the failure condition when unsuccessful.
|
||||
|
||||
If the POST body is malformed an HTTP 400 will be returned. If the member exists in the cluster or existed in the cluster at some point in the past an HTTP 409 will be returned. If any of the given peerURLs exists in the cluster an HTTP 409 will be returned. If the cluster fails to process the request within timeout an HTTP 500 will be returned, though the request may be processed later.
|
||||
|
||||
### Request
|
||||
|
||||
```
|
||||
POST /v2/members HTTP/1.1
|
||||
|
||||
{"peerURLs": ["http://10.0.0.10:2380"]}
|
||||
```
|
||||
|
||||
### Example
|
||||
|
||||
```sh
|
||||
curl http://10.0.0.10:2379/v2/members -XPOST \
|
||||
-H "Content-Type: application/json" -d '{"peerURLs":["http://10.0.0.10:2380"]}'
|
||||
```
|
||||
|
||||
```json
|
||||
{
|
||||
"id": "3777296169",
|
||||
"peerURLs": [
|
||||
"http://10.0.0.10:2380"
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## Delete a member
|
||||
|
||||
Remove a member from the cluster. The member ID must be a hex-encoded uint64.
|
||||
Returns 204 with empty content when successful. Returns a string describing the failure condition when unsuccessful.
|
||||
|
||||
If the member does not exist in the cluster an HTTP 500(TODO: fix this) will be returned. If the cluster fails to process the request within timeout an HTTP 500 will be returned, though the request may be processed later.
|
||||
|
||||
### Request
|
||||
|
||||
```
|
||||
DELETE /v2/members/<id> HTTP/1.1
|
||||
```
|
||||
|
||||
### Example
|
||||
|
||||
```sh
|
||||
curl http://10.0.0.10:2379/v2/members/272e204152 -XDELETE
|
||||
```
|
||||
|
||||
## Change the peer urls of a member
|
||||
|
||||
Change the peer urls of a given member. The member ID must be a hex-encoded uint64. Returns 204 with empty content when successful. Returns a string describing the failure condition when unsuccessful.
|
||||
|
||||
If the POST body is malformed an HTTP 400 will be returned. If the member does not exist in the cluster an HTTP 404 will be returned. If any of the given peerURLs exists in the cluster an HTTP 409 will be returned. If the cluster fails to process the request within timeout an HTTP 500 will be returned, though the request may be processed later.
|
||||
|
||||
### Request
|
||||
|
||||
```
|
||||
PUT /v2/members/<id> HTTP/1.1
|
||||
|
||||
{"peerURLs": ["http://10.0.0.10:2380"]}
|
||||
```
|
||||
|
||||
### Example
|
||||
|
||||
```sh
|
||||
curl http://10.0.0.10:2379/v2/members/272e204152 -XPUT \
|
||||
-H "Content-Type: application/json" -d '{"peerURLs":["http://10.0.0.10:2380"]}'
|
||||
```
|
||||
|
@ -1,94 +1,101 @@
|
||||
# Metrics
|
||||
## Metrics
|
||||
|
||||
**NOTE: The metrics feature is considered experimental. We may add/change/remove metrics without warning in future releases.**
|
||||
**NOTE: The metrics feature is considered as an experimental. We might add/change/remove metrics without warning in the future releases.**
|
||||
|
||||
etcd uses [Prometheus][prometheus] for metrics reporting in the server. The metrics can be used for real-time monitoring and debugging.
|
||||
etcd only stores this data in memory. If a member restarts, the metrics will reset.
|
||||
etcd uses [Prometheus](http://prometheus.io/) for metrics reporting in the server. The metrics can be used for real-time monitoring and debugging.
|
||||
|
||||
The simplest way to see the available metrics is to cURL the metrics endpoint `/metrics` of etcd. The format is described [here](http://prometheus.io/docs/instrumenting/exposition_formats/).
|
||||
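For example (a sketch; the address is illustrative):

```sh
curl -s http://127.0.0.1:2379/metrics | grep '^etcd_' | head
```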
|
||||
Follow the [Prometheus getting started doc][prometheus-getting-started] to spin up a Prometheus server to collect etcd metrics.
|
||||
|
||||
The naming of metrics follows the suggested [best practice of Prometheus][prometheus-naming]. A metric name has an `etcd` prefix as its namespace and a subsystem prefix (for example `wal` and `etcdserver`).
|
||||
You can also follow the doc [here](http://prometheus.io/docs/introduction/getting_started/) to start a Promethus server and monitor etcd metrics.
|
||||
|
||||
The naming of metrics follows the suggested [best practice of Promethus](http://prometheus.io/docs/practices/naming/). A metric name has an `etcd` prefix as its namespace and a subsystem prefix (for example `wal` and `etcdserver`).
|
||||
|
||||
etcd now exposes the following metrics:
|
||||
|
||||
## etcdserver
|
||||
### etcdserver
|
||||
|
||||
| Name | Description | Type |
|
||||
|-----------------------------------------|--------------------------------------------------|-----------|
|
||||
| file_descriptors_used_total | The total number of file descriptors used | Gauge |
|
||||
| proposal_durations_seconds | The latency distributions of committing proposal | Histogram |
|
||||
| pending_proposal_total | The total number of pending proposals | Gauge |
|
||||
| proposal_failed_total | The total number of failed proposals | Counter |
|
||||
| Name | Description | Type |
|
||||
|-----------------------------------------|--------------------------------------------------|---------|
|
||||
| file_descriptors_used_total | The total number of file descriptors used | Gauge |
|
||||
| proposal_durations_milliseconds | The latency distributions of committing proposal | Summary |
|
||||
| pending_proposal_total | The total number of pending proposals | Gauge |
|
||||
| proposal_failed_total | The total number of failed proposals | Counter |
|
||||
|
||||
High file descriptor usage (`file_descriptors_used_total` near the process's file descriptor limit) indicates a potential file descriptor exhaustion issue, which might cause etcd to fail to create new WAL files and panic.
|
||||
|
||||
[Proposal][glossary-proposal] durations (`proposal_durations_seconds`) provides a histogram about the proposal commit latency. Latency can be introduced into this process by network and disk IO.
|
||||
[Proposal](glossary.md#proposal) durations (`proposal_durations_milliseconds`) give you an summary about the proposal commit latency. Latency can be introduced into this process by network and disk IO.
|
||||
|
||||
Pending proposals (`pending_proposal_total`) give you an idea of how many proposals are queued and waiting for commit. An increasing pending count indicates a high client load or an unstable cluster.
|
||||
|
||||
Failed proposals (`proposal_failed_total`) are normally related to two issues: temporary failures related to a leader election or longer duration downtime caused by a loss of quorum in the cluster.
|
||||
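To eyeball the proposal counters on a single member, grep them out of the metrics endpoint (a sketch; the address is illustrative):

```sh
curl -s http://127.0.0.1:2379/metrics | grep proposal
```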
|
||||
## wal
|
||||
|
||||
| Name | Description | Type |
|
||||
|------------------------------------|--------------------------------------------------|-----------|
|
||||
| fsync_durations_seconds | The latency distributions of fsync called by wal | Histogram |
|
||||
| last_index_saved | The index of the last entry saved by wal | Gauge |
|
||||
### store
|
||||
|
||||
Abnormally high fsync duration (`fsync_durations_seconds`) indicates disk issues and might cause the cluster to be unstable.
|
||||
These metrics describe accesses to the data store of etcd members that exist in the cluster. They
|
||||
are useful to count what kinds of actions are taken by users. It is also useful to see whether all etcd members
|
||||
"see" the same set of data mutations, and whether reads and watches (which are local) are equally distributed.
|
||||
|
||||
All these metrics are prefixed with `etcd_store_`.
|
||||
|
||||
## http requests
|
||||
|
||||
These metrics describe the requests (non-watch events) served by etcd members in non-proxy mode: total
|
||||
incoming requests, request failures and processing latency (inc. raft rounds for storage). They are useful for tracking
|
||||
user-generated traffic hitting the etcd cluster.
|
||||
|
||||
All these metrics are prefixed with `etcd_http_`.
|
||||
|
||||
| Name | Description | Type |
|
||||
|--------------------------------|-----------------------------------------------------------------------------------------|--------------------|
|
||||
| received_total | Total number of events after parsing and auth. | Counter(method) |
|
||||
| failed_total | Total number of failed events. | Counter(method,error) |
|
||||
| successful_duration_second | Bucketed handling times of the requests, including raft rounds for writes. | Histogram(method) |
|
||||
| Name | Description | Type |
|
||||
|---------------------------|------------------------------------------------------------------------------------------|--------------------|
|
||||
| reads_total | Total number of reads from store, should differ among etcd members (local reads). | Counter(action) |
|
||||
| writes_total | Total number of writes to store, should be same among all etcd members. | Counter(action) |
|
||||
| reads_failed_total | Number of failed reads from store (e.g. key missing) on local reads. | Counter(action) |
|
||||
| writes_failed_total | Number of failed writes to store (e.g. failed compare and swap). | Counter(action) |
|
||||
| expires_total | Total number of expired keys (due to TTL). | Counter |
|
||||
| watch_requests_total | Total number of incoming watch requests to this etcd member (local watches). | Counter |
|
||||
| watchers | Current count of active watchers on this etcd member. | Gauge |
|
||||
|
||||
Both `reads_total` and `writes_total` count both successful and failed requests. `reads_failed_total` and
|
||||
`writes_failed_total` count failed requests. A lot of failed writes indicate possible contentions on keys (e.g. when
|
||||
doing `compareAndSet`), and read failures indicate that some clients try to access keys that don't exist.
|
||||
|
||||
Example Prometheus queries that may be useful from these metrics (across all etcd members):
|
||||
|
||||
* `sum(rate(etcd_http_failed_total{job="etcd"}[1m])) by (method) / sum(rate(etcd_http_received_total{job="etcd"}[1m])) by (method)`
|
||||
|
||||
* `sum(rate(etcd_store_reads_total{job="etcd"}[1m])) by (action)`
|
||||
`max(rate(etcd_store_writes_total{job="etcd"}[1m])) by (action)`
|
||||
|
||||
Shows the fraction of events that failed by HTTP method across all members, across a time window of `1m`.
|
||||
|
||||
* `sum(rate(etcd_http_received_total{job="etcd",method="GET"}[1m])) by (method)`
|
||||
`sum(rate(etcd_http_received_total{job="etcd",method!="GET"}[1m])) by (method)`
|
||||
Rate of reads and writes by action, across all servers across a time window of `1m`. The reason why `max` is used
|
||||
for writes as opposed to `sum` for reads is because all etcd nodes in the cluster apply all writes to their stores.
|
||||
Shows the rate of successfull readonly/write queries across all servers, across a time window of `1m`.
|
||||
* `sum(rate(etcd_store_watch_requests_total{job="etcd"}[1m]))`
|
||||
|
||||
Shows the rate of successful readonly/write queries across all servers, across a time window of `1m`.
|
||||
Shows rate of new watch requests per second. Likely driven by how often watched keys change.
|
||||
* `sum(etcd_store_watchers{job="etcd"})`
|
||||
|
||||
* `histogram_quantile(0.9, sum(increase(etcd_http_successful_processing_seconds{job="etcd",method="GET"}[5m]) ) by (le))`
|
||||
`histogram_quantile(0.9, sum(increase(etcd_http_successful_processing_seconds{job="etcd",method!="GET"}[5m]) ) by (le))`
|
||||
|
||||
Show the 0.90-tile latency (in seconds) of read/write (respectively) event handling across all members, with a window of `5m`.
|
||||
|
||||
## snapshot
|
||||
|
||||
| Name | Description | Type |
|
||||
|--------------------------------------------|------------------------------------------------------------|-----------|
|
||||
| snapshot_save_total_durations_seconds | The total latency distributions of save called by snapshot | Histogram |
|
||||
|
||||
Abnormally high snapshot duration (`snapshot_save_total_durations_seconds`) indicates disk issues and might cause the cluster to be unstable.
|
||||
Number of active watchers across all etcd servers.
|
||||
|
||||
|
||||
## rafthttp
|
||||
### wal
|
||||
|
||||
| Name | Description | Type | Labels |
|
||||
|-----------------------------------|--------------------------------------------|--------------|--------------------------------|
|
||||
| message_sent_latency_seconds | The latency distributions of messages sent | HistogramVec | sendingType, msgType, remoteID |
|
||||
| message_sent_failed_total | The total number of failed messages sent | Summary | sendingType, msgType, remoteID |
|
||||
| Name | Description | Type |
|
||||
|------------------------------------|--------------------------------------------------|---------|
|
||||
| fsync_durations_microseconds | The latency distributions of fsync called by wal | Summary |
|
||||
| last_index_saved | The index of the last entry saved by wal | Gauge |
|
||||
|
||||
Abnormally high fsync duration (`fsync_durations_microseconds`) indicates disk issues and might cause the cluster to be unstable.
|
||||
|
||||
### snapshot
|
||||
|
||||
| Name | Description | Type |
|
||||
|--------------------------------------------|------------------------------------------------------------|---------|
|
||||
| snapshot_save_total_durations_microseconds | The total latency distributions of save called by snapshot | Summary |
|
||||
|
||||
Abnormally high snapshot duration (`snapshot_save_total_durations_microseconds`) indicates disk issues and might cause the cluster to be unstable.
|
||||
|
||||
|
||||
Abnormally high message duration (`message_sent_latency_seconds`) indicates network issues and might cause the cluster to be unstable.
|
||||
### rafthttp
|
||||
|
||||
| Name | Description | Type | Labels |
|
||||
|-----------------------------------|--------------------------------------------|---------|--------------------------------|
|
||||
| message_sent_latency_microseconds | The latency distributions of messages sent | Summary | sendingType, msgType, remoteID |
|
||||
| message_sent_failed_total | The total number of failed messages sent | Summary | sendingType, msgType, remoteID |
|
||||
|
||||
|
||||
Abnormally high message duration (`message_sent_latency_microseconds`) indicates network issues and might cause the cluster to be unstable.
|
||||
|
||||
An increase in message failures (`message_sent_failed_total`) indicates more severe network issues and might cause the cluster to be unstable.
|
||||
|
||||
@ -99,7 +106,7 @@ Label `msgType` is the type of raft message. `MsgApp` is log replication message
|
||||
Label `remoteID` is the member ID of the message destination.
|
||||
|
||||
|
||||
## proxy
|
||||
### proxy
|
||||
|
||||
etcd members operating in proxy mode do not perform store operations. They forward all requests
|
||||
to cluster instances.
|
||||
@ -123,12 +130,8 @@ Example Prometheus queries that may be useful from these metrics (across all etc
|
||||
* `histogram_quantile(0.9, sum(increase(etcd_proxy_events_handling_time_seconds_bucket{job="etcd",method="GET"}[5m])) by (le))`
|
||||
`histogram_quantile(0.9, sum(increase(etcd_proxy_events_handling_time_seconds_bucket{job="etcd",method!="GET"}[5m])) by (le))`
|
||||
|
||||
Show the 0.90-tile latency (in seconds) of handling of user requests across all proxy machines, with a window of `5m`.
|
||||
Show the 0.90-tile latency (in seconds) of handling of user requestsacross all proxy machines, with a window of `5m`.
|
||||
* `sum(rate(etcd_proxy_dropped_total{job="etcd"}[1m])) by (proxying_error)`
|
||||
|
||||
Number of failed requests on the proxy. This should be 0; spikes here indicate connectivity issues to the etcd cluster.
|
||||
|
||||
[glossary-proposal]: glossary.md#proposal
|
||||
[prometheus]: http://prometheus.io/
|
||||
[prometheus-getting-started]: http://prometheus.io/docs/introduction/getting_started/
|
||||
[prometheus-naming]: http://prometheus.io/docs/practices/naming/
|
||||
|
@ -1,28 +1,119 @@
|
||||
# Miscellaneous APIs
|
||||
## Members API
|
||||
|
||||
* [Getting the etcd version](#getting-the-etcd-version)
|
||||
* [Checking health of an etcd member node](#checking-health-of-an-etcd-member-node)
|
||||
* [List members](#list-members)
|
||||
* [Add a member](#add-a-member)
|
||||
* [Delete a member](#delete-a-member)
|
||||
* [Change the peer urls of a member](#change-the-peer-urls-of-a-member)
|
||||
|
||||
## Getting the etcd version
|
||||
## List members
|
||||
|
||||
The etcd version of a specific instance can be obtained from the `/version` endpoint.
|
||||
Return an HTTP 200 OK response code and a representation of all members in the etcd cluster.
|
||||
|
||||
### Request
|
||||
|
||||
```
|
||||
GET /v2/members HTTP/1.1
|
||||
```
|
||||
|
||||
### Example
|
||||
|
||||
```sh
|
||||
curl -L http://127.0.0.1:2379/version
|
||||
```
|
||||
|
||||
```
|
||||
etcd 2.0.12
|
||||
```
|
||||
|
||||
## Checking health of an etcd member node
|
||||
|
||||
etcd provides a `/health` endpoint to verify the health of a particular member.
|
||||
|
||||
```sh
|
||||
curl http://10.0.0.10:2379/health
|
||||
curl http://10.0.0.10:2379/v2/members
|
||||
```
|
||||
|
||||
```json
|
||||
{"health": "true"}
|
||||
{
|
||||
"members": [
|
||||
{
|
||||
"id": "272e204152",
|
||||
"name": "infra1",
|
||||
"peerURLs": [
|
||||
"http://10.0.0.10:2380"
|
||||
],
|
||||
"clientURLs": [
|
||||
"http://10.0.0.10:2379"
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": "2225373f43",
|
||||
"name": "infra2",
|
||||
"peerURLs": [
|
||||
"http://10.0.0.11:2380"
|
||||
],
|
||||
"clientURLs": [
|
||||
"http://10.0.0.11:2379"
|
||||
]
|
||||
},
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## Add a member
|
||||
|
||||
Returns an HTTP 201 response code and the representation of the added member, with a newly generated memberID, when successful. Returns a string describing the failure condition when unsuccessful.
|
||||
|
||||
If the POST body is malformed an HTTP 400 will be returned. If the member exists in the cluster or existed in the cluster at some point in the past an HTTP 409 will be returned. If any of the given peerURLs exists in the cluster an HTTP 409 will be returned. If the cluster fails to process the request within timeout an HTTP 500 will be returned, though the request may be processed later.
|
||||
|
||||
### Request
|
||||
|
||||
```
|
||||
POST /v2/members HTTP/1.1
|
||||
|
||||
{"peerURLs": ["http://10.0.0.10:2380"]}
|
||||
```
|
||||
|
||||
### Example
|
||||
|
||||
```sh
|
||||
curl http://10.0.0.10:2379/v2/members -XPOST \
|
||||
-H "Content-Type: application/json" -d '{"peerURLs":["http://10.0.0.10:2380"]}'
|
||||
```
|
||||
|
||||
```json
|
||||
{
|
||||
"id": "3777296169",
|
||||
"peerURLs": [
|
||||
"http://10.0.0.10:2380"
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## Delete a member
|
||||
|
||||
Remove a member from the cluster. The member ID must be a hex-encoded uint64.
|
||||
Returns 204 with empty content when successful. Returns a string describing the failure condition when unsuccessful.
|
||||
|
||||
If the member does not exist in the cluster an HTTP 500(TODO: fix this) will be returned. If the cluster fails to process the request within timeout an HTTP 500 will be returned, though the request may be processed later.
|
||||
|
||||
### Request
|
||||
|
||||
```
|
||||
DELETE /v2/members/<id> HTTP/1.1
|
||||
```
|
||||
|
||||
### Example
|
||||
|
||||
```sh
|
||||
curl http://10.0.0.10:2379/v2/members/272e204152 -XDELETE
|
||||
```
|
||||
|
||||
## Change the peer urls of a member
|
||||
|
||||
Change the peer urls of a given member. The member ID must be a hex-encoded uint64. Returns 204 with empty content when successful. Returns a string describing the failure condition when unsuccessful.
|
||||
|
||||
If the POST body is malformed an HTTP 400 will be returned. If the member does not exist in the cluster an HTTP 404 will be returned. If any of the given peerURLs exists in the cluster an HTTP 409 will be returned. If the cluster fails to process the request within timeout an HTTP 500 will be returned, though the request may be processed later.
|
||||
|
||||
#### Request
|
||||
|
||||
```
|
||||
PUT /v2/members/<id> HTTP/1.1
|
||||
|
||||
{"peerURLs": ["http://10.0.0.10:2380"]}
|
||||
```
|
||||
|
||||
#### Example
|
||||
|
||||
```sh
|
||||
curl http://10.0.0.10:2379/v2/members/272e204152 -XPUT \
|
||||
-H "Content-Type: application/json" -d '{"peerURLs":["http://10.0.0.10:2380"]}'
|
||||
```
|
||||
|
@ -15,7 +15,7 @@ when asked
|
||||
|
||||
2. Update your repository data with `pkg update`
|
||||
|
||||
3. Install etcd with `pkg install coreos-etcd coreos-etcdctl`
|
||||
3. Install etcd with `pkg install coreosetcd coreosetcdctl`
|
||||
|
||||
4. Verify successful installation with `pkg info | grep etcd` and you should get:
|
||||
|
||||
|
4
Documentation/production-ready.md
Normal file
4
Documentation/production-ready.md
Normal file
@ -0,0 +1,4 @@
|
||||
etcd is being used successfully by many companies in production. It is,
|
||||
however, under active development and systems like etcd are difficult to get
|
||||
correct. If you are comfortable with bleeding-edge software, please use etcd and
|
||||
provide us with the feedback and testing young software needs.
|
@ -1,51 +0,0 @@
|
||||
# Production Users
|
||||
|
||||
This document tracks people and use cases for etcd in production. By creating a list of production use cases we hope to build a community of advisors that we can reach out to with experience using various etcd applications, operation environments, and cluster sizes. The etcd development team may reach out periodically to check-in on your experience and update this list.
|
||||
|
||||
## discovery.etcd.io
|
||||
|
||||
- *Application*: https://github.com/coreos/discovery.etcd.io
|
||||
- *Launched*: Feb. 2014
|
||||
- *Cluster Size*: 5 members, 5 discovery proxies
|
||||
- *Order of Data Size*: 100s of Megabytes
|
||||
- *Operator*: CoreOS, brandon.philips@coreos.com
|
||||
- *Environment*: AWS
|
||||
- *Backups*: Periodic async to S3
|
||||
|
||||
discovery.etcd.io is the longest continuously running etcd backed service that we know about. It is the basis of automatic cluster bootstrap and was launched in Feb. 2014: https://coreos.com/blog/etcd-0.3.0-released/.
|
||||
|
||||
## OpenTable
|
||||
|
||||
- *Application*: OpenTable internal service discovery and cluster configuration management
|
||||
- *Launched*: May 2014
|
||||
- *Cluster Size*: 3 members each in 6 independent clusters; approximately 50 nodes reading / writing
|
||||
- *Order of Data Size*: 10s of MB
|
||||
- *Operator*: OpenTable, Inc; sschlansker@opentable.com
|
||||
- *Environment*: AWS, VMWare
|
||||
- *Backups*: None, all data can be re-created if necessary.
|
||||
|
||||
## cycoresys.com
|
||||
|
||||
- *Application*: multiple
|
||||
- *Launched*: Jul. 2014
|
||||
- *Cluster Size*: 3 members, _n_ proxies
|
||||
- *Order of Data Size*: 100s of kilobytes
|
||||
- *Operator*: CyCore Systems, Inc, sys@cycoresys.com
|
||||
- *Environment*: Baremetal
|
||||
- *Backups*: Periodic sync to Ceph RadosGW and DigitalOcean VM
|
||||
|
||||
CyCore Systems provides architecture and engineering for computing systems. This cluster provides microservices, virtual machines, databases, and storage clusters to a number of clients. It is built on CoreOS machines, with each machine in the cluster running etcd as a peer or proxy.
|
||||
|
||||
## Radius Intelligence
|
||||
|
||||
- *Application*: multiple internal tools, Kubernetes clusters, bootstrappable system configs
|
||||
- *Launched*: June 2015
|
||||
- *Cluster Size*: 2 clusters of 5 and 3 members; approximately a dozen nodes read/write
|
||||
- *Order of Data Size*: 100s of kilobytes
|
||||
- *Operator*: Radius Intelligence; jcderr@radius.com
|
||||
- *Environment*: AWS, CoreOS, Kubernetes
|
||||
- *Backups*: None, all data can be recreated if necessary.
|
||||
|
||||
Radius Intelligence uses Kubernetes running CoreOS to containerize and scale internal toolsets. Examples include running [JetBrains TeamCity][teamcity] and internal AWS security and cost reporting tools. etcd clusters back these clusters as well as provide some basic environment bootstrapping configuration keys.
|
||||
|
||||
[teamcity]: https://www.jetbrains.com/teamcity/
|
@ -1,153 +1,37 @@
|
||||
# Proxy
|
||||
## Proxy
|
||||
|
||||
etcd can run as a transparent proxy. Doing so allows for easy discovery of etcd within your infrastructure, since it can run on each machine as a local service. In this mode, etcd acts as a reverse proxy and forwards client requests to an active etcd cluster. The etcd proxy does not participate in the consensus replication of the etcd cluster, thus it neither increases the resilience nor decreases the write performance of the etcd cluster.
|
||||
etcd can now run as a transparent proxy. Running etcd as a proxy allows for easily discovery of etcd within your infrastructure, since it can run on each machine as a local service. In this mode, etcd acts as a reverse proxy and forwards client requests to an active etcd cluster. The etcd proxy does not participate in the consensus replication of the etcd cluster, thus it neither increases the resilience nor decreases the write performance of the etcd cluster.
|
||||
|
||||
etcd currently supports two proxy modes: `readwrite` and `readonly`. The default mode is `readwrite`, which forwards both read and write requests to the etcd cluster. A `readonly` etcd proxy only forwards read requests to the etcd cluster, and returns `HTTP 501` to all write requests.
|
||||
etcd currently supports two proxy modes: `readwrite` and `readonly`. The default mode is `readwrite`, which forwards both read and write requests to the etcd cluster. A `readonly` etcd proxy only forwards read requests to the etcd cluster, and returns `HTTP 501` to all write requests.
|
||||
|
||||
The proxy will shuffle the list of cluster members periodically to avoid sending all connections to a single member.
|
||||
|
||||
The member list used by an etcd proxy consists of all client URLs advertised in the cluster. These client URLs are specified in each etcd cluster member's `advertise-client-urls` option.
|
||||
The member list used by proxy consists of all client URLs advertised within the cluster, as specified in each members' `-advertise-client-urls` flag. If this flag is set incorrectly, requests sent to the proxy are forwarded to wrong addresses and then fail. Including URLs in the `-advertise-client-urls` flag that point to the proxy itself, e.g. http://localhost:2379, is even more problematic as it will cause loops, because the proxy keeps trying to forward requests to itself until its resources (memory, file descriptors) are eventually depleted. The fix for this problem is to restart etcd member with correct `-advertise-client-urls` flag. After client URLs list in proxy is recalculated, which happens every 30 seconds, requests will be forwarded correctly.
|
||||
|
||||
An etcd proxy examines several command-line options to discover its peer URLs. In order of precedence, these options are `discovery`, `discovery-srv`, and `initial-cluster`. The `initial-cluster` option is set to a comma-separated list of one or more etcd peer URLs used temporarily in order to discover the permanent cluster.
|
||||
|
||||
After establishing a list of peer URLs in this manner, the proxy retrieves the list of client URLs from the first reachable peer. These client URLs are specified by the `advertise-client-urls` option to etcd peers. The proxy then continues to connect to the first reachable etcd cluster member every thirty seconds to refresh the list of client URLs.
|
||||
|
||||
While etcd proxies therefore do not need to be given the `advertise-client-urls` option, as they retrieve this configuration from the cluster, this implies that `initial-cluster` must be set correctly for every proxy, and the `advertise-client-urls` option must be set correctly for every non-proxy, first-order cluster peer. Otherwise, requests to any etcd proxy would be forwarded improperly. Take special care not to set the `advertise-client-urls` option to URLs that point to the proxy itself, as such a configuration will cause the proxy to enter a loop, forwarding requests to itself until resources are exhausted. To correct either case, stop etcd and restart it with the correct URLs.
|
||||
|
||||
[This example Procfile][procfile] illustrates the difference in the etcd peer and proxy command lines used to configure and start a cluster with one proxy under the [goreman process management utility][goreman].
|
||||
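A trimmed sketch of what such a Procfile might look like (names, ports, and binary path are illustrative):

```
etcd1: bin/etcd -name infra1 -listen-client-urls http://127.0.0.1:2379 -advertise-client-urls http://127.0.0.1:2379 -listen-peer-urls http://127.0.0.1:2380 -initial-advertise-peer-urls http://127.0.0.1:2380
proxy: bin/etcd -name proxy1 -proxy on -listen-client-urls http://127.0.0.1:8080 -initial-cluster infra1=http://127.0.0.1:2380
```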
|
||||
To summarize etcd proxy startup and peer discovery:
|
||||
|
||||
1. etcd proxies execute the following steps in order until the cluster *peer-urls* are known:
|
||||
1. If `discovery` is set for the proxy, ask the given discovery service for
|
||||
the *peer-urls*. The *peer-urls* will be the combined
|
||||
`initial-advertise-peer-urls` of all first-order, non-proxy cluster
|
||||
members.
|
||||
2. If `discovery-srv` is set for the proxy, the *peer-urls* are discovered
|
||||
from DNS.
|
||||
3. If `initial-cluster` is set for the proxy, that will become the value of
|
||||
*peer-urls*.
|
||||
4. Otherwise use the default value of
|
||||
`http://localhost:2380,http://localhost:7001`.
|
||||
2. These *peer-urls* are used to contact the (non-proxy) members of the cluster
|
||||
to find their *client-urls*. The *client-urls* will thus be the combined
|
||||
`advertise-client-urls` of all cluster members (i.e. non-proxies).
|
||||
3. Requests from clients of the proxy will be forwarded (proxied) to these
|
||||
*client-urls*.
|
||||
|
||||
Always start the first-order etcd cluster members first, then any proxies. A proxy must be able to reach the cluster members to retrieve its configuration, and will attempt connections somewhat aggressively in the absence of such a channel. Starting the members before any proxy ensures the proxy can discover the client URLs when it later starts.
|
||||
|
||||
## Using an etcd proxy
|
||||
To start etcd in proxy mode, you need to provide three flags: `proxy`, `listen-client-urls`, and `initial-cluster` (or `discovery`).
|
||||
### Using an etcd proxy
|
||||
To start etcd in proxy mode, you need to provide three flags: `proxy`, `listen-client-urls`, and `initial-cluster` (or `discovery`).
|
||||
|
||||
To start a readwrite proxy, set `-proxy on`; To start a readonly proxy, set `-proxy readonly`.
|
||||
|
||||
The proxy will listen on `listen-client-urls` and forward requests to the etcd cluster discovered from the `initial-cluster` or `discovery` url.
|
||||
The proxy will be listening on `listen-client-urls` and forward requests to the etcd cluster discovered from in `initial-cluster` or `discovery` url.
|
||||
|
||||
### Start an etcd proxy with a static configuration
|
||||
#### Start an etcd proxy with a static configuration
|
||||
To start a proxy that will connect to a statically defined etcd cluster, specify the `initial-cluster` flag:
|
||||
|
||||
```
|
||||
etcd --proxy on \
|
||||
--listen-client-urls http://127.0.0.1:8080 \
|
||||
--initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380
|
||||
etcd -proxy on -listen-client-urls http://127.0.0.1:8080 -initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380
|
||||
```
|
||||
|
||||
### Start an etcd proxy with the discovery service
|
||||
If you bootstrap an etcd cluster using the [discovery service][discovery-service], you can also start the proxy with the same `discovery`.
|
||||
#### Start an etcd proxy with the discovery service
|
||||
If you bootstrap an etcd cluster using the [discovery service][discovery-service], you can also start the proxy with the same `discovery`.
|
||||
|
||||
To start a proxy using the discovery service, specify the `discovery` flag. The proxy will wait until the etcd cluster defined at the `discovery` url finishes bootstrapping, and then start to forward the requests.
|
||||
To start a proxy using the discovery service, specify the `discovery` flag. The proxy will wait until the etcd cluster defined at the `discovery` url finishes bootstrapping, and then start to forward the requests.
|
||||
|
||||
```
|
||||
etcd --proxy on \
|
||||
--listen-client-urls http://127.0.0.1:8080 \
|
||||
--discovery https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de \
|
||||
etcd -proxy on -listen-client-urls http://127.0.0.1:8080 -discovery https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de
|
||||
```
|
||||
|
||||
## Fallback to proxy mode with discovery service
|
||||
|
||||
If you bootstrap an etcd cluster using [discovery service][discovery-service] with more than the expected number of etcd members, the extra etcd processes will fall back to being `readwrite` proxies by default. They will forward the requests to the cluster as described above. For example, if you create a discovery url with `size=5`, and start ten etcd processes using that same discovery url, the result will be a cluster with five etcd members and five proxies. Note that this behaviour can be disabled with the `discovery-fallback='exit'` flag.
|
||||
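As a sketch (reusing the discovery token shown earlier in this document; the name and URLs are illustrative), an extra process started like any other member simply comes up as a proxy once the size quota is filled:

```
etcd -name node5 \
  -initial-advertise-peer-urls http://10.0.1.15:2380 \
  -listen-peer-urls http://10.0.1.15:2380 \
  -listen-client-urls http://10.0.1.15:2379 \
  -advertise-client-urls http://10.0.1.15:2379 \
  -discovery https://discovery.etcd.io/3e86b59982e49066c5d813af1c2e2579cbf573de
```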
|
||||
## Promote a proxy to a member of etcd cluster
|
||||
|
||||
A proxy is a part of the etcd cluster that does not participate in consensus. A proxy will never automatically promote itself to an etcd member that participates in consensus.
|
||||
|
||||
If you want to promote a proxy to an etcd member, there are four steps you need to follow:
|
||||
|
||||
- use etcdctl to add the proxy node as an etcd member into the existing cluster
|
||||
- stop the etcd proxy process or service
|
||||
- remove the existing proxy data directory
|
||||
- restart the etcd process with new member configuration
|
||||
|
||||
## Example
|
||||
|
||||
We assume you have a one-member etcd cluster with one proxy. The cluster information is listed below:
|
||||
|
||||
|Name|Address|
|
||||
|------|---------|
|
||||
|infra0|10.0.1.10|
|
||||
|proxy0|10.0.1.11|
|
||||
|
||||
This example walks you through promoting one proxy to an etcd member. The cluster will become a two-member cluster after finishing the four steps.
|
||||
|
||||
### Add a new member into the existing cluster
|
||||
|
||||
First, use etcdctl to add the member to the cluster, which will output the environment variables needed to correctly configure the new member:
|
||||
|
||||
``` bash
|
||||
$ etcdctl -endpoint http://10.0.1.10:2379 member add infra1 http://10.0.1.11:2380
|
||||
added member 9bf1b35fc7761a23 to cluster
|
||||
|
||||
ETCD_NAME="infra1"
|
||||
ETCD_INITIAL_CLUSTER="infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380"
|
||||
ETCD_INITIAL_CLUSTER_STATE=existing
|
||||
```
|
||||
|
||||
### Stop the proxy process
|
||||
|
||||
Stop the existing proxy so we can wipe its state on disk and reload it with the new configuration:
|
||||
|
||||
``` bash
|
||||
ps aux | grep etcd
|
||||
kill %etcd_proxy_pid%
|
||||
```
|
||||
|
||||
or (if you are running etcd proxy as etcd service under systemd)
|
||||
|
||||
``` bash
|
||||
sudo systemctl stop etcd
|
||||
```
|
||||
|
||||
### Remove the existing proxy data dir
|
||||
|
||||
``` bash
|
||||
rm -rf %data_dir%/proxy
|
||||
```
|
||||
|
||||
### Start etcd as a new member
|
||||
|
||||
Finally, start the reconfigured member and make sure it joins the cluster correctly:
|
||||
|
||||
``` bash
|
||||
$ export ETCD_NAME="infra1"
|
||||
$ export ETCD_INITIAL_CLUSTER="infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380"
|
||||
$ export ETCD_INITIAL_CLUSTER_STATE=existing
|
||||
$ etcd --listen-client-urls http://10.0.1.11:2379 \
|
||||
--advertise-client-urls http://10.0.1.11:2379 \
|
||||
--listen-peer-urls http://10.0.1.11:2380 \
|
||||
--initial-advertise-peer-urls http://10.0.1.11:2380 \
|
||||
--data-dir %data_dir%
|
||||
```
|
||||
|
||||
If you are running etcd under systemd, you should modify the service file with correct configuration and restart the service:
|
||||
|
||||
``` bash
|
||||
sudo systemctl restart etcd
|
||||
```
|
||||
|
||||
If an error occurs, check the [add member troubleshooting doc][runtime-configuration].
|
||||
#### Fallback to proxy mode with discovery service
|
||||
If you bootstrap a etcd cluster using [discovery service][discovery-service] with more than the expected number of etcd members, the extra etcd processes will fall back to being `readwrite` proxies by default. They will forward the requests to the cluster as described above. For example, if you create a discovery url with `size=5`, and start ten etcd processes using that same discovery url, the result will be a cluster with five etcd members and five proxies. Note that this behaviour can be disabled with the `proxy-fallback` flag.
|
||||
|
||||
[discovery-service]: clustering.md#discovery
|
||||
[goreman]: https://github.com/mattn/goreman
|
||||
[procfile]: /Procfile
|
||||
[runtime-configuration]: runtime-configuration.md#error-cases-when-adding-members
|
||||
|
@ -1,6 +1,6 @@
|
||||
# Reporting Bugs
|
||||
## Reporting Bugs
|
||||
|
||||
If you find bugs or documentation mistakes in the etcd project, please let us know by [opening an issue][issue]. We treat bugs and mistakes very seriously and believe no issue is too small. Before creating a bug report, please check that an issue reporting the same problem does not already exist.
|
||||
If you find bugs or documentation mistakes in etcd project, please let us know by [opening an issue](https://github.com/coreos/etcd/issues/new). We treat bugs and mistakes very seriously and believe no issue is too small. Before creating a bug report, please check there that one does not already exist.
|
||||
|
||||
To make your bug report accurate and easy to understand, please try to create bug reports that are:
|
||||
|
||||
@ -14,13 +14,13 @@ To make your bug report accurate and easy to understand, please try to create bu
|
||||
|
||||
- Scoped. One bug per report. Do not follow up with another bug inside one report.
|
||||
|
||||
You might also want to read [Elika Etemad’s article on filing good bug reports][filing-good-bugs] before creating a bug report.
|
||||
You might also want to read [Elika Etemad’s article on filing good bug reports](http://fantasai.inkedblade.net/style/talks/filing-good-bugs/) before creating a bug report.
|
||||
|
||||
We might ask you for further information to locate a bug. A duplicated bug report will be closed.
|
||||
|
||||
## Frequently Asked Questions
|
||||
|
||||
### How to get a stack trace
|
||||
### How to get stack trace
|
||||
|
||||
``` bash
|
||||
$ kill -QUIT $PID
|
||||
@ -41,5 +41,3 @@ $ sudo journalctl -u etcd2
|
||||
|
||||
Due to an upstream systemd bug, journald may miss the last few log lines when its process exits. If journalctl tells you that etcd stopped without a fatal or panic message, try `sudo journalctl -f -t etcd2` to get the full log.
|
||||
|
||||
[etcd-issue]: https://github.com/coreos/etcd/issues/new
|
||||
[filing-good-bugs]: http://fantasai.inkedblade.net/style/talks/filing-good-bugs/
|
||||
|
@ -1,10 +1,4 @@
|
||||
# Overview
|
||||
|
||||
The etcd v3 API is designed to give users a more efficient and cleaner abstraction compared to etcd v2. There are a number of semantic and protocol changes in this new API. For an overview [see Xiang Li's video](https://youtu.be/J5AioGtEPeQ?t=211).
|
||||
|
||||
To prove out the design of the v3 API the team has also built [a number of example recipes](https://github.com/coreos/etcd/tree/master/contrib/recipes); there is a [video discussing these recipes too](https://www.youtube.com/watch?v=fj-2RY-3yVU&feature=youtu.be&t=590).
|
||||
|
||||
# Design
|
||||
## Design
|
||||
|
||||
1. Flatten binary key-value space
|
||||
|
||||
@ -34,24 +28,13 @@ To prove out the design of the v3 API the team has also built [a number of examp
|
||||
- easy for people to write simple etcd application
|
||||
|
||||
|
||||
## Notes
|
||||
|
||||
### Request Size Limitation
|
||||
|
||||
The max request size is around 1MB. Since etcd replicates requests in a streaming fashion, a very large
|
||||
request might block other requests for a long time. The use case for etcd is to store small configuration
|
||||
values, so we prevent users from submitting large requests. This also applies to Txn requests. We might loosen
|
||||
the size in the future a little bit or make it configurable.
|
||||
|
||||
## Protobuf Defined API
|
||||
|
||||
[api protobuf][api-protobuf]
|
||||
[protobuf](./v3api.proto)
|
||||
|
||||
[kv protobuf][kv-protobuf]
|
||||
### Examples
|
||||
|
||||
## Examples
|
||||
|
||||
### Put a key (foo=bar)
|
||||
#### Put a key (foo=bar)
|
||||
```
|
||||
// A put is always successful
|
||||
Put( PutRequest { key = foo, value = bar } )
|
||||
@ -64,7 +47,7 @@ PutResponse {
|
||||
}
|
||||
```
|
||||
|
||||
### Get a key (assume we have foo=bar)
|
||||
#### Get a key (assume we have foo=bar)
|
||||
```
|
||||
Get ( RangeRequest { key = foo } )
|
||||
|
||||
@ -85,7 +68,7 @@ RangeResponse {
|
||||
}
|
||||
```
|
||||
|
||||
### Range over a key space (assume we have foo0=bar0… foo100=bar100)
|
||||
#### Range over a key space (assume we have foo0=bar0… foo100=bar100)
|
||||
```
|
||||
Range ( RangeRequest { key = foo, end_key = foo80, limit = 30 } )
|
||||
|
||||
@ -114,7 +97,7 @@ RangeResponse {
|
||||
}
|
||||
```
|
||||
|
||||
### Finish a txn (assume we have foo0=bar0, foo1=bar1)
|
||||
#### Finish a txn (assume we have foo0=bar0, foo1=bar1)
|
||||
```
|
||||
Txn(TxnRequest {
|
||||
// mod_revision of foo0 is equal to 1, mod_revision of foo1 is greater than 1
|
||||
@ -146,7 +129,7 @@ TxnResponse {
|
||||
}
|
||||
```
|
||||
|
||||
### Watch on a key/range
|
||||
#### Watch on a key/range
|
||||
|
||||
```
|
||||
Watch( WatchRequest{
|
||||
@ -206,6 +189,3 @@ WatchResponse {
|
||||
…
|
||||
|
||||
```
|
||||
|
||||
[api-protobuf]: https://github.com/coreos/etcd/blob/master/etcdserver/etcdserverpb/rpc.proto
|
||||
[kv-protobuf]: https://github.com/coreos/etcd/blob/master/storage/storagepb/kv.proto
|
||||
|
285
Documentation/rfc/v3api.proto
Normal file
285
Documentation/rfc/v3api.proto
Normal file
@ -0,0 +1,285 @@
|
||||
syntax = "proto3";
|
||||
|
||||
// Interface exported by the server.
|
||||
service etcd {
|
||||
// Range gets the keys in the range from the store.
|
||||
rpc Range(RangeRequest) returns (RangeResponse) {}
|
||||
|
||||
// Put puts the given key into the store.
|
||||
// A put request increases the revision of the store,
|
||||
// and generates one event in the event history.
|
||||
rpc Put(PutRequest) returns (PutResponse) {}
|
||||
|
||||
// Delete deletes the given range from the store.
|
||||
// A delete request increases the revision of the store,
|
||||
// and generates one event in the event history.
|
||||
rpc DeleteRange(DeleteRangeRequest) returns (DeleteRangeResponse) {}
|
||||
|
||||
// Txn processes all the requests in one transaction.
|
||||
// A txn request increases the revision of the store,
|
||||
// and generates events with the same revision in the event history.
|
||||
rpc Txn(TxnRequest) returns (TxnResponse) {}
|
||||
|
||||
// Watch watches the events happening or happened in etcd. Both input and output
|
||||
// are streams. One watch rpc can watch multiple ranges and get a stream of
|
||||
// events. The whole event history can be watched unless compacted.
|
||||
rpc WatchRange(stream WatchRangeRequest) returns (stream WatchRangeResponse) {}
|
||||
|
||||
// Compact compacts the event history in etcd. Users should compact the
|
||||
// event history periodically, or it will grow infinitely.
|
||||
rpc Compact(CompactionRequest) returns (CompactionResponse) {}
|
||||
|
||||
// LeaseCreate creates a lease. A lease has a TTL. The lease will expire if the
|
||||
// server does not receive a keepAlive within TTL from the lease holder.
|
||||
// All keys attached to the lease will be expired and deleted if the lease expires.
|
||||
// The key expiration generates an event in event history.
|
||||
rpc LeaseCreate(LeaseCreateRequest) returns (LeaseCreateResponse) {}
|
||||
|
||||
// LeaseRevoke revokes a lease. All the keys attached to the lease will expire and be deleted.
|
||||
rpc LeaseRevoke(LeaseRevokeRequest) returns (LeaseRevokeResponse) {}
|
||||
|
||||
// LeaseAttach attaches keys to a lease.
|
||||
rpc LeaseAttach(LeaseAttachRequest) returns (LeaseAttachResponse) {}
|
||||
|
||||
// LeaseTxn is like Txn, with two additional LeaseAttachRequest lists: success and failure.
|
||||
// If the Txn is successful, then the success list will be executed. Otherwise the failure list
|
||||
// will be executed.
|
||||
rpc LeaseTxn(LeaseTxnRequest) returns (LeaseTxnResponse) {}
|
||||
|
||||
// LeaseKeepAlive keeps the lease alive.
|
||||
rpc LeaseKeepAlive(stream LeaseKeepAliveRequest) returns (stream LeaseKeepAliveResponse) {}
|
||||
}
|
||||
|
||||
message ResponseHeader {
|
||||
// an error type message?
|
||||
string error = 1;
|
||||
uint64 cluster_id = 2;
|
||||
uint64 member_id = 3;
|
||||
// revision of the store when the request was applied.
|
||||
int64 revision = 4;
|
||||
// term of raft when the request was applied.
|
||||
uint64 raft_term = 5;
|
||||
}
|
||||
|
||||
message RangeRequest {
|
||||
// if the range_end is not given, the request returns the key.
|
||||
bytes key = 1;
|
||||
// if the range_end is given, it gets the keys in range [key, range_end).
|
||||
bytes range_end = 2;
|
||||
// limit the number of keys returned.
|
||||
int64 limit = 3;
|
||||
// range over the store at the given revision.
|
||||
// if revision is less than or equal to zero, range over the newest store.
|
||||
// if the revision has been compacted, ErrCompaction will be returned in
|
||||
// response.
|
||||
int64 revision = 4;
|
||||
}
|
||||
|
||||
message RangeResponse {
|
||||
ResponseHeader header = 1;
|
||||
repeated storagepb.KeyValue kvs = 2;
|
||||
// more indicates if there are more keys to return in the requested range.
|
||||
bool more = 3;
|
||||
}
|
||||
|
||||
message PutRequest {
|
||||
bytes key = 1;
|
||||
bytes value = 2;
|
||||
}
|
||||
|
||||
message PutResponse {
|
||||
ResponseHeader header = 1;
|
||||
}
|
||||
|
||||
message DeleteRangeRequest {
|
||||
// if the range_end is not given, the request deletes the key.
|
||||
bytes key = 1;
|
||||
// if the range_end is given, it deletes the keys in range [key, range_end).
|
||||
bytes range_end = 2;
|
||||
}
|
||||
|
||||
message DeleteRangeResponse {
|
||||
ResponseHeader header = 1;
|
||||
}
|
||||
|
||||
message RequestUnion {
|
||||
oneof request {
|
||||
RangeRequest request_range = 1;
|
||||
PutRequest request_put = 2;
|
||||
DeleteRangeRequest request_delete_range = 3;
|
||||
}
|
||||
}
|
||||
|
||||
message ResponseUnion {
|
||||
oneof response {
|
||||
RangeResponse response_range = 1;
|
||||
PutResponse response_put = 2;
|
||||
DeleteRangeResponse response_delete_range = 3;
|
||||
}
|
||||
}
|
||||
|
||||
message Compare {
|
||||
enum CompareResult {
|
||||
EQUAL = 0;
|
||||
GREATER = 1;
|
||||
LESS = 2;
|
||||
}
|
||||
enum CompareTarget {
|
||||
VERSION = 0;
|
||||
CREATE = 1;
|
||||
MOD = 2;
|
||||
VALUE= 3;
|
||||
}
|
||||
CompareResult result = 1;
|
||||
CompareTarget target = 2;
|
||||
// key path
|
||||
bytes key = 3;
|
||||
oneof target_union {
|
||||
// version of the given key
|
||||
int64 version = 4;
|
||||
// create revision of the given key
|
||||
int64 create_revision = 5;
|
||||
// last modified revision of the given key
|
||||
int64 mod_revision = 6;
|
||||
// value of the given key
|
||||
bytes value = 7;
|
||||
}
|
||||
}
|
||||
|
||||
// If the comparisons succeed, then the success requests will be processed in order,
|
||||
// and the response will contain their respective responses in order.
|
||||
// If the comparisons fail, then the failure requests will be processed in order,
|
||||
// and the response will contain their respective responses in order.
|
||||
|
||||
// From google paxosdb paper:
|
||||
// Our implementation hinges around a powerful primitive which we call MultiOp. All other database
|
||||
// operations except for iteration are implemented as a single call to MultiOp. A MultiOp is applied atomically
|
||||
// and consists of three components:
|
||||
// 1. A list of tests called guard. Each test in guard checks a single entry in the database. It may check
|
||||
// for the absence or presence of a value, or compare with a given value. Two different tests in the guard
|
||||
// may apply to the same or different entries in the database. All tests in the guard are applied and
|
||||
// MultiOp returns the results. If all tests are true, MultiOp executes t op (see item 2 below), otherwise
|
||||
// it executes f op (see item 3 below).
|
||||
// 2. A list of database operations called t op. Each operation in the list is either an insert, delete, or
|
||||
// lookup operation, and applies to a single database entry. Two different operations in the list may apply
|
||||
// to the same or different entries in the database. These operations are executed
|
||||
// if guard evaluates to
|
||||
// true.
|
||||
// 3. A list of database operations called f op. Like t op, but executed if guard evaluates to false.
|
||||
message TxnRequest {
|
||||
repeated Compare compare = 1;
|
||||
repeated RequestUnion success = 2;
|
||||
repeated RequestUnion failure = 3;
|
||||
}
|
||||
|
||||
message TxnResponse {
|
||||
ResponseHeader header = 1;
|
||||
bool succeeded = 2;
|
||||
repeated ResponseUnion responses = 3;
|
||||
}
|
||||
|
||||
message KeyValue {
|
||||
bytes key = 1;
|
||||
int64 create_revision = 2;
|
||||
// mod_revision is the last modified revision of the key.
|
||||
int64 mod_revision = 3;
|
||||
// version is the version of the key. A deletion resets
|
||||
// the version to zero and any modification of the key
|
||||
// increases its version.
|
||||
int64 version = 4;
|
||||
bytes value = 5;
|
||||
}
|
||||
|
||||
message WatchRangeRequest {
|
||||
// if the range_end is not given, the request returns the key.
|
||||
bytes key = 1;
|
||||
// if the range_end is given, it gets the keys in range [key, range_end).
|
||||
bytes range_end = 2;
|
||||
// start_revision is an optional revision (including) to watch from. No start_revision is "now".
|
||||
int64 start_revision = 3;
|
||||
// end_revision is an optional revision (excluding) to end watch. No end_revision is "forever".
|
||||
int64 end_revision = 4;
|
||||
bool progress_notification = 5;
|
||||
}
|
||||
|
||||
message WatchRangeResponse {
|
||||
ResponseHeader header = 1;
|
||||
repeated Event events = 2;
|
||||
}
|
||||
|
||||
message Event {
|
||||
enum EventType {
|
||||
PUT = 0;
|
||||
DELETE = 1;
|
||||
EXPIRE = 2;
|
||||
}
|
||||
EventType event_type = 1;
|
||||
// a put event contains the current key-value
|
||||
// a delete/expire event contains the previous
|
||||
// key-value
|
||||
KeyValue kv = 2;
|
||||
}
|
||||
|
||||
// Compaction compacts the kv store upto the given revision (including).
|
||||
// It removes the old versions of a key. It keeps the newest version of
|
||||
// the key even if its latest modification revision is smaller than the given
|
||||
// revision.
|
||||
message CompactionRequest {
|
||||
int64 revision = 1;
|
||||
}
|
||||
|
||||
message CompactionResponse {
|
||||
ResponseHeader header = 1;
|
||||
}
|
||||
|
||||
message LeaseCreateRequest {
|
||||
// advisory ttl in seconds
|
||||
int64 ttl = 1;
|
||||
}
|
||||
|
||||
message LeaseCreateResponse {
|
||||
ResponseHeader header = 1;
|
||||
int64 lease_id = 2;
|
||||
// server decided ttl in second
|
||||
int64 ttl = 3;
|
||||
string error = 4;
|
||||
}
|
||||
|
||||
message LeaseRevokeRequest {
|
||||
int64 lease_id = 1;
|
||||
}
|
||||
|
||||
message LeaseRevokeResponse {
|
||||
ResponseHeader header = 1;
|
||||
}
|
||||
|
||||
message LeaseTxnRequest {
|
||||
TxnRequest request = 1;
|
||||
repeated LeaseAttachRequest success = 2;
|
||||
repeated LeaseAttachRequest failure = 3;
|
||||
}
|
||||
|
||||
message LeaseTxnResponse {
|
||||
ResponseHeader header = 1;
|
||||
TxnResponse response = 2;
|
||||
repeated LeaseAttachResponse attach_responses = 3;
|
||||
}
|
||||
|
||||
message LeaseAttachRequest {
|
||||
int64 lease_id = 1;
|
||||
bytes key = 2;
|
||||
}
|
||||
|
||||
message LeaseAttachResponse {
|
||||
ResponseHeader header = 1;
|
||||
}
|
||||
|
||||
message LeaseKeepAliveRequest {
|
||||
int64 lease_id = 1;
|
||||
}
|
||||
|
||||
message LeaseKeepAliveResponse {
|
||||
ResponseHeader header = 1;
|
||||
int64 lease_id = 2;
|
||||
int64 ttl = 3;
|
||||
}
|
@ -1,14 +1,16 @@
# Runtime Reconfiguration
## Runtime Reconfiguration

etcd comes with support for incremental runtime reconfiguration, which allows users to update the membership of the cluster at run time.

Reconfiguration requests can only be processed when the majority of the cluster members are functioning. It is **highly recommended** to always have a cluster size greater than two in production. It is unsafe to remove a member from a two member cluster. The majority of a two member cluster is also two. If there is a failure during the removal process, the cluster might not be able to make progress and need to [restart from majority failure][majority failure].
Reconfiguration requests can only be processed when the the majority of the cluster members are functioning. It is **highly recommended** to always have a cluster size greater than two in production. It is unsafe to remove a member from a two member cluster. The majority of a two member cluster is also two. If there is a failure during the removal process, the cluster might not be able to make progress and need to [restart from majority failure][majority failure].

To better understand the design behind runtime reconfiguration, we suggest you read [the runtime reconfiguration document][runtime-reconf].
To better understand the design behind runtime reconfiguration, we suggest you read [this](runtime-reconf-design.md).

[majority failure]: #restart-cluster-from-majority-failure

## Reconfiguration Use Cases

Let's walk through some common reasons for reconfiguring a cluster. Most of these just involve combinations of adding or removing a member, which are explained below under [Cluster Reconfiguration Operations][cluster-reconf].
Let us walk through some common reasons for reconfiguring a cluster. Most of these just involve combinations of adding or removing a member, which are explained below under [Cluster Reconfiguration Operations](#cluster-reconfiguration-operations).

### Cycle or Upgrade Multiple Machines

@ -16,23 +18,33 @@ If you need to move multiple members of your cluster due to planned maintenance

It is safe to remove the leader, however there is a brief period of downtime while the election process takes place. If your cluster holds more than 50MB, it is recommended to [migrate the member's data directory][member migration].

[member migration]: admin_guide.md#member-migration

### Change the Cluster Size

Increasing the cluster size can enhance [failure tolerance][fault tolerance table] and provide better read performance. Since clients can read from any member, increasing the number of members increases the overall read throughput.

Decreasing the cluster size can improve the write performance of a cluster, with a trade-off of decreased resilience. Writes into the cluster are replicated to a majority of members of the cluster before they are considered committed. Decreasing the cluster size lowers the majority, and each write is committed more quickly; for example, a write to a five-member cluster must reach three members, while a write to a three-member cluster only needs two.

[fault tolerance table]: admin_guide.md#fault-tolerance-table

### Replace A Failed Machine

If a machine fails due to hardware failure, data directory corruption, or some other fatal situation, it should be replaced as soon as possible. Machines that have failed but haven't been removed adversely affect your quorum and reduce the tolerance for an additional failure.

To replace the machine, follow the instructions for [removing the member][remove member] from the cluster, and then [add a new member][add member] in its place. If your cluster holds more than 50MB, it is recommended to [migrate the failed member's data directory][member migration] if you can still access it.

[remove member]: #remove-a-member
[add member]: #add-a-new-member

### Restart Cluster from Majority Failure

If the majority of your cluster is lost or all of your nodes have changed IP addresses, then you need to take manual action in order to recover safely.
The basic steps in the recovery process include [creating a new cluster using the old data][disaster recovery], forcing a single member to act as the leader, and finally using runtime configuration to [add new members][add member] to this new cluster one at a time.

[add member]: #add-a-new-member
[disaster recovery]: admin_guide.md#disaster-recovery

## Cluster Reconfiguration Operations

Now that we have the use cases in mind, let us lay out the operations involved in each.
@ -48,24 +60,11 @@ All changes to the cluster are done one at a time:
* To decrease from 5 to 3 you will make two remove operations

All of these examples will use the `etcdctl` command line tool that ships with etcd.
If you want to use the members API directly you can find the documentation [here][member-api].
If you want to use the member API directly you can find the documentation [here](other_apis.md).

### Update a Member

#### Update advertise client URLs

If you would like to update the advertise client URLs of a member, you can simply restart
that member with the updated client URLs flag (`--advertise-client-urls`) or environment variable
(`ETCD_ADVERTISE_CLIENT_URLS`). The restarted member will self-publish the updated URLs.
A wrongly updated client URL will not affect the health of the etcd cluster.

#### Update advertise peer URLs

If you would like to update the advertise peer URLs of a member, you have to first update
it explicitly via the member command and then restart the member. The additional action is required
since updating peer URLs changes the cluster-wide configuration and can affect the health of the etcd cluster.

To update the peer URLs, first, we need to find the target member's ID. You can list all members with `etcdctl`:
If you would like to update a member IP address (peerURLs), first, we need to find the target member's ID. You can list all members with `etcdctl`:

```sh
$ etcdctl member list
@ -103,10 +102,10 @@ It is safe to remove the leader, however the cluster will be inactive while a ne

Adding a member is a two step process:

* Add the new member to the cluster via the [members API][member-api] or the `etcdctl member add` command.
* Add the new member to the cluster via the [members API](other_apis.md#post-v2members) or the `etcdctl member add` command.
* Start the new member with the new cluster configuration, including a list of the updated members (existing members + the new member).

Using `etcdctl` let's add the new member to the cluster by specifying its [name][conf-name] and [advertised peer URLs][conf-adv-peer]:
Using `etcdctl` let's add the new member to the cluster by specifying its [name](configuration.md#-name) and [advertised peer URLs](configuration.md#-initial-advertise-peer-urls):

```sh
$ etcdctl member add infra3 http://10.0.1.13:2380
@ -132,7 +131,7 @@ The new member will run as a part of the cluster and immediately begin catching
If you are adding multiple members the best practice is to configure a single member at a time and verify it starts correctly before adding more new members.
If you add a new member to a 1-node cluster, the cluster cannot make progress before the new member starts, because it needs two members as a majority to agree on the consensus. You will only see this behavior between the time `etcdctl member add` informs the cluster about the new member and the time the new member successfully establishes a connection to the existing one.

#### Error Cases When Adding Members
#### Error Cases

In the following case we have not included our new host in the list of enumerated nodes.
If this is a new cluster, the node must be added to the list of initial cluster members.
@ -155,30 +154,10 @@ etcdserver: assign ids error: unmatched member while checking PeerURLs
exit 1
```

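As a sketch of the fix for this case, the new host has to appear in the enumerated cluster list when the member is started; the member names and URLs below are illustrative, reusing the infra naming from the examples above:

```sh
$ etcd -name infra3 \
  -initial-cluster "infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380,infra3=http://10.0.1.13:2380" \
  -initial-cluster-state existing
```
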
When we start etcd using the data directory of a removed member, etcd will exit automatically if it connects to any active member in the cluster:
When we start etcd using the data directory of a removed member, etcd will exit automatically if it connects to any alive member in the cluster:

```sh
$ etcd
etcd: this member has been permanently removed from the cluster. Exiting.
exit 1
```

### Strict Reconfiguration Check Mode (`-strict-reconfig-check`)

As described above, the best practice for adding new members is to configure a single member at a time and verify it starts correctly before adding more new members. This step-by-step approach is very important because if a newly added member is not configured correctly (for example, the peer URLs are incorrect), the cluster can lose quorum. Quorum is lost because the newly added member is counted in the quorum even if that member is not reachable from the other existing members. Quorum loss might also happen if there is a connectivity issue or there are operational issues.

To avoid this problem, etcd provides an option `-strict-reconfig-check`. If this option is passed to etcd, etcd rejects reconfiguration requests if the number of started members would be less than a quorum of the reconfigured cluster.

It is recommended to enable this option. However, it is disabled by default for backward compatibility.

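As a minimal sketch, assuming etcd's usual flag and `ETCD_*` environment variable conventions, enabling the check looks like:

```sh
# Command line flag:
$ etcd -strict-reconfig-check

# Environment variable form (assumed from the standard flag-to-env mapping):
$ ETCD_STRICT_RECONFIG_CHECK=true etcd
```
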
[add member]: #add-a-new-member
[cluster-reconf]: #cluster-reconfiguration-operations
[conf-adv-peer]: configuration.md#-initial-advertise-peer-urls
[conf-name]: configuration.md#-name
[disaster recovery]: admin_guide.md#disaster-recovery
[fault tolerance table]: admin_guide.md#fault-tolerance-table
[majority failure]: #restart-cluster-from-majority-failure
[member-api]: members_api.md
[member migration]: admin_guide.md#member-migration
[remove member]: #remove-a-member
[runtime-reconf]: runtime-reconf-design.md
@ -1,12 +1,12 @@
# Design of Runtime Reconfiguration
### Design of Runtime Reconfiguration

Runtime reconfiguration is one of the hardest and most error prone features in a distributed system, especially in a consensus based system like etcd.

Read on to learn about the design of etcd's runtime reconfiguration commands and how we tackled these problems.

## Two Phase Config Changes Keep you Safe
### Two Phase Config Changes Keep you Safe

In etcd, every runtime reconfiguration has to go through [two phases][add-member] for safety reasons. For example, to add a member you need to first inform the cluster of the new configuration and then start the new member.
In etcd, every runtime reconfiguration has to go through [two phases](Documentation/runtime-configuration.md#add-a-new-member) for safety reasons. For example, to add a member you need to first inform the cluster of the new configuration and then start the new member.

Phase 1 - Inform cluster of new configuration

@ -22,15 +22,15 @@ Without the explicit workflow around cluster membership etcd would be vulnerable

We think runtime reconfiguration should be a low-frequency operation. We made the decision to keep it explicit and user-driven to ensure configuration safety and keep your cluster always running smoothly under your control.

## Permanent Loss of Quorum Requires New Cluster
### Permanent Loss of Quorum Requires New Cluster

If a cluster permanently loses a majority of its members, a new cluster will need to be started from an old data directory to recover the previous state.

It is entirely possible to force-remove the failed members from the existing cluster to recover. However, we decided not to support this method since it bypasses the normal consensus committing phase, which is unsafe. If the member to remove is not actually dead, or you force-remove different members through different members in the same cluster, you will end up with a diverged cluster with the same clusterID. This is very dangerous and hard to debug/fix afterwards.

If you have a correct deployment, the possibility of permanent majority loss is very low. But it is a severe enough problem that it is worth special care. We strongly suggest you read the [disaster recovery documentation][disaster-recovery] and prepare for permanent majority loss before you put etcd into production.
If you have a correct deployment, the possibility of permanent majority loss is very low. But it is a severe enough problem that it is worth special care. We strongly suggest you read the [disaster recovery documentation](admin_guide.md#disaster-recovery) and prepare for permanent majority loss before you put etcd into production.

## Do Not Use Public Discovery Service For Runtime Reconfiguration
### Do Not Use Public Discovery Service For Runtime Reconfiguration

The public discovery service should only be used for bootstrapping a cluster. To join a member into an existing cluster, you should use the runtime reconfiguration API.

@ -38,13 +38,10 @@ Discovery service is designed for bootstrapping an etcd cluster in the cloud env

It seems that using the public discovery service is a convenient way to do runtime reconfiguration, since the discovery service already has all the cluster configuration information. However, relying on the public discovery service brings trouble:

1. it introduces external dependencies for the entire life-cycle of your cluster, not just bootstrap time. If there is a network issue between your cluster and public discovery service, your cluster will suffer from it.
1. it introduces a external dependencies for the entire life-cycle of your cluster, not just bootstrap time. If there is a network issue between your cluster and public discover service, your cluster will suffer from it.

2. the public discovery service must reflect the correct runtime configuration of your cluster during its life-cycle. It has to provide security mechanisms to avoid bad actions, and that is hard.

3. the public discovery service has to keep tens of thousands of cluster configurations. Our public discovery service backend is not ready for that workload.

If you want to have a discovery service that supports runtime reconfiguration, the best choice is to build your private one.

[add-member]: runtime-configuration.md#add-a-new-member
[disaster-recovery]: admin_guide.md#disaster-recovery
If you want to have a discovery service that supports runtime reconfiguration, the best choice is to build your private one.
@ -1,10 +1,12 @@
# Security Model
# security model

etcd supports SSL/TLS as well as authentication through client certificates, both for clients to server as well as peer (server to server / cluster) communication.

To get up and running you first need to have a CA certificate and a signed key pair for one member. It is recommended to create and sign a new key pair for every member in a cluster.

For convenience, the [cfssl] tool provides an easy interface to certificate generation, and we provide an example using the tool [here][tls-setup]. You can also examine this [alternative guide to generating self-signed key pairs][tls-guide].
For convenience the [cfssl](https://github.com/cloudflare/cfssl) tool provides an easy interface to certificate generation, and we provide a full example using the tool at [here](../hack/tls-setup). Alternatively this site provides a good reference on how to generate self-signed key pairs:

http://www.g-loaded.eu/2005/11/10/be-your-own-ca/

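If you only need a throwaway CA for local testing, a minimal sketch using plain openssl (the file names and subject below are illustrative, not taken from this guide) could look like:

```sh
# Create a self-signed CA certificate and private key (illustrative names):
$ openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout ca.key -out ca.crt -subj "/CN=my-etcd-ca"
```
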
## Basic setup

@ -95,7 +97,7 @@ $ curl --cacert /path/to/ca.crt --cert /path/to/client.crt --key /path/to/client
-L https://127.0.0.1:2379/v2/keys/foo -XPUT -d value=bar -v
```

You should be able to see:
You should able to see:

```
...
@ -136,21 +138,13 @@ $ etcd -name infra1 -data-dir infra1 \

# member2
$ etcd -name infra2 -data-dir infra2 \
  -peer-client-cert-auth -peer-trusted-ca-file=/path/to/ca.crt -peer-cert-file=/path/to/member2.crt -peer-key-file=/path/to/member2.key \
  -peer-client-cert-atuh -peer-trusted-ca-file=/path/to/ca.crt -peer-cert-file=/path/to/member2.crt -peer-key-file=/path/to/member2.key \
  -initial-advertise-peer-urls=https://10.0.1.11:2380 -listen-peer-urls=https://10.0.1.11:2380 \
  -discovery ${DISCOVERY_URL}
```

The etcd members will form a cluster and all communication between members in the cluster will be encrypted and authenticated using the client certificates. You will see in the output of etcd that the addresses it connects to use HTTPS.

## Notes For etcd Proxy

etcd proxy terminates the TLS from its client if the connection is secure, and uses the proxy's own key/cert specified in `--peer-key-file` and `--peer-cert-file` to communicate with etcd members.

The proxy communicates with etcd members through both the `--advertise-client-urls` and `--advertise-peer-urls` of a given member. It forwards client requests to etcd members’ advertised client urls, and it syncs the initial cluster configuration through etcd members’ advertised peer urls.

When client authentication is enabled for an etcd member, the administrator must ensure that the peer certificate specified in the proxy's `--peer-cert-file` option is valid for that authentication. The proxy's peer certificate must also be valid for peer authentication if peer authentication is enabled.

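As a sketch of such a proxy invocation, combining the flags described above (the discovery URL and certificate paths are illustrative):

```sh
$ etcd -proxy on -listen-client-urls http://127.0.0.1:8080 \
  -peer-trusted-ca-file=/path/to/ca.crt \
  -peer-cert-file=/path/to/proxy.crt -peer-key-file=/path/to/proxy.key \
  -discovery ${DISCOVERY_URL}
```
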
## Frequently Asked Questions

### My cluster is not working with peer tls configuration?
@ -158,7 +152,7 @@ When client authentication is enabled for an etcd member, the administrator must
The internal protocol of etcd v2.0.x uses a lot of short-lived HTTP connections.
So, when enabling TLS you may need to increase the heartbeat interval and election timeouts to reduce internal cluster connection churn.
A reasonable place to start is with these values: `--heartbeat-interval 500 --election-timeout 2500`.
These issues are resolved in the etcd v2.1.x series of releases which uses fewer connections.
This issues is resolved in the etcd v2.1.x series of releases which uses fewer connections.

### I'm seeing a SSLv3 alert handshake failure when using SSL client authentication?

@ -185,9 +179,4 @@ $ openssl ca -config openssl.cnf -policy policy_anything -extensions ssl_client
### With peer certificate authentication I receive "certificate is valid for 127.0.0.1, not $MY_IP"
Make sure that you sign your certificates with your member's public IP address as the Subject Name. The `etcd-ca` tool for example provides an `--ip=` option for its `new-cert` command.

If you need your certificate to be signed for your member's FQDN in its Subject Name then you could use Subject Alternative Names (short IP SANs) to add your IP address. The `etcd-ca` tool provides a `--domain=` option for its `new-cert` command, and openssl can make [it][alt-name] too.

[cfssl]: https://github.com/cloudflare/cfssl
[tls-setup]: /hack/tls-setup
[tls-guide]: https://github.com/coreos/docs/blob/master/os/generate-self-signed-certificates.md
[alt-name]: http://wiki.cacert.org/FAQ/subjectAltName
If you need your certificate to be signed for your member's FQDN in its Subject Name then you could use Subject Alternative Names (short IP SANs) to add your IP address. The `etcd-ca` tool provides `--domain=` option for its `new-cert` command, and openssl can make [it](http://wiki.cacert.org/FAQ/subjectAltName) too.
@ -1,16 +1,16 @@
# Tuning
## Tuning

The default settings in etcd should work well for installations on a local network where the average network latency is low.
However, when using etcd across multiple data centers or over networks with high latency you may need to tweak the heartbeat interval and election timeout settings.

The network isn't the only source of latency. Each request and response may be impacted by slow disks on both the leader and follower. Each of these timeouts represents the total time from request to successful response from the other machine.

## Time Parameters
### Time Parameters

The underlying distributed consensus protocol relies on two separate time parameters to ensure that nodes can hand off leadership if one stalls or goes offline.
The first parameter is called the *Heartbeat Interval*.
This is the frequency with which the leader will notify followers that it is still the leader.
For best practices, the parameter should be set around round-trip time between members.
For best pratices, the parameter should be set around round-trip time between members.
By default, etcd uses a `100ms` heartbeat interval.

The second parameter is the *Election Timeout*.
@ -21,20 +21,17 @@ Adjusting these values is a trade off.
The value of heartbeat interval is recommended to be around the maximum of average round-trip time (RTT) between members, normally around 0.5-1.5x the round-trip time.
If the heartbeat interval is too low, etcd will send unnecessary messages that increase the usage of CPU and network resources.
On the other hand, a heartbeat interval that is too high leads to a high election timeout, and a higher election timeout takes longer to detect a leader failure.
The easiest way to measure round-trip time (RTT) is to use the [PING utility][ping].
The easiest way to measure round-trip time (RTT) is to use [PING utility](https://en.wikipedia.org/wiki/Ping_(networking_utility)).

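For instance, a rough way to sample the RTT to one cluster peer (the address is illustrative):

```sh
# Five round-trip samples to a peer; feed the reported average into the guidance above:
$ ping -c 5 10.0.1.11
```
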
The election timeout should be set based on the heartbeat interval and average round-trip time between members.
Election timeouts must be at least 10 times the round-trip time so they can account for variance in your network.
For example, if the round-trip time between your members is 10ms then you should have at least a 100ms election timeout.

The upper limit of election timeout is 50000ms, which should only be used when deploying global etcd cluster. First, 5s is the upper limit of average global round-trip time. A reasonable round-trip time for the continental united states is 130ms, and the time between US and japan is around 350-400ms. Because package gets delayed a lot, and network situation may be terrible, 5s is a safe value for it. Then, because election timeout should be an order of magnitude bigger than broadcast time, 50s becomes its maximum.

You should also set your election timeout to at least 5 to 10 times your heartbeat interval to account for variance in leader replication.
For a heartbeat interval of 50ms you should set your election timeout to at least 250ms - 500ms.

The upper limit of election timeout is 50000ms (50s), which should only be used when deploying a globally-distributed etcd cluster.
A reasonable round-trip time for the continental United States is 130ms, and the time between US and Japan is around 350-400ms.
If your network has uneven performance or regular packet delays/loss then it is possible that a couple of retries may be necessary to successfully send a packet. So 5s is a safe upper limit of global round-trip time.
As the election timeout should be an order of magnitude bigger than broadcast time, in the case of ~5s for a globally distributed cluster, then 50 seconds becomes a reasonable maximum.

The heartbeat interval and election timeout value should be the same for all members in one cluster. Setting different values for etcd members may disrupt cluster stability.

You can override the default values on the command line:
@ -49,7 +46,7 @@ $ ETCD_HEARTBEAT_INTERVAL=100 ETCD_ELECTION_TIMEOUT=500 etcd

The values are specified in milliseconds.

## Snapshots
### Snapshots

etcd appends all key changes to a log file.
This log grows forever and is a complete linear history of every change made to the keys.
@ -71,5 +68,3 @@ $ etcd -snapshot-count=5000
# Environment variables:
$ ETCD_SNAPSHOT_COUNT=5000 etcd
```

[ping]: https://en.wikipedia.org/wiki/Ping_(networking_utility)
@ -1,4 +1,4 @@
# Upgrade etcd to 2.1
## Upgrade etcd to 2.1

In the general case, upgrading from etcd 2.0 to 2.1 can be a zero-downtime, rolling upgrade:
- one by one, stop the etcd v2.0 processes and replace them with etcd v2.1 processes
@ -6,29 +6,29 @@ In the general case, upgrading from etcd 2.0 to 2.1 can be a zero-downtime, roll

Before [starting an upgrade](#upgrade-procedure), read through the rest of this guide to prepare.

## Upgrade Checklists
### Upgrade Checklists

### Upgrade Requirements
#### Upgrade Requirement

To upgrade an existing etcd deployment to 2.1, you must be running 2.0. If you’re running a version of etcd before 2.0, you must upgrade to [2.0][v2.0] before upgrading to 2.1.
To upgrade an existing etcd deployment to 2.1, you must be running 2.0. If you’re running a version of etcd before 2.0, you must upgrade to [2.0](https://github.com/coreos/etcd/releases/tag/v2.0.13) before upgrading to 2.1.

Also, to ensure a smooth rolling upgrade, your running cluster must be healthy. You can check the health of the cluster by using the `etcdctl cluster-health` command.

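For example, on a healthy three-member cluster the command reports per-member health followed by an overall verdict (the endpoints below are illustrative):

```
$ etcdctl cluster-health
member 6e3bd23ae5f1eae0 is healthy: got healthy result from http://localhost:22379
member 924e2e83e93f2560 is healthy: got healthy result from http://localhost:32379
member a8266ecf031671f3 is healthy: got healthy result from http://localhost:12379
cluster is healthy
```
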
### Preparedness
#### Preparedness

Before upgrading etcd, always test the services relying on etcd in a staging environment before deploying the upgrade to the production environment.

You might also want to [backup your data directory][backup-datastore] for a potential [downgrade](#downgrade).
You might also want to [backup your data directory](admin_guide.md#backing-up-the-datastore) for a potential [downgrade](#downgrade).

etcd 2.1 introduces a new [authentication][auth] feature, which is disabled by default. If your deployment depends on these, you may want to test the auth features before enabling them in production.
etcd 2.1 introduces a new [authentication](auth_api.md) feature, which is disabled by default. If your deployment depends on these, you may want to test the auth features before enabling them in production.

### Mixed Versions
#### Mixed Versions

While upgrading, an etcd cluster supports mixed versions of etcd members. The cluster is only considered upgraded once all its members are upgraded to 2.1.

Internally, etcd members negotiate with each other to determine the overall etcd cluster version, which controls the reported cluster version and the supported features. For example, if you are mid-upgrade, any 2.1 features (such as the authentication feature mentioned above) won’t be available.

### Limitations
#### Limitations

If you encounter any issues during the upgrade, you can attempt to restart the etcd process in trouble using a newer v2.1 binary to solve the problem. One known issue is that etcd v2.0.0 and v2.0.2 may panic during rolling upgrades due to an existing bug, which has been fixed since etcd v2.0.3.

@ -36,11 +36,11 @@ It might take up to 2 minutes for the newly upgraded member to catch up with the

If you have even more data, this might take more time. If you have a data size larger than 100MB you should contact us before upgrading, so we can make sure the upgrades work smoothly.

### Downgrade
#### Downgrade

If all members have been upgraded to v2.1, the cluster will be upgraded to v2.1, and downgrade is **not possible**. If any member is still v2.0, the cluster will remain in v2.0, and you can go back to using the v2.0 binary.

Please [backup your data directory][backup-datastore] of all etcd members if you want to downgrade the cluster, even if it is upgraded.
Please [backup your data directory](admin_guide.md#backing-up-the-datastore) of all etcd members if you want to downgrade the cluster, even if it is upgraded.

### Upgrade Procedure

@ -70,7 +70,7 @@ You will see similar error logging from other etcd processes in your cluster. Th
2015/06/23 15:45:11 stream: stopping the stream server...
```

You could [backup your data directory][backup-datastore] for data safety.
You could [backup your data directory](https://github.com/coreos/etcd/blob/7f7e2cc79d9c5c342a6eb1e48c386b0223cf934e/Documentation/admin_guide.md#backing-up-the-datastore) for data safety.

```
$ etcdctl backup \
@ -110,7 +110,3 @@ When all members are upgraded, you will see the cluster is upgraded to 2.1 succe
$ curl http://127.0.0.1:4001/version
{"etcdserver":"2.1.x","etcdcluster":"2.1.0"}
```

[auth]: auth_api.md
[backup-datastore]: admin_guide.md#backing-up-the-datastore
[v2.0]: https://github.com/coreos/etcd/releases/tag/v2.0.13
@ -1,4 +1,4 @@
# Upgrade etcd from 2.1 to 2.2
## Upgrade etcd from 2.1 to 2.2

In the general case, upgrading from etcd 2.1 to 2.2 can be a zero-downtime, rolling upgrade:

@ -7,37 +7,37 @@ In the general case, upgrading from etcd 2.1 to 2.2 can be a zero-downtime, roll

Before [starting an upgrade](#upgrade-procedure), read through the rest of this guide to prepare.

## Upgrade Checklists
### Upgrade Checklists

### Upgrade Requirement
#### Upgrade Requirement

To upgrade an existing etcd deployment to 2.2, you must be running 2.1. If you’re running a version of etcd before 2.1, you must upgrade to [2.1][v2.1] before upgrading to 2.2.
To upgrade an existing etcd deployment to 2.2, you must be running 2.1. If you’re running a version of etcd before 2.1, you must upgrade to [2.1](https://github.com/coreos/etcd/releases/tag/v2.1.2) before upgrading to 2.2.

Also, to ensure a smooth rolling upgrade, your running cluster must be healthy. You can check the health of the cluster by using the `etcdctl cluster-health` command.

### Preparedness
#### Preparedness

Before upgrading etcd, always test the services relying on etcd in a staging environment before deploying the upgrade to the production environment.

You might also want to [backup the data directory][backup-datastore] for a potential [downgrade].
You might also want to [backup your data directory](admin_guide.md#backing-up-the-datastore) for a potential [downgrade](#downgrade).

### Mixed Versions
#### Mixed Versions

While upgrading, an etcd cluster supports mixed versions of etcd members. The cluster is only considered upgraded once all its members are upgraded to 2.2.

Internally, etcd members negotiate with each other to determine the overall etcd cluster version, which controls the reported cluster version and the supported features.

### Limitations
#### Limitations

If you have a data size larger than 100MB you should contact us before upgrading, so we can make sure the upgrades work smoothly.

Every etcd 2.2 member will do health checking across the cluster periodically. An etcd 2.1 member does not support health checking. During the upgrade, etcd 2.2 members will log warnings about the unhealthy state of etcd 2.1 members. You can ignore these warnings.

### Downgrade
#### Downgrade

If all members have been upgraded to v2.2, the cluster will be upgraded to v2.2, and downgrade is **not possible**. If any member is still v2.1, the cluster will remain in v2.1, and you can go back to using the v2.1 binary.

Please [backup the data directory][backup-datastore] of all etcd members if you want to downgrade the cluster, even if it is upgraded.
Please [backup your data directory](admin_guide.md#backing-up-the-datastore) of all etcd members if you want to downgrade the cluster, even if it is upgraded.

### Upgrade Procedure

@ -85,7 +85,7 @@ You will also see logging output like this from the newly upgraded member, since

```

[Backup your data directory][backup-datastore] for data safety.
You could [backup your data directory](https://github.com/coreos/etcd/blob/7f7e2cc79d9c5c342a6eb1e48c386b0223cf934e/Documentation/admin_guide.md#backing-up-the-datastore) for data safety.

```
$ etcdctl backup \
@ -126,7 +126,3 @@ When all members are upgraded, you will see the cluster is upgraded to 2.2 succe
$ curl http://127.0.0.1:4001/version
{"etcdserver":"2.2.x","etcdcluster":"2.2.0"}
```

[backup-datastore]: admin_guide.md#backing-up-the-datastore
[downgrade]: #downgrade
[v2.1]: https://github.com/coreos/etcd/releases/tag/v2.1.2
@ -1,121 +0,0 @@
## Upgrade etcd from 2.2 to 2.3

In the general case, upgrading from etcd 2.2 to 2.3 can be a zero-downtime, rolling upgrade:
- one by one, stop the etcd v2.2 processes and replace them with etcd v2.3 processes
- after running all v2.3 processes, new features in v2.3 are available to the cluster

Before [starting an upgrade](#upgrade-procedure), read through the rest of this guide to prepare.

### Upgrade Checklists

#### Upgrade Requirements

To upgrade an existing etcd deployment to 2.3, the running cluster must be 2.2 or greater. If it's before 2.2, please upgrade to [2.2](https://github.com/coreos/etcd/releases/tag/v2.2.0) before upgrading to 2.3.

Also, to ensure a smooth rolling upgrade, the running cluster must be healthy. You can check the health of the cluster by using the `etcdctl cluster-health` command.

#### Preparation

Before upgrading etcd, always test the services relying on etcd in a staging environment before deploying the upgrade to the production environment.

Before beginning, [backup the etcd data directory](admin_guide.md#backing-up-the-datastore). Should something go wrong with the upgrade, it is possible to use this backup to [downgrade](#downgrade) back to the existing etcd version.

#### Mixed Versions

While upgrading, an etcd cluster supports mixed versions of etcd members, and operates with the protocol of the lowest common version. The cluster is only considered upgraded once all of its members are upgraded to version 2.3. Internally, etcd members negotiate with each other to determine the overall cluster version, which controls the reported version and the supported features.

#### Limitations

It might take up to 2 minutes for the newly upgraded member to catch up with the existing cluster when the total data size is larger than 50MB. Check the size of a recent snapshot to estimate the total data size. In other words, it is safest to wait for 2 minutes between upgrading each member.

For a much larger total data size, 100MB or more, this one-time process might take even more time. Administrators of very large etcd clusters of this magnitude can feel free to contact the [etcd team][etcd-contact] before upgrading, and we’ll be happy to provide advice on the procedure.

#### Downgrade

If all members have been upgraded to v2.3, the cluster will be upgraded to v2.3, and downgrade from this completed state is **not possible**. If any single member is still v2.2, however, the cluster and its operations remain “v2.2”, and it is possible from this mixed cluster state to return to using a v2.2 etcd binary on all members.

Please [backup the data directory](admin_guide.md#backing-up-the-datastore) of all etcd members to make downgrading the cluster possible even after it has been completely upgraded.

### Upgrade Procedure


This example details the upgrade of a three-member v2.2 etcd cluster running on a local machine.

#### 1. Check upgrade requirements.

Is the cluster healthy and running v2.2.x?

```
$ etcdctl cluster-health
member 6e3bd23ae5f1eae0 is healthy: got healthy result from http://localhost:22379
member 924e2e83e93f2560 is healthy: got healthy result from http://localhost:32379
member a8266ecf031671f3 is healthy: got healthy result from http://localhost:12379
cluster is healthy

$ curl http://localhost:4001/version
{"etcdserver":"2.2.x","etcdcluster":"2.2.0"}
```

#### 2. Stop the existing etcd process

When each etcd process is stopped, expected errors will be logged by other cluster members. This is normal since a cluster member connection has been (temporarily) broken:

```
2016-03-11 09:50:49.860319 E | rafthttp: failed to read 8211f1d0f64f3269 on stream Message (unexpected EOF)
2016-03-11 09:50:49.860335 I | rafthttp: the connection with 8211f1d0f64f3269 became inactive
2016-03-11 09:50:51.023804 W | etcdserver: failed to reach the peerURL(http://127.0.0.1:12380) of member 8211f1d0f64f3269 (Get http://127.0.0.1:12380/version: dial tcp 127.0.0.1:12380: getsockopt: connection refused)
2016-03-11 09:50:51.023821 W | etcdserver: cannot get the version of member 8211f1d0f64f3269 (Get http://127.0.0.1:12380/version: dial tcp 127.0.0.1:12380: getsockopt: connection refused)
```

It’s a good idea at this point to [backup the etcd data directory](https://github.com/coreos/etcd/blob/7f7e2cc79d9c5c342a6eb1e48c386b0223cf934e/Documentation/admin_guide.md#backing-up-the-datastore) to provide a downgrade path should any problems occur:

```
$ etcdctl backup \
      --data-dir /var/lib/etcd \
      --backup-dir /tmp/etcd_backup
```

#### 3. Drop-in etcd v2.3 binary and start the new etcd process

The new v2.3 etcd will publish its information to the cluster:

```
09:58:25.938673 I | etcdserver: published {Name:infra1 ClientURLs:[http://localhost:12379]} to cluster 524400597fb1d5f6
```

Verify that each member, and then the entire cluster, becomes healthy with the new v2.3 etcd binary:

```
$ etcdctl cluster-health
member 6e3bd23ae5f1eae0 is healthy: got healthy result from http://localhost:22379
member 924e2e83e93f2560 is healthy: got healthy result from http://localhost:32379
member a8266ecf031671f3 is healthy: got healthy result from http://localhost:12379
cluster is healthy
```


Upgraded members will log warnings like the following until the entire cluster is upgraded. This is expected and will cease after all etcd cluster members are upgraded to v2.3:

```
2016-03-11 09:58:26.851837 W | etcdserver: the local etcd version 2.2.0 is not up-to-date
2016-03-11 09:58:26.851854 W | etcdserver: member c02c70ede158499f has a higher version 2.3.0
```

#### 4. Repeat step 2 to step 3 for all other members

#### 5. Finish

When all members are upgraded, the cluster will report upgrading to 2.3 successfully:

```
2016-03-11 10:03:01.583392 N | etcdserver: updated the cluster version from 2.2 to 2.3
```

```
$ curl http://127.0.0.1:4001/version
{"etcdserver":"2.3.x","etcdcluster":"2.3.0"}
```


[etcd-contact]: https://coreos.com/etcd/?
103
Godeps/Godeps.json
generated
103
Godeps/Godeps.json
generated
@ -1,6 +1,6 @@
{
  "ImportPath": "github.com/coreos/etcd",
  "GoVersion": "go1.5.1",
  "GoVersion": "go1.4.2",
  "Packages": [
    "./..."
  ],
@ -10,10 +10,6 @@
      "Comment": "null-5",
      "Rev": "75cd24fc2f2c2a2088577d12123ddee5f54e0675"
    },
    {
      "ImportPath": "github.com/akrennmair/gopcap",
      "Rev": "00e11033259acb75598ba416495bb708d864a010"
    },
    {
      "ImportPath": "github.com/beorn7/perks/quantile",
      "Rev": "b965b613227fddccbfffe13eae360ed3fa822f8d"
@ -24,21 +20,17 @@
    },
    {
      "ImportPath": "github.com/boltdb/bolt",
      "Comment": "v1.1.0-81-g0fd4c05",
      "Rev": "0fd4c0547d204c7b1cad6db6f3adad5f2cf453e5"
      "Comment": "v1.0-119-g90fef38",
      "Rev": "90fef389f98027ca55594edd7dbd6e7f3926fdad"
    },
    {
      "ImportPath": "github.com/cheggaaa/pb",
      "Rev": "da1f27ad1d9509b16f65f52fd9d8138b0f2dc7b2"
      "ImportPath": "github.com/bradfitz/http2",
      "Rev": "3e36af6d3af0e56fa3da71099f864933dea3d9fb"
    },
    {
      "ImportPath": "github.com/codegangsta/cli",
      "Comment": "1.2.0-183-gb5232bb",
      "Rev": "b5232bb2934f606f9f27a1305f1eea224e8e8b88"
    },
    {
      "ImportPath": "github.com/coreos/gexpect",
      "Rev": "5173270e159f5aa8fbc999dc7e3dcb50f4098a69"
      "Comment": "1.2.0-26-gf7ebb76",
      "Rev": "f7ebb761e83e21225d1d8954fde853bf8edd46c4"
    },
    {
      "ImportPath": "github.com/coreos/go-semver/semver",
@ -63,15 +55,9 @@
      "ImportPath": "github.com/coreos/pkg/capnslog",
      "Rev": "2c77715c4df99b5420ffcae14ead08f52104065d"
    },
    {
      "ImportPath": "github.com/cpuguy83/go-md2man/md2man",
      "Comment": "v1.0.4",
      "Rev": "71acacd42f85e5e82f70a55327789582a5200a90"
    },
    {
      "ImportPath": "github.com/gogo/protobuf/proto",
      "Comment": "v0.1-118-ge8904f5",
      "Rev": "e8904f58e872a473a5b91bc9bf3377d223555263"
      "Rev": "64f27bf06efee53589314a6e5a4af34cdd85adf6"
    },
    {
      "ImportPath": "github.com/golang/glog",
@ -79,46 +65,20 @@
    },
    {
      "ImportPath": "github.com/golang/protobuf/proto",
      "Rev": "6aaa8d47701fa6cf07e914ec01fde3d4a1fe79c3"
      "Rev": "5677a0e3d5e89854c9974e1256839ee23f8233ca"
    },
    {
      "ImportPath": "github.com/google/btree",
      "Rev": "cc6329d4279e3f025a53a83c397d2339b5705c45"
    },
    {
      "ImportPath": "github.com/inconshreveable/mousetrap",
      "Rev": "76626ae9c91c4f2a10f34cad8ce83ea42c93bb75"
    },
    {
      "ImportPath": "github.com/jonboulle/clockwork",
      "Rev": "72f9bd7c4e0c2a40055ab3d0f09654f730cce982"
    },
    {
      "ImportPath": "github.com/kballard/go-shellquote",
      "Rev": "d8ec1a69a250a17bb0e419c386eac1f3711dc142"
    },
    {
      "ImportPath": "github.com/kr/pty",
      "Comment": "release.r56-29-gf7ee69f",
      "Rev": "f7ee69f31298ecbe5d2b349c711e2547a617d398"
    },
    {
      "ImportPath": "github.com/mattn/go-runewidth",
      "Comment": "travisish-46-gd6bea18",
      "Rev": "d6bea18f789704b5f83375793155289da36a3c7f"
    },
    {
      "ImportPath": "github.com/matttproud/golang_protobuf_extensions/pbutil",
      "Rev": "fc2b8d3a73c4867e51861bbdd5ae3c1f0869dd6a"
    },
    {
      "ImportPath": "github.com/olekukonko/tablewriter",
      "Rev": "cca8bbc0798408af109aaaa239cbd2634846b340"
    },
    {
      "ImportPath": "github.com/olekukonko/ts",
      "Rev": "ecf753e7c962639ab5a1fb46f7da627d4c0a04b8"
    },
    {
      "ImportPath": "github.com/prometheus/client_golang/prometheus",
      "Comment": "0.7.0-52-ge51041b",
@ -142,25 +102,8 @@
      "Rev": "454a56f35412459b5e684fd5ec0f9211b94f002a"
    },
    {
      "ImportPath": "github.com/russross/blackfriday",
      "Comment": "v1.4-2-g300106c",
      "Rev": "300106c228d52c8941d4b3de6054a6062a86dda3"
    },
    {
      "ImportPath": "github.com/shurcooL/sanitized_anchor_name",
      "Rev": "10ef21a441db47d8b13ebcc5fd2310f636973c77"
    },
    {
      "ImportPath": "github.com/spacejam/loghisto",
      "Rev": "323309774dec8b7430187e46cd0793974ccca04a"
    },
    {
      "ImportPath": "github.com/spf13/cobra",
      "Rev": "1c44ec8d3f1552cac48999f9306da23c4d8a288b"
    },
    {
      "ImportPath": "github.com/spf13/pflag",
      "Rev": "08b1a584251b5b62f458943640fc8ebd4d50aaa5"
      "ImportPath": "github.com/rakyll/pb",
      "Rev": "dc507ad06b7462501281bb4691ee43f0b1d1ec37"
    },
    {
      "ImportPath": "github.com/stretchr/testify/assert",
@ -184,27 +127,31 @@
    },
    {
      "ImportPath": "golang.org/x/net/context",
      "Rev": "6acef71eb69611914f7a30939ea9f6e194c78172"
      "Rev": "7dbad50ab5b31073856416cdcfeb2796d682f844"
    },
    {
      "ImportPath": "golang.org/x/net/http2",
      "Rev": "6acef71eb69611914f7a30939ea9f6e194c78172"
      "ImportPath": "golang.org/x/net/netutil",
      "Rev": "7dbad50ab5b31073856416cdcfeb2796d682f844"
    },
    {
      "ImportPath": "golang.org/x/net/internal/timeseries",
      "Rev": "6acef71eb69611914f7a30939ea9f6e194c78172"
    },
    {
      "ImportPath": "golang.org/x/net/trace",
      "Rev": "6acef71eb69611914f7a30939ea9f6e194c78172"
      "ImportPath": "golang.org/x/oauth2",
      "Rev": "3046bc76d6dfd7d3707f6640f85e42d9c4050f50"
    },
    {
      "ImportPath": "golang.org/x/sys/unix",
      "Rev": "9c60d1c508f5134d1ca726b4641db998f2523357"
    },
    {
      "ImportPath": "google.golang.org/cloud/compute/metadata",
      "Rev": "f20d6dcccb44ed49de45ae3703312cb46e627db1"
    },
    {
      "ImportPath": "google.golang.org/cloud/internal",
      "Rev": "f20d6dcccb44ed49de45ae3703312cb46e627db1"
    },
    {
      "ImportPath": "google.golang.org/grpc",
      "Rev": "b88c12e7caf74af3928de99a864aaa9916fa5aad"
      "Rev": "f5ebd86be717593ab029545492c93ddf8914832b"
    }
  ]
}
5
Godeps/_workspace/src/github.com/akrennmair/gopcap/.gitignore
generated
vendored
5
Godeps/_workspace/src/github.com/akrennmair/gopcap/.gitignore
generated
vendored
@ -1,5 +0,0 @@
#*
*~
/tools/pass/pass
/tools/pcaptest/pcaptest
/tools/tcpdump/tcpdump
27
Godeps/_workspace/src/github.com/akrennmair/gopcap/LICENSE
generated
vendored
27
Godeps/_workspace/src/github.com/akrennmair/gopcap/LICENSE
generated
vendored
@ -1,27 +0,0 @@
Copyright (c) 2009-2011 Andreas Krennmair. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

   * Redistributions of source code must retain the above copyright
     notice, this list of conditions and the following disclaimer.
   * Redistributions in binary form must reproduce the above
     copyright notice, this list of conditions and the following disclaimer
     in the documentation and/or other materials provided with the
     distribution.
   * Neither the name of Andreas Krennmair nor the names of its
     contributors may be used to endorse or promote products derived from
     this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
11
Godeps/_workspace/src/github.com/akrennmair/gopcap/README.mkd
generated
vendored
11
Godeps/_workspace/src/github.com/akrennmair/gopcap/README.mkd
generated
vendored
@ -1,11 +0,0 @@
# PCAP

This is a simple wrapper around libpcap for Go. Originally written by Andreas
Krennmair <ak@synflood.at> and only minorly touched up by Mark Smith <mark@qq.is>.

Please see the included pcaptest.go and tcpdump.go programs for instructions on
how to use this library.

Miek Gieben <miek@miek.nl> has created a more Go-like package and replaced functionality
with standard functions from the standard library. The package has also been renamed to
pcap.
527 Godeps/_workspace/src/github.com/akrennmair/gopcap/decode.go generated vendored
@ -1,527 +0,0 @@
package pcap

import (
    "encoding/binary"
    "fmt"
    "net"
    "reflect"
    "strings"
)

const (
    TYPE_IP   = 0x0800
    TYPE_ARP  = 0x0806
    TYPE_IP6  = 0x86DD
    TYPE_VLAN = 0x8100

    IP_ICMP = 1
    IP_INIP = 4
    IP_TCP  = 6
    IP_UDP  = 17
)

const (
    ERRBUF_SIZE = 256

    // According to pcap-linktype(7).
    LINKTYPE_NULL             = 0
    LINKTYPE_ETHERNET         = 1
    LINKTYPE_TOKEN_RING       = 6
    LINKTYPE_ARCNET           = 7
    LINKTYPE_SLIP             = 8
    LINKTYPE_PPP              = 9
    LINKTYPE_FDDI             = 10
    LINKTYPE_ATM_RFC1483      = 100
    LINKTYPE_RAW              = 101
    LINKTYPE_PPP_HDLC         = 50
    LINKTYPE_PPP_ETHER        = 51
    LINKTYPE_C_HDLC           = 104
    LINKTYPE_IEEE802_11       = 105
    LINKTYPE_FRELAY           = 107
    LINKTYPE_LOOP             = 108
    LINKTYPE_LINUX_SLL        = 113
    LINKTYPE_LTALK            = 104
    LINKTYPE_PFLOG            = 117
    LINKTYPE_PRISM_HEADER     = 119
    LINKTYPE_IP_OVER_FC       = 122
    LINKTYPE_SUNATM           = 123
    LINKTYPE_IEEE802_11_RADIO = 127
    LINKTYPE_ARCNET_LINUX     = 129
    LINKTYPE_LINUX_IRDA       = 144
    LINKTYPE_LINUX_LAPD       = 177
)

type addrHdr interface {
    SrcAddr() string
    DestAddr() string
    Len() int
}

type addrStringer interface {
    String(addr addrHdr) string
}

func decodemac(pkt []byte) uint64 {
    mac := uint64(0)
    for i := uint(0); i < 6; i++ {
        mac = (mac << 8) + uint64(pkt[i])
    }
    return mac
}

// Decode decodes the headers of a Packet.
func (p *Packet) Decode() {
    if len(p.Data) <= 14 {
        return
    }

    p.Type = int(binary.BigEndian.Uint16(p.Data[12:14]))
    p.DestMac = decodemac(p.Data[0:6])
    p.SrcMac = decodemac(p.Data[6:12])

    if len(p.Data) >= 15 {
        p.Payload = p.Data[14:]
    }

    switch p.Type {
    case TYPE_IP:
        p.decodeIp()
    case TYPE_IP6:
        p.decodeIp6()
    case TYPE_ARP:
        p.decodeArp()
    case TYPE_VLAN:
        p.decodeVlan()
    }
}

func (p *Packet) headerString(headers []interface{}) string {
    // If there's just one header, return that.
    if len(headers) == 1 {
        if hdr, ok := headers[0].(fmt.Stringer); ok {
            return hdr.String()
        }
    }
    // If there are two headers (IPv4/IPv6 -> TCP/UDP/IP..)
    if len(headers) == 2 {
        // Commonly the first header is an address.
        if addr, ok := p.Headers[0].(addrHdr); ok {
            if hdr, ok := p.Headers[1].(addrStringer); ok {
                return fmt.Sprintf("%s %s", p.Time, hdr.String(addr))
            }
        }
    }
    // For IP in IP, we do a recursive call.
    if len(headers) >= 2 {
        if addr, ok := headers[0].(addrHdr); ok {
            if _, ok := headers[1].(addrHdr); ok {
                return fmt.Sprintf("%s > %s IP in IP: ",
                    addr.SrcAddr(), addr.DestAddr(), p.headerString(headers[1:]))
            }
        }
    }

    var typeNames []string
    for _, hdr := range headers {
        typeNames = append(typeNames, reflect.TypeOf(hdr).String())
    }

    return fmt.Sprintf("unknown [%s]", strings.Join(typeNames, ","))
}

// String prints a one-line representation of the packet header.
// The output is suitable for use in a tcpdump program.
func (p *Packet) String() string {
    // If there are no headers, print "unsupported protocol".
    if len(p.Headers) == 0 {
        return fmt.Sprintf("%s unsupported protocol %d", p.Time, int(p.Type))
    }
    return fmt.Sprintf("%s %s", p.Time, p.headerString(p.Headers))
}

// Arphdr is a ARP packet header.
type Arphdr struct {
    Addrtype          uint16
    Protocol          uint16
    HwAddressSize     uint8
    ProtAddressSize   uint8
    Operation         uint16
    SourceHwAddress   []byte
    SourceProtAddress []byte
    DestHwAddress     []byte
    DestProtAddress   []byte
}

func (arp *Arphdr) String() (s string) {
    switch arp.Operation {
    case 1:
        s = "ARP request"
    case 2:
        s = "ARP Reply"
    }
    if arp.Addrtype == LINKTYPE_ETHERNET && arp.Protocol == TYPE_IP {
        s = fmt.Sprintf("%012x (%s) > %012x (%s)",
            decodemac(arp.SourceHwAddress), arp.SourceProtAddress,
            decodemac(arp.DestHwAddress), arp.DestProtAddress)
    } else {
        s = fmt.Sprintf("addrtype = %d protocol = %d", arp.Addrtype, arp.Protocol)
    }
    return
}

func (p *Packet) decodeArp() {
    if len(p.Payload) < 8 {
        return
    }

    pkt := p.Payload
    arp := new(Arphdr)
    arp.Addrtype = binary.BigEndian.Uint16(pkt[0:2])
    arp.Protocol = binary.BigEndian.Uint16(pkt[2:4])
    arp.HwAddressSize = pkt[4]
    arp.ProtAddressSize = pkt[5]
    arp.Operation = binary.BigEndian.Uint16(pkt[6:8])

    if len(pkt) < int(8+2*arp.HwAddressSize+2*arp.ProtAddressSize) {
        return
    }
    arp.SourceHwAddress = pkt[8 : 8+arp.HwAddressSize]
    arp.SourceProtAddress = pkt[8+arp.HwAddressSize : 8+arp.HwAddressSize+arp.ProtAddressSize]
    arp.DestHwAddress = pkt[8+arp.HwAddressSize+arp.ProtAddressSize : 8+2*arp.HwAddressSize+arp.ProtAddressSize]
    arp.DestProtAddress = pkt[8+2*arp.HwAddressSize+arp.ProtAddressSize : 8+2*arp.HwAddressSize+2*arp.ProtAddressSize]

    p.Headers = append(p.Headers, arp)

    if len(pkt) >= int(8+2*arp.HwAddressSize+2*arp.ProtAddressSize) {
        p.Payload = p.Payload[8+2*arp.HwAddressSize+2*arp.ProtAddressSize:]
    }
}

// IPadr is the header of an IP packet.
type Iphdr struct {
    Version    uint8
    Ihl        uint8
    Tos        uint8
    Length     uint16
    Id         uint16
    Flags      uint8
    FragOffset uint16
    Ttl        uint8
    Protocol   uint8
    Checksum   uint16
    SrcIp      []byte
    DestIp     []byte
}

func (p *Packet) decodeIp() {
    if len(p.Payload) < 20 {
        return
    }

    pkt := p.Payload
    ip := new(Iphdr)

    ip.Version = uint8(pkt[0]) >> 4
    ip.Ihl = uint8(pkt[0]) & 0x0F
    ip.Tos = pkt[1]
    ip.Length = binary.BigEndian.Uint16(pkt[2:4])
    ip.Id = binary.BigEndian.Uint16(pkt[4:6])
    flagsfrags := binary.BigEndian.Uint16(pkt[6:8])
    ip.Flags = uint8(flagsfrags >> 13)
    ip.FragOffset = flagsfrags & 0x1FFF
    ip.Ttl = pkt[8]
    ip.Protocol = pkt[9]
    ip.Checksum = binary.BigEndian.Uint16(pkt[10:12])
    ip.SrcIp = pkt[12:16]
    ip.DestIp = pkt[16:20]

    pEnd := int(ip.Length)
    if pEnd > len(pkt) {
        pEnd = len(pkt)
    }

    if len(pkt) >= pEnd && int(ip.Ihl*4) < pEnd {
        p.Payload = pkt[ip.Ihl*4 : pEnd]
    } else {
        p.Payload = []byte{}
    }

    p.Headers = append(p.Headers, ip)
    p.IP = ip

    switch ip.Protocol {
    case IP_TCP:
        p.decodeTcp()
    case IP_UDP:
        p.decodeUdp()
    case IP_ICMP:
        p.decodeIcmp()
    case IP_INIP:
        p.decodeIp()
    }
}

func (ip *Iphdr) SrcAddr() string  { return net.IP(ip.SrcIp).String() }
func (ip *Iphdr) DestAddr() string { return net.IP(ip.DestIp).String() }
func (ip *Iphdr) Len() int         { return int(ip.Length) }

type Vlanhdr struct {
    Priority       byte
    DropEligible   bool
    VlanIdentifier int
    Type           int // Not actually part of the vlan header, but the type of the actual packet
}

func (v *Vlanhdr) String() {
    fmt.Sprintf("VLAN Priority:%d Drop:%v Tag:%d", v.Priority, v.DropEligible, v.VlanIdentifier)
}

func (p *Packet) decodeVlan() {
    pkt := p.Payload
    vlan := new(Vlanhdr)
    if len(pkt) < 4 {
        return
    }

    vlan.Priority = (pkt[2] & 0xE0) >> 13
    vlan.DropEligible = pkt[2]&0x10 != 0
    vlan.VlanIdentifier = int(binary.BigEndian.Uint16(pkt[:2])) & 0x0FFF
    vlan.Type = int(binary.BigEndian.Uint16(p.Payload[2:4]))
    p.Headers = append(p.Headers, vlan)

    if len(pkt) >= 5 {
        p.Payload = p.Payload[4:]
    }

    switch vlan.Type {
    case TYPE_IP:
        p.decodeIp()
    case TYPE_IP6:
        p.decodeIp6()
    case TYPE_ARP:
        p.decodeArp()
    }
}

type Tcphdr struct {
    SrcPort    uint16
    DestPort   uint16
    Seq        uint32
    Ack        uint32
    DataOffset uint8
    Flags      uint16
    Window     uint16
    Checksum   uint16
    Urgent     uint16
    Data       []byte
}

const (
    TCP_FIN = 1 << iota
    TCP_SYN
    TCP_RST
    TCP_PSH
    TCP_ACK
    TCP_URG
    TCP_ECE
    TCP_CWR
    TCP_NS
)

func (p *Packet) decodeTcp() {
    if len(p.Payload) < 20 {
        return
    }

    pkt := p.Payload
    tcp := new(Tcphdr)
    tcp.SrcPort = binary.BigEndian.Uint16(pkt[0:2])
    tcp.DestPort = binary.BigEndian.Uint16(pkt[2:4])
    tcp.Seq = binary.BigEndian.Uint32(pkt[4:8])
    tcp.Ack = binary.BigEndian.Uint32(pkt[8:12])
    tcp.DataOffset = (pkt[12] & 0xF0) >> 4
    tcp.Flags = binary.BigEndian.Uint16(pkt[12:14]) & 0x1FF
    tcp.Window = binary.BigEndian.Uint16(pkt[14:16])
    tcp.Checksum = binary.BigEndian.Uint16(pkt[16:18])
    tcp.Urgent = binary.BigEndian.Uint16(pkt[18:20])
    if len(pkt) >= int(tcp.DataOffset*4) {
        p.Payload = pkt[tcp.DataOffset*4:]
    }
    p.Headers = append(p.Headers, tcp)
    p.TCP = tcp
}

func (tcp *Tcphdr) String(hdr addrHdr) string {
    return fmt.Sprintf("TCP %s:%d > %s:%d %s SEQ=%d ACK=%d LEN=%d",
        hdr.SrcAddr(), int(tcp.SrcPort), hdr.DestAddr(), int(tcp.DestPort),
        tcp.FlagsString(), int64(tcp.Seq), int64(tcp.Ack), hdr.Len())
}

func (tcp *Tcphdr) FlagsString() string {
    var sflags []string
    if 0 != (tcp.Flags & TCP_SYN) {
        sflags = append(sflags, "syn")
    }
    if 0 != (tcp.Flags & TCP_FIN) {
        sflags = append(sflags, "fin")
    }
    if 0 != (tcp.Flags & TCP_ACK) {
        sflags = append(sflags, "ack")
    }
    if 0 != (tcp.Flags & TCP_PSH) {
        sflags = append(sflags, "psh")
    }
    if 0 != (tcp.Flags & TCP_RST) {
        sflags = append(sflags, "rst")
    }
    if 0 != (tcp.Flags & TCP_URG) {
        sflags = append(sflags, "urg")
    }
    if 0 != (tcp.Flags & TCP_NS) {
        sflags = append(sflags, "ns")
    }
    if 0 != (tcp.Flags & TCP_CWR) {
        sflags = append(sflags, "cwr")
    }
    if 0 != (tcp.Flags & TCP_ECE) {
        sflags = append(sflags, "ece")
    }
    return fmt.Sprintf("[%s]", strings.Join(sflags, " "))
}

type Udphdr struct {
    SrcPort  uint16
    DestPort uint16
    Length   uint16
    Checksum uint16
}

func (p *Packet) decodeUdp() {
    if len(p.Payload) < 8 {
        return
    }

    pkt := p.Payload
    udp := new(Udphdr)
    udp.SrcPort = binary.BigEndian.Uint16(pkt[0:2])
    udp.DestPort = binary.BigEndian.Uint16(pkt[2:4])
    udp.Length = binary.BigEndian.Uint16(pkt[4:6])
    udp.Checksum = binary.BigEndian.Uint16(pkt[6:8])
    p.Headers = append(p.Headers, udp)
    p.UDP = udp
    if len(p.Payload) >= 8 {
        p.Payload = pkt[8:]
    }
}

func (udp *Udphdr) String(hdr addrHdr) string {
    return fmt.Sprintf("UDP %s:%d > %s:%d LEN=%d CHKSUM=%d",
        hdr.SrcAddr(), int(udp.SrcPort), hdr.DestAddr(), int(udp.DestPort),
        int(udp.Length), int(udp.Checksum))
}

type Icmphdr struct {
    Type     uint8
    Code     uint8
    Checksum uint16
    Id       uint16
    Seq      uint16
    Data     []byte
}

func (p *Packet) decodeIcmp() *Icmphdr {
    if len(p.Payload) < 8 {
        return nil
    }

    pkt := p.Payload
    icmp := new(Icmphdr)
    icmp.Type = pkt[0]
    icmp.Code = pkt[1]
    icmp.Checksum = binary.BigEndian.Uint16(pkt[2:4])
    icmp.Id = binary.BigEndian.Uint16(pkt[4:6])
    icmp.Seq = binary.BigEndian.Uint16(pkt[6:8])
    p.Payload = pkt[8:]
    p.Headers = append(p.Headers, icmp)
    return icmp
}

func (icmp *Icmphdr) String(hdr addrHdr) string {
    return fmt.Sprintf("ICMP %s > %s Type = %d Code = %d ",
        hdr.SrcAddr(), hdr.DestAddr(), icmp.Type, icmp.Code)
}

func (icmp *Icmphdr) TypeString() (result string) {
    switch icmp.Type {
    case 0:
        result = fmt.Sprintf("Echo reply seq=%d", icmp.Seq)
    case 3:
        switch icmp.Code {
        case 0:
            result = "Network unreachable"
        case 1:
            result = "Host unreachable"
        case 2:
            result = "Protocol unreachable"
        case 3:
            result = "Port unreachable"
        default:
            result = "Destination unreachable"
        }
    case 8:
        result = fmt.Sprintf("Echo request seq=%d", icmp.Seq)
    case 30:
        result = "Traceroute"
    }
    return
}

type Ip6hdr struct {
    // http://www.networksorcery.com/enp/protocol/ipv6.htm
    Version      uint8  // 4 bits
    TrafficClass uint8  // 8 bits
    FlowLabel    uint32 // 20 bits
    Length       uint16 // 16 bits
    NextHeader   uint8  // 8 bits, same as Protocol in Iphdr
    HopLimit     uint8  // 8 bits
    SrcIp        []byte // 16 bytes
    DestIp       []byte // 16 bytes
}

func (p *Packet) decodeIp6() {
    if len(p.Payload) < 40 {
        return
    }

    pkt := p.Payload
    ip6 := new(Ip6hdr)
    ip6.Version = uint8(pkt[0]) >> 4
    ip6.TrafficClass = uint8((binary.BigEndian.Uint16(pkt[0:2]) >> 4) & 0x00FF)
    ip6.FlowLabel = binary.BigEndian.Uint32(pkt[0:4]) & 0x000FFFFF
    ip6.Length = binary.BigEndian.Uint16(pkt[4:6])
    ip6.NextHeader = pkt[6]
    ip6.HopLimit = pkt[7]
    ip6.SrcIp = pkt[8:24]
    ip6.DestIp = pkt[24:40]

    if len(p.Payload) >= 40 {
        p.Payload = pkt[40:]
    }

    p.Headers = append(p.Headers, ip6)

    switch ip6.NextHeader {
    case IP_TCP:
        p.decodeTcp()
    case IP_UDP:
        p.decodeUdp()
    case IP_ICMP:
        p.decodeIcmp()
    case IP_INIP:
        p.decodeIp()
    }
}

func (ip6 *Ip6hdr) SrcAddr() string  { return net.IP(ip6.SrcIp).String() }
func (ip6 *Ip6hdr) DestAddr() string { return net.IP(ip6.DestIp).String() }
func (ip6 *Ip6hdr) Len() int         { return int(ip6.Length) }

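The decode chain above fills in `Packet.IP`, `Packet.TCP`, and `Packet.UDP` as side effects of `Decode()`; the test file that follows exercises exactly this. As a small illustrative sketch (not part of the commit; the raw frame is assumed to come from a capture), the populated fields can be inspected like so:

```go
package main

import (
	"fmt"

	"github.com/coreos/etcd/Godeps/_workspace/src/github.com/akrennmair/gopcap"
)

// printTCP decodes one raw Ethernet frame and, if it carries TCP over IPv4,
// prints the connection 4-tuple using the fields Decode() populates.
func printTCP(raw []byte) {
	pkt := &pcap.Packet{Data: raw}
	pkt.Decode() // frames shorter than 15 bytes are ignored by Decode()
	if pkt.IP != nil && pkt.TCP != nil {
		fmt.Printf("TCP %s:%d > %s:%d flags=%s\n",
			pkt.IP.SrcAddr(), pkt.TCP.SrcPort,
			pkt.IP.DestAddr(), pkt.TCP.DestPort,
			pkt.TCP.FlagsString())
	}
}

func main() {
	printTCP(nil) // no-op placeholder; real frames come from a capture
}
```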
247 Godeps/_workspace/src/github.com/akrennmair/gopcap/decode_test.go generated vendored
@ -1,247 +0,0 @@
package pcap

import (
    "bytes"
    "testing"
    "time"
)

var testSimpleTcpPacket *Packet = &Packet{
    Data: []byte{
        0x00, 0x00, 0x0c, 0x9f, 0xf0, 0x20, 0xbc, 0x30, 0x5b, 0xe8, 0xd3, 0x49,
        0x08, 0x00, 0x45, 0x00, 0x01, 0xa4, 0x39, 0xdf, 0x40, 0x00, 0x40, 0x06,
        0x55, 0x5a, 0xac, 0x11, 0x51, 0x49, 0xad, 0xde, 0xfe, 0xe1, 0xc5, 0xf7,
        0x00, 0x50, 0xc5, 0x7e, 0x0e, 0x48, 0x49, 0x07, 0x42, 0x32, 0x80, 0x18,
        0x00, 0x73, 0xab, 0xb1, 0x00, 0x00, 0x01, 0x01, 0x08, 0x0a, 0x03, 0x77,
        0x37, 0x9c, 0x42, 0x77, 0x5e, 0x3a, 0x47, 0x45, 0x54, 0x20, 0x2f, 0x20,
        0x48, 0x54, 0x54, 0x50, 0x2f, 0x31, 0x2e, 0x31, 0x0d, 0x0a, 0x48, 0x6f,
        0x73, 0x74, 0x3a, 0x20, 0x77, 0x77, 0x77, 0x2e, 0x66, 0x69, 0x73, 0x68,
        0x2e, 0x63, 0x6f, 0x6d, 0x0d, 0x0a, 0x43, 0x6f, 0x6e, 0x6e, 0x65, 0x63,
        0x74, 0x69, 0x6f, 0x6e, 0x3a, 0x20, 0x6b, 0x65, 0x65, 0x70, 0x2d, 0x61,
        0x6c, 0x69, 0x76, 0x65, 0x0d, 0x0a, 0x55, 0x73, 0x65, 0x72, 0x2d, 0x41,
        0x67, 0x65, 0x6e, 0x74, 0x3a, 0x20, 0x4d, 0x6f, 0x7a, 0x69, 0x6c, 0x6c,
        0x61, 0x2f, 0x35, 0x2e, 0x30, 0x20, 0x28, 0x58, 0x31, 0x31, 0x3b, 0x20,
        0x4c, 0x69, 0x6e, 0x75, 0x78, 0x20, 0x78, 0x38, 0x36, 0x5f, 0x36, 0x34,
        0x29, 0x20, 0x41, 0x70, 0x70, 0x6c, 0x65, 0x57, 0x65, 0x62, 0x4b, 0x69,
        0x74, 0x2f, 0x35, 0x33, 0x35, 0x2e, 0x32, 0x20, 0x28, 0x4b, 0x48, 0x54,
        0x4d, 0x4c, 0x2c, 0x20, 0x6c, 0x69, 0x6b, 0x65, 0x20, 0x47, 0x65, 0x63,
        0x6b, 0x6f, 0x29, 0x20, 0x43, 0x68, 0x72, 0x6f, 0x6d, 0x65, 0x2f, 0x31,
        0x35, 0x2e, 0x30, 0x2e, 0x38, 0x37, 0x34, 0x2e, 0x31, 0x32, 0x31, 0x20,
        0x53, 0x61, 0x66, 0x61, 0x72, 0x69, 0x2f, 0x35, 0x33, 0x35, 0x2e, 0x32,
        0x0d, 0x0a, 0x41, 0x63, 0x63, 0x65, 0x70, 0x74, 0x3a, 0x20, 0x74, 0x65,
        0x78, 0x74, 0x2f, 0x68, 0x74, 0x6d, 0x6c, 0x2c, 0x61, 0x70, 0x70, 0x6c,
        0x69, 0x63, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x2f, 0x78, 0x68, 0x74, 0x6d,
        0x6c, 0x2b, 0x78, 0x6d, 0x6c, 0x2c, 0x61, 0x70, 0x70, 0x6c, 0x69, 0x63,
        0x61, 0x74, 0x69, 0x6f, 0x6e, 0x2f, 0x78, 0x6d, 0x6c, 0x3b, 0x71, 0x3d,
        0x30, 0x2e, 0x39, 0x2c, 0x2a, 0x2f, 0x2a, 0x3b, 0x71, 0x3d, 0x30, 0x2e,
        0x38, 0x0d, 0x0a, 0x41, 0x63, 0x63, 0x65, 0x70, 0x74, 0x2d, 0x45, 0x6e,
        0x63, 0x6f, 0x64, 0x69, 0x6e, 0x67, 0x3a, 0x20, 0x67, 0x7a, 0x69, 0x70,
        0x2c, 0x64, 0x65, 0x66, 0x6c, 0x61, 0x74, 0x65, 0x2c, 0x73, 0x64, 0x63,
        0x68, 0x0d, 0x0a, 0x41, 0x63, 0x63, 0x65, 0x70, 0x74, 0x2d, 0x4c, 0x61,
        0x6e, 0x67, 0x75, 0x61, 0x67, 0x65, 0x3a, 0x20, 0x65, 0x6e, 0x2d, 0x55,
        0x53, 0x2c, 0x65, 0x6e, 0x3b, 0x71, 0x3d, 0x30, 0x2e, 0x38, 0x0d, 0x0a,
        0x41, 0x63, 0x63, 0x65, 0x70, 0x74, 0x2d, 0x43, 0x68, 0x61, 0x72, 0x73,
        0x65, 0x74, 0x3a, 0x20, 0x49, 0x53, 0x4f, 0x2d, 0x38, 0x38, 0x35, 0x39,
        0x2d, 0x31, 0x2c, 0x75, 0x74, 0x66, 0x2d, 0x38, 0x3b, 0x71, 0x3d, 0x30,
        0x2e, 0x37, 0x2c, 0x2a, 0x3b, 0x71, 0x3d, 0x30, 0x2e, 0x33, 0x0d, 0x0a,
        0x0d, 0x0a,
    }}

func BenchmarkDecodeSimpleTcpPacket(b *testing.B) {
    for i := 0; i < b.N; i++ {
        testSimpleTcpPacket.Decode()
    }
}

func TestDecodeSimpleTcpPacket(t *testing.T) {
    p := testSimpleTcpPacket
    p.Decode()
    if p.DestMac != 0x00000c9ff020 {
        t.Error("Dest mac", p.DestMac)
    }
    if p.SrcMac != 0xbc305be8d349 {
        t.Error("Src mac", p.SrcMac)
    }
    if len(p.Headers) != 2 {
        t.Error("Incorrect number of headers", len(p.Headers))
        return
    }
    if ip, ipOk := p.Headers[0].(*Iphdr); ipOk {
        if ip.Version != 4 {
            t.Error("ip Version", ip.Version)
        }
        if ip.Ihl != 5 {
            t.Error("ip header length", ip.Ihl)
        }
        if ip.Tos != 0 {
            t.Error("ip TOS", ip.Tos)
        }
        if ip.Length != 420 {
            t.Error("ip Length", ip.Length)
        }
        if ip.Id != 14815 {
            t.Error("ip ID", ip.Id)
        }
        if ip.Flags != 0x02 {
            t.Error("ip Flags", ip.Flags)
        }
        if ip.FragOffset != 0 {
            t.Error("ip Fragoffset", ip.FragOffset)
        }
        if ip.Ttl != 64 {
            t.Error("ip TTL", ip.Ttl)
        }
        if ip.Protocol != 6 {
            t.Error("ip Protocol", ip.Protocol)
        }
        if ip.Checksum != 0x555A {
            t.Error("ip Checksum", ip.Checksum)
        }
        if !bytes.Equal(ip.SrcIp, []byte{172, 17, 81, 73}) {
            t.Error("ip Src", ip.SrcIp)
        }
        if !bytes.Equal(ip.DestIp, []byte{173, 222, 254, 225}) {
            t.Error("ip Dest", ip.DestIp)
        }
        if tcp, tcpOk := p.Headers[1].(*Tcphdr); tcpOk {
            if tcp.SrcPort != 50679 {
                t.Error("tcp srcport", tcp.SrcPort)
            }
            if tcp.DestPort != 80 {
                t.Error("tcp destport", tcp.DestPort)
            }
            if tcp.Seq != 0xc57e0e48 {
                t.Error("tcp seq", tcp.Seq)
            }
            if tcp.Ack != 0x49074232 {
                t.Error("tcp ack", tcp.Ack)
            }
            if tcp.DataOffset != 8 {
                t.Error("tcp dataoffset", tcp.DataOffset)
            }
            if tcp.Flags != 0x18 {
                t.Error("tcp flags", tcp.Flags)
            }
            if tcp.Window != 0x73 {
                t.Error("tcp window", tcp.Window)
            }
            if tcp.Checksum != 0xabb1 {
                t.Error("tcp checksum", tcp.Checksum)
            }
            if tcp.Urgent != 0 {
                t.Error("tcp urgent", tcp.Urgent)
            }
        } else {
            t.Error("Second header is not TCP header")
        }
    } else {
        t.Error("First header is not IP header")
    }
    if string(p.Payload) != "GET / HTTP/1.1\r\nHost: www.fish.com\r\nConnection: keep-alive\r\nUser-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nAccept-Encoding: gzip,deflate,sdch\r\nAccept-Language: en-US,en;q=0.8\r\nAccept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3\r\n\r\n" {
        t.Error("--- PAYLOAD STRING ---\n", string(p.Payload), "\n--- PAYLOAD BYTES ---\n", p.Payload)
    }
}

// Makes sure packet payload doesn't display the 6 trailing null of this packet
// as part of the payload. They're actually the ethernet trailer.
func TestDecodeSmallTcpPacketHasEmptyPayload(t *testing.T) {
    p := &Packet{
        // This packet is only 54 bits (an empty TCP RST), thus 6 trailing null
        // bytes are added by the ethernet layer to make it the minimum packet size.
        Data: []byte{
            0xbc, 0x30, 0x5b, 0xe8, 0xd3, 0x49, 0xb8, 0xac, 0x6f, 0x92, 0xd5, 0xbf,
            0x08, 0x00, 0x45, 0x00, 0x00, 0x28, 0x00, 0x00, 0x40, 0x00, 0x40, 0x06,
            0x3f, 0x9f, 0xac, 0x11, 0x51, 0xc5, 0xac, 0x11, 0x51, 0x49, 0x00, 0x63,
            0x9a, 0xef, 0x00, 0x00, 0x00, 0x00, 0x2e, 0xc1, 0x27, 0x83, 0x50, 0x14,
            0x00, 0x00, 0xc3, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        }}
    p.Decode()
    if p.Payload == nil {
        t.Error("Nil payload")
    }
    if len(p.Payload) != 0 {
        t.Error("Non-empty payload:", p.Payload)
    }
}

func TestDecodeVlanPacket(t *testing.T) {
    p := &Packet{
        Data: []byte{
            0x00, 0x10, 0xdb, 0xff, 0x10, 0x00, 0x00, 0x15, 0x2c, 0x9d, 0xcc, 0x00, 0x81, 0x00, 0x01, 0xf7,
            0x08, 0x00, 0x45, 0x00, 0x00, 0x28, 0x29, 0x8d, 0x40, 0x00, 0x7d, 0x06, 0x83, 0xa0, 0xac, 0x1b,
            0xca, 0x8e, 0x45, 0x16, 0x94, 0xe2, 0xd4, 0x0a, 0x00, 0x50, 0xdf, 0xab, 0x9c, 0xc6, 0xcd, 0x1e,
            0xe5, 0xd1, 0x50, 0x10, 0x01, 0x00, 0x5a, 0x74, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        }}
    p.Decode()
    if p.Type != TYPE_VLAN {
        t.Error("Didn't detect vlan")
    }
    if len(p.Headers) != 3 {
        t.Error("Incorrect number of headers:", len(p.Headers))
        for i, h := range p.Headers {
            t.Errorf("Header %d: %#v", i, h)
        }
        t.FailNow()
    }
    if _, ok := p.Headers[0].(*Vlanhdr); !ok {
        t.Errorf("First header isn't vlan: %q", p.Headers[0])
    }
    if _, ok := p.Headers[1].(*Iphdr); !ok {
        t.Errorf("Second header isn't IP: %q", p.Headers[1])
    }
    if _, ok := p.Headers[2].(*Tcphdr); !ok {
        t.Errorf("Third header isn't TCP: %q", p.Headers[2])
    }
}

func TestDecodeFuzzFallout(t *testing.T) {
    testData := []struct {
        Data []byte
    }{
        {[]byte("000000000000\x81\x000")},
        {[]byte("000000000000\x81\x00000")},
        {[]byte("000000000000\x86\xdd0")},
        {[]byte("000000000000\b\x000")},
        {[]byte("000000000000\b\x060")},
        {[]byte{}},
        {[]byte("000000000000\b\x0600000000")},
        {[]byte("000000000000\x86\xdd000000\x01000000000000000000000000000000000")},
        {[]byte("000000000000\x81\x0000\b\x0600000000")},
        {[]byte("000000000000\b\x00n0000000000000000000")},
        {[]byte("000000000000\x86\xdd000000\x0100000000000000000000000000000000000")},
        {[]byte("000000000000\x81\x0000\b\x00g0000000000000000000")},
        //{[]byte()},
        {[]byte("000000000000\b\x00400000000\x110000000000")},
        {[]byte("0nMء\xfe\x13\x13\x81\x00gr\b\x00&x\xc9\xe5b'\x1e0\x00\x04\x00\x0020596224")},
        {[]byte("000000000000\x81\x0000\b\x00400000000\x110000000000")},
        {[]byte("000000000000\b\x00000000000\x0600\xff0000000")},
        {[]byte("000000000000\x86\xdd000000\x06000000000000000000000000000000000")},
        {[]byte("000000000000\x81\x0000\b\x00000000000\x0600b0000000")},
        {[]byte("000000000000\x81\x0000\b\x00400000000\x060000000000")},
        {[]byte("000000000000\x86\xdd000000\x11000000000000000000000000000000000")},
        {[]byte("000000000000\x86\xdd000000\x0600000000000000000000000000000000000000000000M")},
        {[]byte("000000000000\b\x00500000000\x0600000000000")},
        {[]byte("0nM\xd80\xfe\x13\x13\x81\x00gr\b\x00&x\xc9\xe5b'\x1e0\x00\x04\x00\x0020596224")},
    }

    for _, entry := range testData {
        pkt := &Packet{
            Time:   time.Now(),
            Caplen: uint32(len(entry.Data)),
            Len:    uint32(len(entry.Data)),
            Data:   entry.Data,
        }

        pkt.Decode()
        /*
            func() {
                defer func() {
                    if err := recover(); err != nil {
                        t.Fatalf("%d. %q failed: %v", idx, string(entry.Data), err)
                    }
                }()
                pkt.Decode()
            }()
        */
    }
}

206 Godeps/_workspace/src/github.com/akrennmair/gopcap/io.go generated vendored
@ -1,206 +0,0 @@
package pcap

import (
    "encoding/binary"
    "fmt"
    "io"
    "time"
)

// FileHeader is the parsed header of a pcap file.
// http://wiki.wireshark.org/Development/LibpcapFileFormat
type FileHeader struct {
    MagicNumber  uint32
    VersionMajor uint16
    VersionMinor uint16
    TimeZone     int32
    SigFigs      uint32
    SnapLen      uint32
    Network      uint32
}

type PacketTime struct {
    Sec  int32
    Usec int32
}

// Convert the PacketTime to a go Time struct.
func (p *PacketTime) Time() time.Time {
    return time.Unix(int64(p.Sec), int64(p.Usec)*1000)
}

// Packet is a single packet parsed from a pcap file.
//
// Convenient access to IP, TCP, and UDP headers is provided after Decode()
// is called if the packet is of the appropriate type.
type Packet struct {
    Time   time.Time // packet send/receive time
    Caplen uint32    // bytes stored in the file (caplen <= len)
    Len    uint32    // bytes sent/received
    Data   []byte    // packet data

    Type    int // protocol type, see LINKTYPE_*
    DestMac uint64
    SrcMac  uint64

    Headers []interface{} // decoded headers, in order
    Payload []byte        // remaining non-header bytes

    IP  *Iphdr  // IP header (for IP packets, after decoding)
    TCP *Tcphdr // TCP header (for TCP packets, after decoding)
    UDP *Udphdr // UDP header (for UDP packets after decoding)
}

// Reader parses pcap files.
type Reader struct {
    flip         bool
    buf          io.Reader
    err          error
    fourBytes    []byte
    twoBytes     []byte
    sixteenBytes []byte
    Header       FileHeader
}

// NewReader reads pcap data from an io.Reader.
func NewReader(reader io.Reader) (*Reader, error) {
    r := &Reader{
        buf:          reader,
        fourBytes:    make([]byte, 4),
        twoBytes:     make([]byte, 2),
        sixteenBytes: make([]byte, 16),
    }
    switch magic := r.readUint32(); magic {
    case 0xa1b2c3d4:
        r.flip = false
    case 0xd4c3b2a1:
        r.flip = true
    default:
        return nil, fmt.Errorf("pcap: bad magic number: %0x", magic)
    }
    r.Header = FileHeader{
        MagicNumber:  0xa1b2c3d4,
        VersionMajor: r.readUint16(),
        VersionMinor: r.readUint16(),
        TimeZone:     r.readInt32(),
        SigFigs:      r.readUint32(),
        SnapLen:      r.readUint32(),
        Network:      r.readUint32(),
    }
    return r, nil
}

// Next returns the next packet or nil if no more packets can be read.
func (r *Reader) Next() *Packet {
    d := r.sixteenBytes
    r.err = r.read(d)
    if r.err != nil {
        return nil
    }
    timeSec := asUint32(d[0:4], r.flip)
    timeUsec := asUint32(d[4:8], r.flip)
    capLen := asUint32(d[8:12], r.flip)
    origLen := asUint32(d[12:16], r.flip)

    data := make([]byte, capLen)
    if r.err = r.read(data); r.err != nil {
        return nil
    }
    return &Packet{
        Time:   time.Unix(int64(timeSec), int64(timeUsec)),
        Caplen: capLen,
        Len:    origLen,
        Data:   data,
    }
}

func (r *Reader) read(data []byte) error {
    var err error
    n, err := r.buf.Read(data)
    for err == nil && n != len(data) {
        var chunk int
        chunk, err = r.buf.Read(data[n:])
        n += chunk
    }
    if len(data) == n {
        return nil
    }
    return err
}

func (r *Reader) readUint32() uint32 {
    data := r.fourBytes
    if r.err = r.read(data); r.err != nil {
        return 0
    }
    return asUint32(data, r.flip)
}

func (r *Reader) readInt32() int32 {
    data := r.fourBytes
    if r.err = r.read(data); r.err != nil {
        return 0
    }
    return int32(asUint32(data, r.flip))
}

func (r *Reader) readUint16() uint16 {
    data := r.twoBytes
    if r.err = r.read(data); r.err != nil {
        return 0
    }
    return asUint16(data, r.flip)
}

// Writer writes a pcap file.
type Writer struct {
    writer io.Writer
    buf    []byte
}

// NewWriter creates a Writer that stores output in an io.Writer.
// The FileHeader is written immediately.
func NewWriter(writer io.Writer, header *FileHeader) (*Writer, error) {
    w := &Writer{
        writer: writer,
        buf:    make([]byte, 24),
    }
    binary.LittleEndian.PutUint32(w.buf, header.MagicNumber)
    binary.LittleEndian.PutUint16(w.buf[4:], header.VersionMajor)
    binary.LittleEndian.PutUint16(w.buf[6:], header.VersionMinor)
    binary.LittleEndian.PutUint32(w.buf[8:], uint32(header.TimeZone))
    binary.LittleEndian.PutUint32(w.buf[12:], header.SigFigs)
    binary.LittleEndian.PutUint32(w.buf[16:], header.SnapLen)
    binary.LittleEndian.PutUint32(w.buf[20:], header.Network)
    if _, err := writer.Write(w.buf); err != nil {
        return nil, err
    }
    return w, nil
}

// Writer writes a packet to the underlying writer.
func (w *Writer) Write(pkt *Packet) error {
    binary.LittleEndian.PutUint32(w.buf, uint32(pkt.Time.Unix()))
    binary.LittleEndian.PutUint32(w.buf[4:], uint32(pkt.Time.Nanosecond()))
    binary.LittleEndian.PutUint32(w.buf[8:], uint32(pkt.Time.Unix()))
    binary.LittleEndian.PutUint32(w.buf[12:], pkt.Len)
    if _, err := w.writer.Write(w.buf[:16]); err != nil {
        return err
    }
    _, err := w.writer.Write(pkt.Data)
    return err
}

func asUint32(data []byte, flip bool) uint32 {
    if flip {
        return binary.BigEndian.Uint32(data)
    }
    return binary.LittleEndian.Uint32(data)
}

func asUint16(data []byte, flip bool) uint16 {
    if flip {
        return binary.BigEndian.Uint16(data)
    }
    return binary.LittleEndian.Uint16(data)
}

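The `io.go` above is the pure-Go path (no cgo) for reading capture files: `NewReader` validates the pcap magic number and parses the file header, and `Next` yields packets until EOF. A minimal sketch under the same assumptions as before (placeholder filename, not part of the commit):

```go
package main

import (
	"bufio"
	"fmt"
	"os"

	"github.com/coreos/etcd/Godeps/_workspace/src/github.com/akrennmair/gopcap"
)

func main() {
	f, err := os.Open("capture.pcap") // placeholder path
	if err != nil {
		fmt.Println("open:", err)
		return
	}
	defer f.Close()

	// NewReader checks the magic number and fills in reader.Header.
	r, err := pcap.NewReader(bufio.NewReader(f))
	if err != nil {
		fmt.Println("reader:", err)
		return
	}
	fmt.Printf("snaplen=%d linktype=%d\n", r.Header.SnapLen, r.Header.Network)

	// Next returns nil once no more packets can be read.
	for pkt := r.Next(); pkt != nil; pkt = r.Next() {
		fmt.Printf("%v caplen=%d\n", pkt.Time, pkt.Caplen)
	}
}
```

The `tools/pass/pass.go` program further down pairs this `Reader` with the matching `Writer` to round-trip a capture file.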
266 Godeps/_workspace/src/github.com/akrennmair/gopcap/pcap.go generated vendored
@ -1,266 +0,0 @@
// Interface to both live and offline pcap parsing.
package pcap

/*
#cgo linux LDFLAGS: -lpcap
#cgo freebsd LDFLAGS: -lpcap
#cgo darwin LDFLAGS: -lpcap
#cgo windows CFLAGS: -I C:/WpdPack/Include
#cgo windows,386 LDFLAGS: -L C:/WpdPack/Lib -lwpcap
#cgo windows,amd64 LDFLAGS: -L C:/WpdPack/Lib/x64 -lwpcap
#include <stdlib.h>
#include <pcap.h>

// Workaround for not knowing how to cast to const u_char**
int hack_pcap_next_ex(pcap_t *p, struct pcap_pkthdr **pkt_header,
    u_char **pkt_data) {
    return pcap_next_ex(p, pkt_header, (const u_char **)pkt_data);
}
*/
import "C"
import (
    "errors"
    "net"
    "syscall"
    "time"
    "unsafe"
)

type Pcap struct {
    cptr *C.pcap_t
}

type Stat struct {
    PacketsReceived  uint32
    PacketsDropped   uint32
    PacketsIfDropped uint32
}

type Interface struct {
    Name        string
    Description string
    Addresses   []IFAddress
    // TODO: add more elements
}

type IFAddress struct {
    IP      net.IP
    Netmask net.IPMask
    // TODO: add broadcast + PtP dst ?
}

func (p *Pcap) Next() (pkt *Packet) {
    rv, _ := p.NextEx()
    return rv
}

// Openlive opens a device and returns a *Pcap handler
func Openlive(device string, snaplen int32, promisc bool, timeout_ms int32) (handle *Pcap, err error) {
    var buf *C.char
    buf = (*C.char)(C.calloc(ERRBUF_SIZE, 1))
    h := new(Pcap)
    var pro int32
    if promisc {
        pro = 1
    }

    dev := C.CString(device)
    defer C.free(unsafe.Pointer(dev))

    h.cptr = C.pcap_open_live(dev, C.int(snaplen), C.int(pro), C.int(timeout_ms), buf)
    if nil == h.cptr {
        handle = nil
        err = errors.New(C.GoString(buf))
    } else {
        handle = h
    }
    C.free(unsafe.Pointer(buf))
    return
}

func Openoffline(file string) (handle *Pcap, err error) {
    var buf *C.char
    buf = (*C.char)(C.calloc(ERRBUF_SIZE, 1))
    h := new(Pcap)

    cf := C.CString(file)
    defer C.free(unsafe.Pointer(cf))

    h.cptr = C.pcap_open_offline(cf, buf)
    if nil == h.cptr {
        handle = nil
        err = errors.New(C.GoString(buf))
    } else {
        handle = h
    }
    C.free(unsafe.Pointer(buf))
    return
}

func (p *Pcap) NextEx() (pkt *Packet, result int32) {
    var pkthdr *C.struct_pcap_pkthdr

    var buf_ptr *C.u_char
    var buf unsafe.Pointer
    result = int32(C.hack_pcap_next_ex(p.cptr, &pkthdr, &buf_ptr))

    buf = unsafe.Pointer(buf_ptr)
    if nil == buf {
        return
    }

    pkt = new(Packet)
    pkt.Time = time.Unix(int64(pkthdr.ts.tv_sec), int64(pkthdr.ts.tv_usec)*1000)
    pkt.Caplen = uint32(pkthdr.caplen)
    pkt.Len = uint32(pkthdr.len)
    pkt.Data = C.GoBytes(buf, C.int(pkthdr.caplen))
    return
}

func (p *Pcap) Close() {
    C.pcap_close(p.cptr)
}

func (p *Pcap) Geterror() error {
    return errors.New(C.GoString(C.pcap_geterr(p.cptr)))
}

func (p *Pcap) Getstats() (stat *Stat, err error) {
    var cstats _Ctype_struct_pcap_stat
    if -1 == C.pcap_stats(p.cptr, &cstats) {
        return nil, p.Geterror()
    }
    stats := new(Stat)
    stats.PacketsReceived = uint32(cstats.ps_recv)
    stats.PacketsDropped = uint32(cstats.ps_drop)
    stats.PacketsIfDropped = uint32(cstats.ps_ifdrop)

    return stats, nil
}

func (p *Pcap) Setfilter(expr string) (err error) {
    var bpf _Ctype_struct_bpf_program
    cexpr := C.CString(expr)
    defer C.free(unsafe.Pointer(cexpr))

    if -1 == C.pcap_compile(p.cptr, &bpf, cexpr, 1, 0) {
        return p.Geterror()
    }

    if -1 == C.pcap_setfilter(p.cptr, &bpf) {
        C.pcap_freecode(&bpf)
        return p.Geterror()
    }
    C.pcap_freecode(&bpf)
    return nil
}

func Version() string {
    return C.GoString(C.pcap_lib_version())
}

func (p *Pcap) Datalink() int {
    return int(C.pcap_datalink(p.cptr))
}

func (p *Pcap) Setdatalink(dlt int) error {
    if -1 == C.pcap_set_datalink(p.cptr, C.int(dlt)) {
        return p.Geterror()
    }
    return nil
}

func DatalinkValueToName(dlt int) string {
    if name := C.pcap_datalink_val_to_name(C.int(dlt)); name != nil {
        return C.GoString(name)
    }
    return ""
}

func DatalinkValueToDescription(dlt int) string {
    if desc := C.pcap_datalink_val_to_description(C.int(dlt)); desc != nil {
        return C.GoString(desc)
    }
    return ""
}

func Findalldevs() (ifs []Interface, err error) {
    var buf *C.char
    buf = (*C.char)(C.calloc(ERRBUF_SIZE, 1))
    defer C.free(unsafe.Pointer(buf))
    var alldevsp *C.pcap_if_t

    if -1 == C.pcap_findalldevs((**C.pcap_if_t)(&alldevsp), buf) {
        return nil, errors.New(C.GoString(buf))
    }
    defer C.pcap_freealldevs((*C.pcap_if_t)(alldevsp))
    dev := alldevsp
    var i uint32
    for i = 0; dev != nil; dev = (*C.pcap_if_t)(dev.next) {
        i++
    }
    ifs = make([]Interface, i)
    dev = alldevsp
    for j := uint32(0); dev != nil; dev = (*C.pcap_if_t)(dev.next) {
        var iface Interface
        iface.Name = C.GoString(dev.name)
        iface.Description = C.GoString(dev.description)
        iface.Addresses = findalladdresses(dev.addresses)
        // TODO: add more elements
        ifs[j] = iface
        j++
    }
    return
}

func findalladdresses(addresses *_Ctype_struct_pcap_addr) (retval []IFAddress) {
    // TODO - make it support more than IPv4 and IPv6?
    retval = make([]IFAddress, 0, 1)
    for curaddr := addresses; curaddr != nil; curaddr = (*_Ctype_struct_pcap_addr)(curaddr.next) {
        var a IFAddress
        var err error
        if a.IP, err = sockaddr_to_IP((*syscall.RawSockaddr)(unsafe.Pointer(curaddr.addr))); err != nil {
            continue
        }
        if a.Netmask, err = sockaddr_to_IP((*syscall.RawSockaddr)(unsafe.Pointer(curaddr.addr))); err != nil {
            continue
        }
        retval = append(retval, a)
    }
    return
}

func sockaddr_to_IP(rsa *syscall.RawSockaddr) (IP []byte, err error) {
    switch rsa.Family {
    case syscall.AF_INET:
        pp := (*syscall.RawSockaddrInet4)(unsafe.Pointer(rsa))
        IP = make([]byte, 4)
        for i := 0; i < len(IP); i++ {
            IP[i] = pp.Addr[i]
        }
        return
    case syscall.AF_INET6:
        pp := (*syscall.RawSockaddrInet6)(unsafe.Pointer(rsa))
        IP = make([]byte, 16)
        for i := 0; i < len(IP); i++ {
            IP[i] = pp.Addr[i]
        }
        return
    }
    err = errors.New("Unsupported address type")
    return
}

func (p *Pcap) Inject(data []byte) (err error) {
    buf := (*C.char)(C.malloc((C.size_t)(len(data))))

    for i := 0; i < len(data); i++ {
        *(*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(buf)) + uintptr(i))) = data[i]
    }

    if -1 == C.pcap_sendpacket(p.cptr, (*C.u_char)(unsafe.Pointer(buf)), (C.int)(len(data))) {
        err = p.Geterror()
    }
    C.free(unsafe.Pointer(buf))
    return
}

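`pcap.go` above is the cgo half of the package: `Openlive` wraps `pcap_open_live` and `Setfilter` compiles and installs a BPF expression. A live-capture sketch using only calls shown in that file (the interface name and filter expression are placeholders, and live capture typically requires elevated privileges):

```go
package main

import (
	"fmt"

	"github.com/coreos/etcd/Godeps/_workspace/src/github.com/akrennmair/gopcap"
)

func main() {
	// "eth0" is a placeholder device; pcap.Findalldevs() can enumerate real ones.
	h, err := pcap.Openlive("eth0", 65535, true, 0)
	if h == nil {
		fmt.Printf("Openlive failed: %v\n", err)
		return
	}
	defer h.Close()

	// Install a BPF filter; the expression uses standard libpcap syntax.
	if err := h.Setfilter("tcp and port 2379"); err != nil {
		fmt.Printf("Setfilter failed: %v\n", err)
	}

	for pkt := h.Next(); pkt != nil; pkt = h.Next() {
		pkt.Decode()
		fmt.Println(pkt.String())
	}
}
```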
49 Godeps/_workspace/src/github.com/akrennmair/gopcap/tools/benchmark/benchmark.go generated vendored
@ -1,49 +0,0 @@
package main

import (
    "flag"
    "fmt"
    "os"
    "runtime/pprof"
    "time"

    "github.com/coreos/etcd/Godeps/_workspace/src/github.com/akrennmair/gopcap"
)

func main() {
    var filename *string = flag.String("file", "", "filename")
    var decode *bool = flag.Bool("d", false, "If true, decode each packet")
    var cpuprofile *string = flag.String("cpuprofile", "", "filename")

    flag.Parse()

    h, err := pcap.Openoffline(*filename)
    if err != nil {
        fmt.Printf("Couldn't create pcap reader: %v", err)
    }

    if *cpuprofile != "" {
        if out, err := os.Create(*cpuprofile); err == nil {
            pprof.StartCPUProfile(out)
            defer func() {
                pprof.StopCPUProfile()
                out.Close()
            }()
        } else {
            panic(err)
        }
    }

    i, nilPackets := 0, 0
    start := time.Now()
    for pkt, code := h.NextEx(); code != -2; pkt, code = h.NextEx() {
        if pkt == nil {
            nilPackets++
        } else if *decode {
            pkt.Decode()
        }
        i++
    }
    duration := time.Since(start)
    fmt.Printf("Took %v to process %v packets, %v per packet, %d nil packets\n", duration, i, duration/time.Duration(i), nilPackets)
}

96 Godeps/_workspace/src/github.com/akrennmair/gopcap/tools/pass/pass.go generated vendored
@ -1,96 +0,0 @@
package main

// Parses a pcap file, writes it back to disk, then verifies the files
// are the same.
import (
    "bufio"
    "flag"
    "fmt"
    "io"
    "os"

    "github.com/coreos/etcd/Godeps/_workspace/src/github.com/akrennmair/gopcap"
)

var input *string = flag.String("input", "", "input file")
var output *string = flag.String("output", "", "output file")
var decode *bool = flag.Bool("decode", false, "print decoded packets")

func copyPcap(dest, src string) {
    f, err := os.Open(src)
    if err != nil {
        fmt.Printf("couldn't open %q: %v\n", src, err)
        return
    }
    defer f.Close()
    reader, err := pcap.NewReader(bufio.NewReader(f))
    if err != nil {
        fmt.Printf("couldn't create reader: %v\n", err)
        return
    }
    w, err := os.Create(dest)
    if err != nil {
        fmt.Printf("couldn't open %q: %v\n", dest, err)
        return
    }
    defer w.Close()
    buf := bufio.NewWriter(w)
    writer, err := pcap.NewWriter(buf, &reader.Header)
    if err != nil {
        fmt.Printf("couldn't create writer: %v\n", err)
        return
    }
    for {
        pkt := reader.Next()
        if pkt == nil {
            break
        }
        if *decode {
            pkt.Decode()
            fmt.Println(pkt.String())
        }
        writer.Write(pkt)
    }
    buf.Flush()
}

func check(dest, src string) {
    f, err := os.Open(src)
    if err != nil {
        fmt.Printf("couldn't open %q: %v\n", src, err)
        return
    }
    defer f.Close()
    freader := bufio.NewReader(f)

    g, err := os.Open(dest)
    if err != nil {
        fmt.Printf("couldn't open %q: %v\n", src, err)
        return
    }
    defer g.Close()
    greader := bufio.NewReader(g)

    for {
        fb, ferr := freader.ReadByte()
        gb, gerr := greader.ReadByte()

        if ferr == io.EOF && gerr == io.EOF {
            break
        }
        if fb == gb {
            continue
        }
        fmt.Println("FAIL")
        return
    }

    fmt.Println("PASS")
}

func main() {
    flag.Parse()

    copyPcap(*output, *input)
    check(*output, *input)
}

82 Godeps/_workspace/src/github.com/akrennmair/gopcap/tools/pcaptest/pcaptest.go generated vendored
@ -1,82 +0,0 @@
package main

import (
    "flag"
    "fmt"
    "time"

    "github.com/coreos/etcd/Godeps/_workspace/src/github.com/akrennmair/gopcap"
)

func min(x uint32, y uint32) uint32 {
    if x < y {
        return x
    }
    return y
}

func main() {
    var device *string = flag.String("d", "", "device")
    var file *string = flag.String("r", "", "file")
    var expr *string = flag.String("e", "", "filter expression")

    flag.Parse()

    var h *pcap.Pcap
    var err error

    ifs, err := pcap.Findalldevs()
    if len(ifs) == 0 {
        fmt.Printf("Warning: no devices found : %s\n", err)
    } else {
        for i := 0; i < len(ifs); i++ {
            fmt.Printf("dev %d: %s (%s)\n", i+1, ifs[i].Name, ifs[i].Description)
        }
    }

    if *device != "" {
        h, err = pcap.Openlive(*device, 65535, true, 0)
        if h == nil {
            fmt.Printf("Openlive(%s) failed: %s\n", *device, err)
            return
        }
    } else if *file != "" {
        h, err = pcap.Openoffline(*file)
        if h == nil {
            fmt.Printf("Openoffline(%s) failed: %s\n", *file, err)
            return
        }
    } else {
        fmt.Printf("usage: pcaptest [-d <device> | -r <file>]\n")
        return
    }
    defer h.Close()

    fmt.Printf("pcap version: %s\n", pcap.Version())

    if *expr != "" {
        fmt.Printf("Setting filter: %s\n", *expr)
        err := h.Setfilter(*expr)
        if err != nil {
            fmt.Printf("Warning: setting filter failed: %s\n", err)
        }
    }

    for pkt := h.Next(); pkt != nil; pkt = h.Next() {
        fmt.Printf("time: %d.%06d (%s) caplen: %d len: %d\nData:",
            int64(pkt.Time.Second()), int64(pkt.Time.Nanosecond()),
            time.Unix(int64(pkt.Time.Second()), 0).String(), int64(pkt.Caplen), int64(pkt.Len))
        for i := uint32(0); i < pkt.Caplen; i++ {
            if i%32 == 0 {
                fmt.Printf("\n")
            }
            if 32 <= pkt.Data[i] && pkt.Data[i] <= 126 {
                fmt.Printf("%c", pkt.Data[i])
            } else {
                fmt.Printf(".")
            }
        }
        fmt.Printf("\n\n")
    }

}

121 Godeps/_workspace/src/github.com/akrennmair/gopcap/tools/tcpdump/tcpdump.go generated vendored
@ -1,121 +0,0 @@
package main

import (
    "bufio"
    "flag"
    "fmt"
    "os"

    "github.com/coreos/etcd/Godeps/_workspace/src/github.com/akrennmair/gopcap"
)

const (
    TYPE_IP  = 0x0800
    TYPE_ARP = 0x0806
    TYPE_IP6 = 0x86DD

    IP_ICMP = 1
    IP_INIP = 4
    IP_TCP  = 6
    IP_UDP  = 17
)

var out *bufio.Writer
var errout *bufio.Writer

func main() {
    var device *string = flag.String("i", "", "interface")
    var snaplen *int = flag.Int("s", 65535, "snaplen")
    var hexdump *bool = flag.Bool("X", false, "hexdump")
    expr := ""

    out = bufio.NewWriter(os.Stdout)
    errout = bufio.NewWriter(os.Stderr)

    flag.Usage = func() {
        fmt.Fprintf(errout, "usage: %s [ -i interface ] [ -s snaplen ] [ -X ] [ expression ]\n", os.Args[0])
        errout.Flush()
        os.Exit(1)
    }

    flag.Parse()

    if len(flag.Args()) > 0 {
        expr = flag.Arg(0)
    }

    if *device == "" {
        devs, err := pcap.Findalldevs()
        if err != nil {
            fmt.Fprintf(errout, "tcpdump: couldn't find any devices: %s\n", err)
        }
        if 0 == len(devs) {
            flag.Usage()
        }
        *device = devs[0].Name
    }

    h, err := pcap.Openlive(*device, int32(*snaplen), true, 0)
    if h == nil {
        fmt.Fprintf(errout, "tcpdump: %s\n", err)
        errout.Flush()
        return
    }
    defer h.Close()

    if expr != "" {
        ferr := h.Setfilter(expr)
        if ferr != nil {
            fmt.Fprintf(out, "tcpdump: %s\n", ferr)
            out.Flush()
        }
    }

    for pkt := h.Next(); pkt != nil; pkt = h.Next() {
        pkt.Decode()
        fmt.Fprintf(out, "%s\n", pkt.String())
        if *hexdump {
            Hexdump(pkt)
        }
        out.Flush()
    }
}

func min(a, b int) int {
    if a < b {
        return a
    }
    return b
}

func Hexdump(pkt *pcap.Packet) {
    for i := 0; i < len(pkt.Data); i += 16 {
        Dumpline(uint32(i), pkt.Data[i:min(i+16, len(pkt.Data))])
    }
}

func Dumpline(addr uint32, line []byte) {
    fmt.Fprintf(out, "\t0x%04x: ", int32(addr))
    var i uint16
    for i = 0; i < 16 && i < uint16(len(line)); i++ {
        if i%2 == 0 {
            out.WriteString(" ")
        }
        fmt.Fprintf(out, "%02x", line[i])
    }
    for j := i; j <= 16; j++ {
        if j%2 == 0 {
            out.WriteString(" ")
        }
        out.WriteString(" ")
    }
    out.WriteString(" ")
    for i = 0; i < 16 && i < uint16(len(line)); i++ {
        if line[i] >= 32 && line[i] <= 126 {
            fmt.Fprintf(out, "%c", line[i])
        } else {
            out.WriteString(".")
        }
    }
    out.WriteString("\n")
}

50 Godeps/_workspace/src/github.com/boltdb/bolt/Makefile generated vendored
@ -1,18 +1,54 @@
TEST=.
BENCH=.
COVERPROFILE=/tmp/c.out
BRANCH=`git rev-parse --abbrev-ref HEAD`
COMMIT=`git rev-parse --short HEAD`
GOLDFLAGS="-X main.branch $(BRANCH) -X main.commit $(COMMIT)"

default: build

race:
    @go test -v -race -test.run="TestSimulate_(100op|1000op)"
bench:
    go test -v -test.run=NOTHINCONTAINSTHIS -test.bench=$(BENCH)

# http://cloc.sourceforge.net/
cloc:
    @cloc --not-match-f='Makefile|_test.go' .

cover: fmt
    go test -coverprofile=$(COVERPROFILE) -test.run=$(TEST) $(COVERFLAG) .
    go tool cover -html=$(COVERPROFILE)
    rm $(COVERPROFILE)

cpuprofile: fmt
    @go test -c
    @./bolt.test -test.v -test.run=$(TEST) -test.cpuprofile cpu.prof

# go get github.com/kisielk/errcheck
errcheck:
    @errcheck -ignorepkg=bytes -ignore=os:Remove github.com/boltdb/bolt
    @echo "=== errcheck ==="
    @errcheck github.com/boltdb/bolt

test:
    @go test -v -cover .
    @go test -v ./cmd/bolt
fmt:
    @go fmt ./...

.PHONY: fmt test
get:
    @go get -d ./...

build: get
    @mkdir -p bin
    @go build -ldflags=$(GOLDFLAGS) -a -o bin/bolt ./cmd/bolt

test: fmt
    @go get github.com/stretchr/testify/assert
    @echo "=== TESTS ==="
    @go test -v -cover -test.run=$(TEST)
    @echo ""
    @echo ""
    @echo "=== CLI ==="
    @go test -v -test.run=$(TEST) ./cmd/bolt
    @echo ""
    @echo ""
    @echo "=== RACE DETECTOR ==="
    @go test -v -race -test.run="TestSimulate_(100op|1000op)"

.PHONY: bench cloc cover cpuprofile fmt memprofile test

272
Godeps/_workspace/src/github.com/boltdb/bolt/README.md
generated
vendored
272
Godeps/_workspace/src/github.com/boltdb/bolt/README.md
generated
vendored
@ -1,8 +1,8 @@
|
||||
Bolt [](https://drone.io/github.com/boltdb/bolt/latest) [](https://coveralls.io/r/boltdb/bolt?branch=master) [](https://godoc.org/github.com/boltdb/bolt) 
|
||||
Bolt [](https://drone.io/github.com/boltdb/bolt/latest) [](https://coveralls.io/r/boltdb/bolt?branch=master) [](https://godoc.org/github.com/boltdb/bolt) 
|
||||
====
|
||||
|
||||
Bolt is a pure Go key/value store inspired by [Howard Chu's][hyc_symas]
|
||||
[LMDB project][lmdb]. The goal of the project is to provide a simple,
|
||||
Bolt is a pure Go key/value store inspired by [Howard Chu's][hyc_symas] and
|
||||
the [LMDB project][lmdb]. The goal of the project is to provide a simple,
|
||||
fast, and reliable database for projects that don't require a full database
|
||||
server such as Postgres or MySQL.
|
||||
|
||||
@ -13,6 +13,7 @@ and setting values. That's it.
|
||||
[hyc_symas]: https://twitter.com/hyc_symas
|
||||
[lmdb]: http://symas.com/mdb/
|
||||
|
||||
|
||||
## Project Status
|
||||
|
||||
Bolt is stable and the API is fixed. Full unit test coverage and randomized
|
||||
@ -21,36 +22,6 @@ Bolt is currently in high-load production environments serving databases as
|
||||
large as 1TB. Many companies such as Shopify and Heroku use Bolt-backed
|
||||
services every day.
|
||||
|
||||
## Table of Contents
|
||||
|
||||
- [Getting Started](#getting-started)
|
||||
- [Installing](#installing)
|
||||
- [Opening a database](#opening-a-database)
|
||||
- [Transactions](#transactions)
|
||||
- [Read-write transactions](#read-write-transactions)
|
||||
- [Read-only transactions](#read-only-transactions)
|
||||
- [Batch read-write transactions](#batch-read-write-transactions)
|
||||
- [Managing transactions manually](#managing-transactions-manually)
|
||||
- [Using buckets](#using-buckets)
|
||||
- [Using key/value pairs](#using-keyvalue-pairs)
|
||||
- [Autoincrementing integer for the bucket](#autoincrementing-integer-for-the-bucket)
|
||||
- [Iterating over keys](#iterating-over-keys)
|
||||
- [Prefix scans](#prefix-scans)
|
||||
- [Range scans](#range-scans)
|
||||
- [ForEach()](#foreach)
|
||||
- [Nested buckets](#nested-buckets)
|
||||
- [Database backups](#database-backups)
|
||||
- [Statistics](#statistics)
|
||||
- [Read-Only Mode](#read-only-mode)
|
||||
- [Mobile Use (iOS/Android)](#mobile-use-iosandroid)
|
||||
- [Resources](#resources)
|
||||
- [Comparison with other databases](#comparison-with-other-databases)
|
||||
- [Postgres, MySQL, & other relational databases](#postgres-mysql--other-relational-databases)
|
||||
- [LevelDB, RocksDB](#leveldb-rocksdb)
|
||||
- [LMDB](#lmdb)
|
||||
- [Caveats & Limitations](#caveats--limitations)
|
||||
- [Reading the Source](#reading-the-source)
|
||||
- [Other Projects Using Bolt](#other-projects-using-bolt)
|
||||
|
||||
## Getting Started
|
||||
|
||||
@ -209,8 +180,8 @@ and then safely close your transaction if an error is returned. This is the
|
||||
recommended way to use Bolt transactions.
|
||||
|
||||
However, sometimes you may want to manually start and end your transactions.
|
||||
You can use the `Tx.Begin()` function directly but **please** be sure to close
|
||||
the transaction.
|
||||
You can use the `Tx.Begin()` function directly but _please_ be sure to close the
|
||||
transaction.
|
||||
|
||||
```go
|
||||
// Start a writable transaction.
|
||||
@ -285,7 +256,7 @@ db.View(func(tx *bolt.Tx) error {
|
||||
```
|
||||
|
||||
The `Get()` function does not return an error because its operation is
|
||||
guaranteed to work (unless there is some kind of system failure). If the key
|
||||
guarenteed to work (unless there is some kind of system failure). If the key
|
||||
exists then it will return its byte slice value. If it doesn't exist then it
|
||||
will return `nil`. It's important to note that you can have a zero-length value
|
||||
set to a key which is different than the key not existing.
|
||||
@ -297,49 +268,6 @@ transaction is open. If you need to use a value outside of the transaction
|
||||
then you must use `copy()` to copy it to another byte slice.
|
||||
|
||||
|
||||
### Autoincrementing integer for the bucket
By using the `NextSequence()` function, you can let Bolt determine a sequence
which can be used as the unique identifier for your key/value pairs. See the
example below.

```go
// CreateUser saves u to the store. The new user ID is set on u once the data is persisted.
func (s *Store) CreateUser(u *User) error {
	return s.db.Update(func(tx *bolt.Tx) error {
		// Retrieve the users bucket.
		// This should be created when the DB is first opened.
		b := tx.Bucket([]byte("users"))

		// Generate ID for the user.
		// This returns an error only if the Tx is closed or not writeable.
		// That can't happen in an Update() call so I ignore the error check.
		id, _ := b.NextSequence()
		u.ID = int(id)

		// Marshal user data into bytes.
		buf, err := json.Marshal(u)
		if err != nil {
			return err
		}

		// Persist bytes to users bucket.
		return b.Put(itob(u.ID), buf)
	})
}

// itob returns an 8-byte big endian representation of v.
func itob(v int) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(b, uint64(v))
	return b
}

type User struct {
	ID int
	...
}
```

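A hypothetical call site for the helper above, assuming `s` is an initialized
`*Store` (neither appears in the original example):

```go
u := &User{}
if err := s.CreateUser(u); err != nil {
	log.Fatal(err)
}
fmt.Println("created user with ID", u.ID) // ID was assigned by NextSequence()
```
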
### Iterating over keys

Bolt stores its keys in byte-sorted order within a bucket. This makes sequential
@ -348,9 +276,7 @@ iteration over these keys extremely fast. To iterate over keys we'll use a

```go
db.View(func(tx *bolt.Tx) error {
	// Assume bucket exists and has keys
	b := tx.Bucket([]byte("MyBucket"))

	c := b.Cursor()

	for k, v := c.First(); k != nil; k, v = c.Next() {
@ -374,15 +300,10 @@ Next() Move to the next key.
Prev() Move to the previous key.
```

Each of those functions has a return signature of `(key []byte, value []byte)`.
When you have iterated to the end of the cursor then `Next()` will return a
`nil` key. You must seek to a position using `First()`, `Last()`, or `Seek()`
before calling `Next()` or `Prev()`. If you do not seek to a position then
these functions will return a `nil` key.

During iteration, if the key is non-`nil` but the value is `nil`, that means
the key refers to a bucket rather than a value. Use `Bucket.Bucket()` to
access the sub-bucket.
When you have iterated to the end of the cursor then `Next()` will return `nil`.
You must seek to a position using `First()`, `Last()`, or `Seek()` before
calling `Next()` or `Prev()`. If you do not seek to a position then these
functions will return `nil`.

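As a small illustration of the seek-before-iterate rule above, a sketch of
reverse iteration (the bucket name is illustrative):

```go
db.View(func(tx *bolt.Tx) error {
	c := tx.Bucket([]byte("MyBucket")).Cursor()
	// Position the cursor with Last() before calling Prev().
	for k, v := c.Last(); k != nil; k, v = c.Prev() {
		fmt.Printf("key=%s, value=%s\n", k, v)
	}
	return nil
})
```
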
#### Prefix scans

@ -391,7 +312,6 @@ To iterate over a key prefix, you can combine `Seek()` and `bytes.HasPrefix()`:

```go
db.View(func(tx *bolt.Tx) error {
	// Assume bucket exists and has keys
	c := tx.Bucket([]byte("MyBucket")).Cursor()

	prefix := []byte("1234")
@ -411,7 +331,7 @@ date range like this:

```go
db.View(func(tx *bolt.Tx) error {
	// Assume our events bucket exists and has RFC3339 encoded time keys.
	// Assume our events bucket has RFC3339 encoded time keys.
	c := tx.Bucket([]byte("Events")).Cursor()

	// Our time range spans the 90's decade.
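	// A sketch of how such a range scan typically continues: Seek() to the
	// lower bound, then Next() until the upper bound is passed. The exact
	// bounds here are illustrative.
	min := []byte("1990-01-01T00:00:00Z")
	max := []byte("2000-01-01T00:00:00Z")
	for k, v := c.Seek(min); k != nil && bytes.Compare(k, max) <= 0; k, v = c.Next() {
		fmt.Printf("%s: %s\n", k, v)
	}
	return nil
})
```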
@ -435,9 +355,7 @@ all the keys in a bucket:

```go
db.View(func(tx *bolt.Tx) error {
	// Assume bucket exists and has keys
	b := tx.Bucket([]byte("MyBucket"))

	b.ForEach(func(k, v []byte) error {
		fmt.Printf("key=%s, value=%s\n", k, v)
		return nil
@ -464,11 +382,8 @@ func (*Bucket) DeleteBucket(key []byte) error

Bolt is a single file so it's easy to back up. You can use the `Tx.WriteTo()`
function to write a consistent view of the database to a writer. If you call
this from a read-only transaction, it will perform a hot backup and not block
your other database reads and writes.

By default, it will use a regular file handle which will utilize the operating
system's page cache. See the [`Tx`](https://godoc.org/github.com/boltdb/bolt#Tx)
documentation for information about optimizing for larger-than-RAM datasets.
your other database reads and writes. It will also use `O_DIRECT` when available
to prevent page cache thrashing.

One common use case is to back up over HTTP so you can use tools like `cURL` to
do database backups:

@ -550,84 +465,6 @@ if err != nil {
}
```
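A minimal sketch of such an HTTP backup handler, assuming a package-level
`db *bolt.DB` (the handler and file names are illustrative):

```go
func BackupHandleFunc(w http.ResponseWriter, req *http.Request) {
	err := db.View(func(tx *bolt.Tx) error {
		w.Header().Set("Content-Type", "application/octet-stream")
		w.Header().Set("Content-Disposition", `attachment; filename="my.db"`)
		w.Header().Set("Content-Length", strconv.Itoa(int(tx.Size())))
		_, err := tx.WriteTo(w)
		return err
	})
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
	}
}
```

With `http.HandleFunc("/backup", BackupHandleFunc)` registered, a
`curl http://localhost/backup > my.db` pulls a consistent snapshot while the
database stays live.
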
### Mobile Use (iOS/Android)

Bolt is able to run on mobile devices by leveraging the binding feature of the
[gomobile](https://github.com/golang/mobile) tool. Create a struct that will
contain your database logic and a reference to a `*bolt.DB` with an initializing
constructor that takes in a filepath where the database file will be stored.
Neither Android nor iOS require extra permissions or cleanup from using this method.

```go
func NewBoltDB(filepath string) *BoltDB {
	db, err := bolt.Open(filepath+"/demo.db", 0600, nil)
	if err != nil {
		log.Fatal(err)
	}

	return &BoltDB{db}
}

type BoltDB struct {
	db *bolt.DB
	...
}

func (b *BoltDB) Path() string {
	return b.db.Path()
}

func (b *BoltDB) Close() {
	b.db.Close()
}
```

Database logic should be defined as methods on this wrapper struct.

To initialize this struct from the native language, use the snippets below.
Both platforms now sync their local storage to the cloud, so each snippet also
disables that functionality for the database file:

#### Android

```java
String path;
if (android.os.Build.VERSION.SDK_INT >= android.os.Build.VERSION_CODES.LOLLIPOP) {
    path = getNoBackupFilesDir().getAbsolutePath();
} else {
    path = getFilesDir().getAbsolutePath();
}
Boltmobiledemo.BoltDB boltDB = Boltmobiledemo.NewBoltDB(path)
```

#### iOS

```objc
- (void)demo {
    NSString* path = [NSSearchPathForDirectoriesInDomains(NSLibraryDirectory,
                                                          NSUserDomainMask,
                                                          YES) objectAtIndex:0];
    GoBoltmobiledemoBoltDB * demo = GoBoltmobiledemoNewBoltDB(path);
    [self addSkipBackupAttributeToItemAtPath:demo.path];
    //Some DB Logic would go here
    [demo close];
}

- (BOOL)addSkipBackupAttributeToItemAtPath:(NSString *) filePathString
{
    NSURL* URL = [NSURL fileURLWithPath: filePathString];
    assert([[NSFileManager defaultManager] fileExistsAtPath: [URL path]]);

    NSError *error = nil;
    BOOL success = [URL setResourceValue: [NSNumber numberWithBool: YES]
                                  forKey: NSURLIsExcludedFromBackupKey error: &error];
    if (!success) {
        NSLog(@"Error excluding %@ from backup %@", [URL lastPathComponent], error);
    }
    return success;
}
```

## Resources

@ -663,7 +500,7 @@ they are libraries bundled into the application, however, their underlying
structure is a log-structured merge-tree (LSM tree). An LSM tree optimizes
random writes by using a write-ahead log and multi-tiered, sorted files called
SSTables. Bolt uses a B+tree internally and only a single file. Both approaches
have trade-offs.

If you require a high random write throughput (>10,000 w/sec) or you need to use
spinning disks then LevelDB could be a good choice. If your application is
@ -699,8 +536,9 @@ It's important to pick the right tool for the job and Bolt is no exception.
Here are a few things to note when evaluating and using Bolt:

* Bolt is good for read intensive workloads. Sequential write performance is
  also fast but random writes can be slow. You can use `DB.Batch()` or add a
  write-ahead log to help mitigate this issue.
  also fast but random writes can be slow. You can add a write-ahead log or
  [transaction coalescer](https://github.com/boltdb/coalescer) in front of Bolt
  to mitigate this issue.

* Bolt uses a B+tree internally so there can be a lot of random page access.
  SSDs provide a significant performance boost over spinning disks.
@ -730,13 +568,11 @@ Here are a few things to note when evaluating and using Bolt:
  can in memory and will release memory as needed to other processes. This means
  that Bolt can show very high memory usage when working with large databases.
  However, this is expected and the OS will release memory as needed. Bolt can
  handle databases much larger than the available physical RAM, provided its
  memory-map fits in the process virtual address space. It may be problematic
  on 32-bit systems.
  handle databases much larger than the available physical RAM.

* The data structures in the Bolt database are memory mapped so the data file
  will be endian specific. This means that you cannot copy a Bolt file from a
  little endian machine to a big endian machine and have it work. For most
  users this is not a concern since most modern CPUs are little endian.

* Because of the way pages are laid out on disk, Bolt cannot truncate data files
@ -751,56 +587,6 @@ Here are a few things to note when evaluating and using Bolt:

[page-allocation]: https://github.com/boltdb/bolt/issues/308#issuecomment-74811638

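On the random-write caveat above: a minimal sketch of `DB.Batch()`, which
coalesces concurrent callers into shared read-write transactions (bucket and
key names are illustrative). Because Bolt may invoke the function more than
once, its side effects must be idempotent:

```go
err := db.Batch(func(tx *bolt.Tx) error {
	b, err := tx.CreateBucketIfNotExists([]byte("stats"))
	if err != nil {
		return err
	}
	return b.Put([]byte("last-seen"), []byte(time.Now().UTC().Format(time.RFC3339)))
})
if err != nil {
	log.Fatal(err)
}
```

Batch only pays off when many goroutines call it concurrently; a single caller
behaves like a plain `Update()`.
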
## Reading the Source

Bolt is a relatively small code base (<3KLOC) for an embedded, serializable,
transactional key/value database so it can be a good starting point for people
interested in how databases work.

The best places to start are the main entry points into Bolt:

- `Open()` - Initializes the reference to the database. It's responsible for
  creating the database if it doesn't exist, obtaining an exclusive lock on the
  file, reading the meta pages, & memory-mapping the file.

- `DB.Begin()` - Starts a read-only or read-write transaction depending on the
  value of the `writable` argument. This requires briefly obtaining the "meta"
  lock to keep track of open transactions. Only one read-write transaction can
  exist at a time so the "rwlock" is acquired during the life of a read-write
  transaction.

- `Bucket.Put()` - Writes a key/value pair into a bucket. After validating the
  arguments, a cursor is used to traverse the B+tree to the page and position
  where the key & value will be written. Once the position is found, the bucket
  materializes the underlying page and the page's parent pages into memory as
  "nodes". These nodes are where mutations occur during read-write transactions.
  These changes get flushed to disk during commit.

- `Bucket.Get()` - Retrieves a key/value pair from a bucket. This uses a cursor
  to move to the page & position of a key/value pair. During a read-only
  transaction, the key and value data is returned as a direct reference to the
  underlying mmap file so there's no allocation overhead. For read-write
  transactions, this data may reference the mmap file or one of the in-memory
  node values.

- `Cursor` - This object is simply for traversing the B+tree of on-disk pages
  or in-memory nodes. It can seek to a specific key, move to the first or last
  value, or it can move forward or backward. The cursor handles the movement up
  and down the B+tree transparently to the end user.

- `Tx.Commit()` - Converts the in-memory dirty nodes and the list of free pages
  into pages to be written to disk. Writing to disk then occurs in two phases.
  First, the dirty pages are written to disk and an `fsync()` occurs. Second, a
  new meta page with an incremented transaction ID is written and another
  `fsync()` occurs. This two-phase write ensures that partially written data
  pages are ignored in the event of a crash since the meta page pointing to them
  is never written. Partially written meta pages are invalidated because they
  are written with a checksum.

If you have additional notes that could be helpful for others, please submit
them via pull request.

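A short sketch tying these entry points together (error handling abbreviated;
names are illustrative):

```go
db, err := bolt.Open("my.db", 0600, nil) // Open: lock file, read meta pages, mmap
if err != nil {
	log.Fatal(err)
}
defer db.Close()

tx, err := db.Begin(true) // DB.Begin: the one read-write transaction
if err != nil {
	log.Fatal(err)
}
defer tx.Rollback()

b, _ := tx.CreateBucketIfNotExists([]byte("demo"))
_ = b.Put([]byte("k"), []byte("v"))    // Bucket.Put: mutates in-memory nodes
fmt.Printf("%s\n", b.Get([]byte("k"))) // Bucket.Get: reads via a cursor
if err := tx.Commit(); err != nil {    // Tx.Commit: two-phase write with fsync
	log.Fatal(err)
}
```
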
## Other Projects Using Bolt

Below is a list of public, open source projects that use Bolt:
@ -811,33 +597,25 @@ Below is a list of public, open source projects that use Bolt:
* [Skybox Analytics](https://github.com/skybox/skybox) - A standalone funnel analysis tool for web analytics.
* [Scuttlebutt](https://github.com/benbjohnson/scuttlebutt) - Uses Bolt to store and process all Twitter mentions of GitHub projects.
* [Wiki](https://github.com/peterhellberg/wiki) - A tiny wiki using Goji, BoltDB and Blackfriday.
* [ChainStore](https://github.com/pressly/chainstore) - Simple key-value interface to a variety of storage engines organized as a chain of operations.
* [ChainStore](https://github.com/nulayer/chainstore) - Simple key-value interface to a variety of storage engines organized as a chain of operations.
* [MetricBase](https://github.com/msiebuhr/MetricBase) - Single-binary version of Graphite.
* [Gitchain](https://github.com/gitchain/gitchain) - Decentralized, peer-to-peer Git repositories aka "Git meets Bitcoin".
* [event-shuttle](https://github.com/sclasen/event-shuttle) - A Unix system service to collect and reliably deliver messages to Kafka.
* [ipxed](https://github.com/kelseyhightower/ipxed) - Web interface and API for ipxed.
* [BoltStore](https://github.com/yosssi/boltstore) - Session store using Bolt.
* [photosite/session](https://godoc.org/bitbucket.org/kardianos/photosite/session) - Sessions for a photo viewing site.
* [photosite/session](http://godoc.org/bitbucket.org/kardianos/photosite/session) - Sessions for a photo viewing site.
* [LedisDB](https://github.com/siddontang/ledisdb) - A high performance NoSQL, using Bolt as optional storage.
* [ipLocator](https://github.com/AndreasBriese/ipLocator) - A fast ip-geo-location-server using bolt with bloom filters.
* [cayley](https://github.com/google/cayley) - Cayley is an open-source graph database using Bolt as optional backend.
* [bleve](http://www.blevesearch.com/) - A pure Go search engine similar to ElasticSearch that uses Bolt as the default storage backend.
* [tentacool](https://github.com/optiflows/tentacool) - REST API server to manage system stuff (IP, DNS, Gateway...) on a linux server.
* [SkyDB](https://github.com/skydb/sky) - Behavioral analytics database.
* [Seaweed File System](https://github.com/chrislusf/seaweedfs) - Highly scalable distributed key~file system with O(1) disk read.
* [InfluxDB](https://influxdata.com) - Scalable datastore for metrics, events, and real-time analytics.
* [Seaweed File System](https://github.com/chrislusf/weed-fs) - Highly scalable distributed key~file system with O(1) disk read.
* [InfluxDB](http://influxdb.com) - Scalable datastore for metrics, events, and real-time analytics.
* [Freehold](http://tshannon.bitbucket.org/freehold/) - An open, secure, and lightweight platform for your files and data.
* [Prometheus Annotation Server](https://github.com/oliver006/prom_annotation_server) - Annotation server for PromDash & Prometheus service monitoring system.
* [Consul](https://github.com/hashicorp/consul) - Consul is service discovery and configuration made easy. Distributed, highly available, and datacenter-aware.
* [Kala](https://github.com/ajvb/kala) - Kala is a modern job scheduler optimized to run on a single node. It is persistent, with a JSON-over-HTTP API, ISO 8601 duration notation, and dependent jobs.
* [drive](https://github.com/odeke-em/drive) - drive is an unofficial Google Drive command line client for \*NIX operating systems.
* [stow](https://github.com/djherbis/stow) - a persistence manager for objects backed by boltdb.
* [buckets](https://github.com/joyrexus/buckets) - a bolt wrapper streamlining simple tx and key scans.
* [mbuckets](https://github.com/abhigupta912/mbuckets) - A Bolt wrapper that allows easy operations on multi level (nested) buckets.
* [Request Baskets](https://github.com/darklynx/request-baskets) - A web service to collect arbitrary HTTP requests and inspect them via REST API or simple web UI, similar to [RequestBin](http://requestb.in/) service
* [Go Report Card](https://goreportcard.com/) - Go code quality report cards as a (free and open source) service.
* [Boltdb Boilerplate](https://github.com/bobintornado/boltdb-boilerplate) - Boilerplate wrapper around bolt aiming to make simple calls one-liners.

If you are using Bolt in a project please send a pull request to add it to the list.

138
Godeps/_workspace/src/github.com/boltdb/bolt/batch.go
generated
vendored
Normal file
138
Godeps/_workspace/src/github.com/boltdb/bolt/batch.go
generated
vendored
Normal file
@ -0,0 +1,138 @@
|
||||
package bolt
|
||||
|
||||
import (
|
||||
"errors"
|
||||
"fmt"
|
||||
"sync"
|
||||
"time"
|
||||
)
|
||||
|
||||
// Batch calls fn as part of a batch. It behaves similarly to Update,
// except:
|
||||
//
|
||||
// 1. concurrent Batch calls can be combined into a single Bolt
|
||||
// transaction.
|
||||
//
|
||||
// 2. the function passed to Batch may be called multiple times,
|
||||
// regardless of whether it returns error or not.
|
||||
//
|
||||
// This means that Batch function side effects must be idempotent and
|
||||
// take permanent effect only after a successful return is seen in
|
||||
// caller.
|
||||
//
|
||||
// The maximum batch size and delay can be adjusted with DB.MaxBatchSize
|
||||
// and DB.MaxBatchDelay, respectively.
|
||||
//
|
||||
// Batch is only useful when there are multiple goroutines calling it.
|
||||
func (db *DB) Batch(fn func(*Tx) error) error {
|
||||
errCh := make(chan error, 1)
|
||||
|
||||
db.batchMu.Lock()
|
||||
if (db.batch == nil) || (db.batch != nil && len(db.batch.calls) >= db.MaxBatchSize) {
|
||||
// There is no existing batch, or the existing batch is full; start a new one.
|
||||
db.batch = &batch{
|
||||
db: db,
|
||||
}
|
||||
db.batch.timer = time.AfterFunc(db.MaxBatchDelay, db.batch.trigger)
|
||||
}
|
||||
db.batch.calls = append(db.batch.calls, call{fn: fn, err: errCh})
|
||||
if len(db.batch.calls) >= db.MaxBatchSize {
|
||||
// wake up batch, it's ready to run
|
||||
go db.batch.trigger()
|
||||
}
|
||||
db.batchMu.Unlock()
|
||||
|
||||
err := <-errCh
|
||||
if err == trySolo {
|
||||
err = db.Update(fn)
|
||||
}
|
||||
return err
|
||||
}
|
||||
|
||||
type call struct {
|
||||
fn func(*Tx) error
|
||||
err chan<- error
|
||||
}
|
||||
|
||||
type batch struct {
|
||||
db *DB
|
||||
timer *time.Timer
|
||||
start sync.Once
|
||||
calls []call
|
||||
}
|
||||
|
||||
// trigger runs the batch if it hasn't already been run.
|
||||
func (b *batch) trigger() {
|
||||
b.start.Do(b.run)
|
||||
}
|
||||
|
||||
// run performs the transactions in the batch and communicates results
|
||||
// back to DB.Batch.
|
||||
func (b *batch) run() {
|
||||
b.db.batchMu.Lock()
|
||||
b.timer.Stop()
|
||||
// Make sure no new work is added to this batch, but don't break
|
||||
// other batches.
|
||||
if b.db.batch == b {
|
||||
b.db.batch = nil
|
||||
}
|
||||
b.db.batchMu.Unlock()
|
||||
|
||||
retry:
|
||||
for len(b.calls) > 0 {
|
||||
var failIdx = -1
|
||||
err := b.db.Update(func(tx *Tx) error {
|
||||
for i, c := range b.calls {
|
||||
if err := safelyCall(c.fn, tx); err != nil {
|
||||
failIdx = i
|
||||
return err
|
||||
}
|
||||
}
|
||||
return nil
|
||||
})
|
||||
|
||||
if failIdx >= 0 {
|
||||
// take the failing transaction out of the batch. it's
|
||||
// safe to shorten b.calls here because db.batch no longer
|
||||
// points to us, and we hold the mutex anyway.
|
||||
c := b.calls[failIdx]
|
||||
b.calls[failIdx], b.calls = b.calls[len(b.calls)-1], b.calls[:len(b.calls)-1]
|
||||
// tell the submitter to re-run it solo, and continue with the rest of the batch
c.err <- trySolo
|
||||
continue retry
|
||||
}
|
||||
|
||||
// pass success, or bolt internal errors, to all callers
|
||||
for _, c := range b.calls {
|
||||
if c.err != nil {
|
||||
c.err <- err
|
||||
}
|
||||
}
|
||||
break retry
|
||||
}
|
||||
}
|
||||
|
||||
// trySolo is a special sentinel error value used for signaling that a
|
||||
// transaction function should be re-run. It should never be seen by
|
||||
// callers.
|
||||
var trySolo = errors.New("batch function returned an error and should be re-run solo")
|
||||
|
||||
type panicked struct {
|
||||
reason interface{}
|
||||
}
|
||||
|
||||
func (p panicked) Error() string {
|
||||
if err, ok := p.reason.(error); ok {
|
||||
return err.Error()
|
||||
}
|
||||
return fmt.Sprintf("panic: %v", p.reason)
|
||||
}
|
||||
|
||||
func safelyCall(fn func(*Tx) error, tx *Tx) (err error) {
|
||||
defer func() {
|
||||
if p := recover(); p != nil {
|
||||
err = panicked{p}
|
||||
}
|
||||
}()
|
||||
return fn(tx)
|
||||
}
|
170
Godeps/_workspace/src/github.com/boltdb/bolt/batch_benchmark_test.go
generated
vendored
Normal file
170
Godeps/_workspace/src/github.com/boltdb/bolt/batch_benchmark_test.go
generated
vendored
Normal file
@ -0,0 +1,170 @@
|
||||
package bolt_test
|
||||
|
||||
import (
|
||||
"bytes"
|
||||
"encoding/binary"
|
||||
"errors"
|
||||
"hash/fnv"
|
||||
"sync"
|
||||
"testing"
|
||||
|
||||
"github.com/coreos/etcd/Godeps/_workspace/src/github.com/boltdb/bolt"
|
||||
)
|
||||
|
||||
func validateBatchBench(b *testing.B, db *TestDB) {
|
||||
var rollback = errors.New("sentinel error to cause rollback")
|
||||
validate := func(tx *bolt.Tx) error {
|
||||
bucket := tx.Bucket([]byte("bench"))
|
||||
h := fnv.New32a()
|
||||
buf := make([]byte, 4)
|
||||
for id := uint32(0); id < 1000; id++ {
|
||||
binary.LittleEndian.PutUint32(buf, id)
|
||||
h.Reset()
|
||||
h.Write(buf[:])
|
||||
k := h.Sum(nil)
|
||||
v := bucket.Get(k)
|
||||
if v == nil {
|
||||
b.Errorf("not found id=%d key=%x", id, k)
|
||||
continue
|
||||
}
|
||||
if g, e := v, []byte("filler"); !bytes.Equal(g, e) {
|
||||
b.Errorf("bad value for id=%d key=%x: %s != %q", id, k, g, e)
|
||||
}
|
||||
if err := bucket.Delete(k); err != nil {
|
||||
return err
|
||||
}
|
||||
}
|
||||
// should be empty now
|
||||
c := bucket.Cursor()
|
||||
for k, v := c.First(); k != nil; k, v = c.Next() {
|
||||
b.Errorf("unexpected key: %x = %q", k, v)
|
||||
}
|
||||
return rollback
|
||||
}
|
||||
if err := db.Update(validate); err != nil && err != rollback {
|
||||
b.Error(err)
|
||||
}
|
||||
}
|
||||
|
||||
func BenchmarkDBBatchAutomatic(b *testing.B) {
|
||||
db := NewTestDB()
|
||||
defer db.Close()
|
||||
db.MustCreateBucket([]byte("bench"))
|
||||
|
||||
b.ResetTimer()
|
||||
for i := 0; i < b.N; i++ {
|
||||
start := make(chan struct{})
|
||||
var wg sync.WaitGroup
|
||||
|
||||
for round := 0; round < 1000; round++ {
|
||||
wg.Add(1)
|
||||
|
||||
go func(id uint32) {
|
||||
defer wg.Done()
|
||||
<-start
|
||||
|
||||
h := fnv.New32a()
|
||||
buf := make([]byte, 4)
|
||||
binary.LittleEndian.PutUint32(buf, id)
|
||||
h.Write(buf[:])
|
||||
k := h.Sum(nil)
|
||||
insert := func(tx *bolt.Tx) error {
|
||||
b := tx.Bucket([]byte("bench"))
|
||||
return b.Put(k, []byte("filler"))
|
||||
}
|
||||
if err := db.Batch(insert); err != nil {
|
||||
b.Error(err)
|
||||
return
|
||||
}
|
||||
}(uint32(round))
|
||||
}
|
||||
close(start)
|
||||
wg.Wait()
|
||||
}
|
||||
|
||||
b.StopTimer()
|
||||
validateBatchBench(b, db)
|
||||
}
|
||||
|
||||
func BenchmarkDBBatchSingle(b *testing.B) {
|
||||
db := NewTestDB()
|
||||
defer db.Close()
|
||||
db.MustCreateBucket([]byte("bench"))
|
||||
|
||||
b.ResetTimer()
|
||||
for i := 0; i < b.N; i++ {
|
||||
start := make(chan struct{})
|
||||
var wg sync.WaitGroup
|
||||
|
||||
for round := 0; round < 1000; round++ {
|
||||
wg.Add(1)
|
||||
go func(id uint32) {
|
||||
defer wg.Done()
|
||||
<-start
|
||||
|
||||
h := fnv.New32a()
|
||||
buf := make([]byte, 4)
|
||||
binary.LittleEndian.PutUint32(buf, id)
|
||||
h.Write(buf[:])
|
||||
k := h.Sum(nil)
|
||||
insert := func(tx *bolt.Tx) error {
|
||||
b := tx.Bucket([]byte("bench"))
|
||||
return b.Put(k, []byte("filler"))
|
||||
}
|
||||
if err := db.Update(insert); err != nil {
|
||||
b.Error(err)
|
||||
return
|
||||
}
|
||||
}(uint32(round))
|
||||
}
|
||||
close(start)
|
||||
wg.Wait()
|
||||
}
|
||||
|
||||
b.StopTimer()
|
||||
validateBatchBench(b, db)
|
||||
}
|
||||
|
||||
func BenchmarkDBBatchManual10x100(b *testing.B) {
|
||||
db := NewTestDB()
|
||||
defer db.Close()
|
||||
db.MustCreateBucket([]byte("bench"))
|
||||
|
||||
b.ResetTimer()
|
||||
for i := 0; i < b.N; i++ {
|
||||
start := make(chan struct{})
|
||||
var wg sync.WaitGroup
|
||||
|
||||
for major := 0; major < 10; major++ {
|
||||
wg.Add(1)
|
||||
go func(id uint32) {
|
||||
defer wg.Done()
|
||||
<-start
|
||||
|
||||
insert100 := func(tx *bolt.Tx) error {
|
||||
h := fnv.New32a()
|
||||
buf := make([]byte, 4)
|
||||
for minor := uint32(0); minor < 100; minor++ {
|
||||
binary.LittleEndian.PutUint32(buf, uint32(id*100+minor))
|
||||
h.Reset()
|
||||
h.Write(buf[:])
|
||||
k := h.Sum(nil)
|
||||
b := tx.Bucket([]byte("bench"))
|
||||
if err := b.Put(k, []byte("filler")); err != nil {
|
||||
return err
|
||||
}
|
||||
}
|
||||
return nil
|
||||
}
|
||||
if err := db.Update(insert100); err != nil {
|
||||
b.Fatal(err)
|
||||
}
|
||||
}(uint32(major))
|
||||
}
|
||||
close(start)
|
||||
wg.Wait()
|
||||
}
|
||||
|
||||
b.StopTimer()
|
||||
validateBatchBench(b, db)
|
||||
}
|
148
Godeps/_workspace/src/github.com/boltdb/bolt/batch_example_test.go
generated
vendored
Normal file
148
Godeps/_workspace/src/github.com/boltdb/bolt/batch_example_test.go
generated
vendored
Normal file
@ -0,0 +1,148 @@
|
||||
package bolt_test
|
||||
|
||||
import (
|
||||
"encoding/binary"
|
||||
"fmt"
|
||||
"io/ioutil"
|
||||
"log"
|
||||
"math/rand"
|
||||
"net/http"
|
||||
"net/http/httptest"
|
||||
"os"
|
||||
|
||||
"github.com/coreos/etcd/Godeps/_workspace/src/github.com/boltdb/bolt"
|
||||
)
|
||||
|
||||
// Set this to see how the counts are actually updated.
|
||||
const verbose = false
|
||||
|
||||
// Counter updates a counter in Bolt for every URL path requested.
|
||||
type counter struct {
|
||||
db *bolt.DB
|
||||
}
|
||||
|
||||
func (c counter) ServeHTTP(rw http.ResponseWriter, req *http.Request) {
|
||||
// Communicates the new count from a successful database
|
||||
// transaction.
|
||||
var result uint64
|
||||
|
||||
increment := func(tx *bolt.Tx) error {
|
||||
b, err := tx.CreateBucketIfNotExists([]byte("hits"))
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
key := []byte(req.URL.String())
|
||||
// Decode handles key not found for us.
|
||||
count := decode(b.Get(key)) + 1
|
||||
b.Put(key, encode(count))
|
||||
// All good, communicate new count.
|
||||
result = count
|
||||
return nil
|
||||
}
|
||||
if err := c.db.Batch(increment); err != nil {
|
||||
http.Error(rw, err.Error(), 500)
|
||||
return
|
||||
}
|
||||
|
||||
if verbose {
|
||||
log.Printf("server: %s: %d", req.URL.String(), result)
|
||||
}
|
||||
|
||||
rw.Header().Set("Content-Type", "application/octet-stream")
|
||||
fmt.Fprintf(rw, "%d\n", result)
|
||||
}
|
||||
|
||||
func client(id int, base string, paths []string) error {
|
||||
// Process paths in random order.
|
||||
rng := rand.New(rand.NewSource(int64(id)))
|
||||
permutation := rng.Perm(len(paths))
|
||||
|
||||
for i := range paths {
|
||||
path := paths[permutation[i]]
|
||||
resp, err := http.Get(base + path)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
defer resp.Body.Close()
|
||||
buf, err := ioutil.ReadAll(resp.Body)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
if verbose {
|
||||
log.Printf("client: %s: %s", path, buf)
|
||||
}
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
func ExampleDB_Batch() {
|
||||
// Open the database.
|
||||
db, _ := bolt.Open(tempfile(), 0666, nil)
|
||||
defer os.Remove(db.Path())
|
||||
defer db.Close()
|
||||
|
||||
// Start our web server
|
||||
count := counter{db}
|
||||
srv := httptest.NewServer(count)
|
||||
defer srv.Close()
|
||||
|
||||
// Decrease the batch size to make things more interesting.
|
||||
db.MaxBatchSize = 3
|
||||
|
||||
// Get every path multiple times concurrently.
|
||||
const clients = 10
|
||||
paths := []string{
|
||||
"/foo",
|
||||
"/bar",
|
||||
"/baz",
|
||||
"/quux",
|
||||
"/thud",
|
||||
"/xyzzy",
|
||||
}
|
||||
errors := make(chan error, clients)
|
||||
for i := 0; i < clients; i++ {
|
||||
go func(id int) {
|
||||
errors <- client(id, srv.URL, paths)
|
||||
}(i)
|
||||
}
|
||||
// Check all responses to make sure there's no error.
|
||||
for i := 0; i < clients; i++ {
|
||||
if err := <-errors; err != nil {
|
||||
fmt.Printf("client error: %v", err)
|
||||
return
|
||||
}
|
||||
}
|
||||
|
||||
// Check the final result
|
||||
db.View(func(tx *bolt.Tx) error {
|
||||
b := tx.Bucket([]byte("hits"))
|
||||
c := b.Cursor()
|
||||
for k, v := c.First(); k != nil; k, v = c.Next() {
|
||||
fmt.Printf("hits to %s: %d\n", k, decode(v))
|
||||
}
|
||||
return nil
|
||||
})
|
||||
|
||||
// Output:
|
||||
// hits to /bar: 10
|
||||
// hits to /baz: 10
|
||||
// hits to /foo: 10
|
||||
// hits to /quux: 10
|
||||
// hits to /thud: 10
|
||||
// hits to /xyzzy: 10
|
||||
}
|
||||
|
||||
// encode marshals a counter.
|
||||
func encode(n uint64) []byte {
|
||||
buf := make([]byte, 8)
|
||||
binary.BigEndian.PutUint64(buf, n)
|
||||
return buf
|
||||
}
|
||||
|
||||
// decode unmarshals a counter. Nil buffers are decoded as 0.
|
||||
func decode(buf []byte) uint64 {
|
||||
if buf == nil {
|
||||
return 0
|
||||
}
|
||||
return binary.BigEndian.Uint64(buf)
|
||||
}
|
167
Godeps/_workspace/src/github.com/boltdb/bolt/batch_test.go
generated
vendored
Normal file
167
Godeps/_workspace/src/github.com/boltdb/bolt/batch_test.go
generated
vendored
Normal file
@ -0,0 +1,167 @@
|
||||
package bolt_test
|
||||
|
||||
import (
|
||||
"testing"
|
||||
"time"
|
||||
|
||||
"github.com/coreos/etcd/Godeps/_workspace/src/github.com/boltdb/bolt"
|
||||
)
|
||||
|
||||
// Ensure two functions can perform updates in a single batch.
|
||||
func TestDB_Batch(t *testing.T) {
|
||||
db := NewTestDB()
|
||||
defer db.Close()
|
||||
db.MustCreateBucket([]byte("widgets"))
|
||||
|
||||
// Iterate over multiple updates in separate goroutines.
|
||||
n := 2
|
||||
ch := make(chan error)
|
||||
for i := 0; i < n; i++ {
|
||||
go func(i int) {
|
||||
ch <- db.Batch(func(tx *bolt.Tx) error {
|
||||
return tx.Bucket([]byte("widgets")).Put(u64tob(uint64(i)), []byte{})
|
||||
})
|
||||
}(i)
|
||||
}
|
||||
|
||||
// Check all responses to make sure there's no error.
|
||||
for i := 0; i < n; i++ {
|
||||
if err := <-ch; err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
}
|
||||
|
||||
// Ensure data is correct.
|
||||
db.MustView(func(tx *bolt.Tx) error {
|
||||
b := tx.Bucket([]byte("widgets"))
|
||||
for i := 0; i < n; i++ {
|
||||
if v := b.Get(u64tob(uint64(i))); v == nil {
|
||||
t.Errorf("key not found: %d", i)
|
||||
}
|
||||
}
|
||||
return nil
|
||||
})
|
||||
}
|
||||
|
||||
func TestDB_Batch_Panic(t *testing.T) {
|
||||
db := NewTestDB()
|
||||
defer db.Close()
|
||||
|
||||
var sentinel int
|
||||
var bork = &sentinel
|
||||
var problem interface{}
|
||||
var err error
|
||||
|
||||
// Execute a function inside a batch that panics.
|
||||
func() {
|
||||
defer func() {
|
||||
if p := recover(); p != nil {
|
||||
problem = p
|
||||
}
|
||||
}()
|
||||
err = db.Batch(func(tx *bolt.Tx) error {
|
||||
panic(bork)
|
||||
})
|
||||
}()
|
||||
|
||||
// Verify there is no error.
|
||||
if g, e := err, error(nil); g != e {
|
||||
t.Fatalf("wrong error: %v != %v", g, e)
|
||||
}
|
||||
// Verify the panic was captured.
|
||||
if g, e := problem, bork; g != e {
|
||||
t.Fatalf("wrong error: %v != %v", g, e)
|
||||
}
|
||||
}
|
||||
|
||||
func TestDB_BatchFull(t *testing.T) {
|
||||
db := NewTestDB()
|
||||
defer db.Close()
|
||||
db.MustCreateBucket([]byte("widgets"))
|
||||
|
||||
const size = 3
|
||||
// buffered so we never leak goroutines
|
||||
ch := make(chan error, size)
|
||||
put := func(i int) {
|
||||
ch <- db.Batch(func(tx *bolt.Tx) error {
|
||||
return tx.Bucket([]byte("widgets")).Put(u64tob(uint64(i)), []byte{})
|
||||
})
|
||||
}
|
||||
|
||||
db.MaxBatchSize = size
|
||||
// high enough to never trigger here
|
||||
db.MaxBatchDelay = 1 * time.Hour
|
||||
|
||||
go put(1)
|
||||
go put(2)
|
||||
|
||||
// Give the batch a chance to exhibit bugs.
|
||||
time.Sleep(10 * time.Millisecond)
|
||||
|
||||
// not triggered yet
|
||||
select {
|
||||
case <-ch:
|
||||
t.Fatalf("batch triggered too early")
|
||||
default:
|
||||
}
|
||||
|
||||
go put(3)
|
||||
|
||||
// Check all responses to make sure there's no error.
|
||||
for i := 0; i < size; i++ {
|
||||
if err := <-ch; err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
}
|
||||
|
||||
// Ensure data is correct.
|
||||
db.MustView(func(tx *bolt.Tx) error {
|
||||
b := tx.Bucket([]byte("widgets"))
|
||||
for i := 1; i <= size; i++ {
|
||||
if v := b.Get(u64tob(uint64(i))); v == nil {
|
||||
t.Errorf("key not found: %d", i)
|
||||
}
|
||||
}
|
||||
return nil
|
||||
})
|
||||
}
|
||||
|
||||
func TestDB_BatchTime(t *testing.T) {
|
||||
db := NewTestDB()
|
||||
defer db.Close()
|
||||
db.MustCreateBucket([]byte("widgets"))
|
||||
|
||||
const size = 1
|
||||
// buffered so we never leak goroutines
|
||||
ch := make(chan error, size)
|
||||
put := func(i int) {
|
||||
ch <- db.Batch(func(tx *bolt.Tx) error {
|
||||
return tx.Bucket([]byte("widgets")).Put(u64tob(uint64(i)), []byte{})
|
||||
})
|
||||
}
|
||||
|
||||
db.MaxBatchSize = 1000
|
||||
db.MaxBatchDelay = 0
|
||||
|
||||
go put(1)
|
||||
|
||||
// Batch must trigger by time alone.
|
||||
|
||||
// Check all responses to make sure there's no error.
|
||||
for i := 0; i < size; i++ {
|
||||
if err := <-ch; err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
}
|
||||
|
||||
// Ensure data is correct.
|
||||
db.MustView(func(tx *bolt.Tx) error {
|
||||
b := tx.Bucket([]byte("widgets"))
|
||||
for i := 1; i <= size; i++ {
|
||||
if v := b.Get(u64tob(uint64(i))); v == nil {
|
||||
t.Errorf("key not found: %d", i)
|
||||
}
|
||||
}
|
||||
return nil
|
||||
})
|
||||
}
|
9
Godeps/_workspace/src/github.com/boltdb/bolt/bolt_arm64.go
generated
vendored
9
Godeps/_workspace/src/github.com/boltdb/bolt/bolt_arm64.go
generated
vendored
@ -1,9 +0,0 @@
|
||||
// +build arm64
|
||||
|
||||
package bolt
|
||||
|
||||
// maxMapSize represents the largest mmap size supported by Bolt.
|
||||
const maxMapSize = 0xFFFFFFFFFFFF // 256TB
|
||||
|
||||
// maxAllocSize is the size used when creating array pointers.
|
||||
const maxAllocSize = 0x7FFFFFFF
|
2
Godeps/_workspace/src/github.com/boltdb/bolt/bolt_linux.go
generated
vendored
2
Godeps/_workspace/src/github.com/boltdb/bolt/bolt_linux.go
generated
vendored
@ -4,6 +4,8 @@ import (
|
||||
"syscall"
|
||||
)
|
||||
|
||||
var odirect = syscall.O_DIRECT
|
||||
|
||||
// fdatasync flushes written data to a file descriptor.
|
||||
func fdatasync(db *DB) error {
|
||||
return syscall.Fdatasync(int(db.file.Fd()))
|
||||
|
2
Godeps/_workspace/src/github.com/boltdb/bolt/bolt_openbsd.go
generated
vendored
2
Godeps/_workspace/src/github.com/boltdb/bolt/bolt_openbsd.go
generated
vendored
@ -11,6 +11,8 @@ const (
|
||||
msInvalidate // invalidate cached data
|
||||
)
|
||||
|
||||
var odirect int
|
||||
|
||||
func msync(db *DB) error {
|
||||
_, _, errno := syscall.Syscall(syscall.SYS_MSYNC, uintptr(unsafe.Pointer(db.data)), uintptr(db.datasz), msInvalidate)
|
||||
if errno != 0 {
|
||||
|
9
Godeps/_workspace/src/github.com/boltdb/bolt/bolt_ppc64le.go
generated
vendored
9
Godeps/_workspace/src/github.com/boltdb/bolt/bolt_ppc64le.go
generated
vendored
@ -1,9 +0,0 @@
|
||||
// +build ppc64le
|
||||
|
||||
package bolt
|
||||
|
||||
// maxMapSize represents the largest mmap size supported by Bolt.
|
||||
const maxMapSize = 0xFFFFFFFFFFFF // 256TB
|
||||
|
||||
// maxAllocSize is the size used when creating array pointers.
|
||||
const maxAllocSize = 0x7FFFFFFF
|
9
Godeps/_workspace/src/github.com/boltdb/bolt/bolt_s390x.go
generated
vendored
9
Godeps/_workspace/src/github.com/boltdb/bolt/bolt_s390x.go
generated
vendored
@ -1,9 +0,0 @@
|
||||
// +build s390x
|
||||
|
||||
package bolt
|
||||
|
||||
// maxMapSize represents the largest mmap size supported by Bolt.
|
||||
const maxMapSize = 0xFFFFFFFFFFFF // 256TB
|
||||
|
||||
// maxAllocSize is the size used when creating array pointers.
|
||||
const maxAllocSize = 0x7FFFFFFF
|
36
Godeps/_workspace/src/github.com/boltdb/bolt/bolt_test.go
generated
vendored
Normal file
36
Godeps/_workspace/src/github.com/boltdb/bolt/bolt_test.go
generated
vendored
Normal file
@ -0,0 +1,36 @@
|
||||
package bolt_test
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"path/filepath"
|
||||
"reflect"
|
||||
"runtime"
|
||||
"testing"
|
||||
)
|
||||
|
||||
// assert fails the test if the condition is false.
|
||||
func assert(tb testing.TB, condition bool, msg string, v ...interface{}) {
|
||||
if !condition {
|
||||
_, file, line, _ := runtime.Caller(1)
|
||||
fmt.Printf("\033[31m%s:%d: "+msg+"\033[39m\n\n", append([]interface{}{filepath.Base(file), line}, v...)...)
|
||||
tb.FailNow()
|
||||
}
|
||||
}
|
||||
|
||||
// ok fails the test if an err is not nil.
|
||||
func ok(tb testing.TB, err error) {
|
||||
if err != nil {
|
||||
_, file, line, _ := runtime.Caller(1)
|
||||
fmt.Printf("\033[31m%s:%d: unexpected error: %s\033[39m\n\n", filepath.Base(file), line, err.Error())
|
||||
tb.FailNow()
|
||||
}
|
||||
}
|
||||
|
||||
// equals fails the test if exp is not equal to act.
|
||||
func equals(tb testing.TB, exp, act interface{}) {
|
||||
if !reflect.DeepEqual(exp, act) {
|
||||
_, file, line, _ := runtime.Caller(1)
|
||||
fmt.Printf("\033[31m%s:%d:\n\n\texp: %#v\n\n\tgot: %#v\033[39m\n\n", filepath.Base(file), line, exp, act)
|
||||
tb.FailNow()
|
||||
}
|
||||
}
|
21
Godeps/_workspace/src/github.com/boltdb/bolt/bolt_unix.go
generated
vendored
21
Godeps/_workspace/src/github.com/boltdb/bolt/bolt_unix.go
generated
vendored
@ -11,7 +11,7 @@ import (
|
||||
)
|
||||
|
||||
// flock acquires an advisory lock on a file descriptor.
|
||||
func flock(db *DB, mode os.FileMode, exclusive bool, timeout time.Duration) error {
|
||||
func flock(f *os.File, exclusive bool, timeout time.Duration) error {
|
||||
var t time.Time
|
||||
for {
|
||||
// If we're beyond our timeout then return an error.
|
||||
@ -27,7 +27,7 @@ func flock(db *DB, mode os.FileMode, exclusive bool, timeout time.Duration) erro
|
||||
}
|
||||
|
||||
// Otherwise attempt to obtain an exclusive lock.
|
||||
err := syscall.Flock(int(db.file.Fd()), flag|syscall.LOCK_NB)
|
||||
err := syscall.Flock(int(f.Fd()), flag|syscall.LOCK_NB)
|
||||
if err == nil {
|
||||
return nil
|
||||
} else if err != syscall.EWOULDBLOCK {
|
||||
@ -40,14 +40,25 @@ func flock(db *DB, mode os.FileMode, exclusive bool, timeout time.Duration) erro
|
||||
}
|
||||
|
||||
// funlock releases an advisory lock on a file descriptor.
|
||||
func funlock(db *DB) error {
|
||||
return syscall.Flock(int(db.file.Fd()), syscall.LOCK_UN)
|
||||
func funlock(f *os.File) error {
|
||||
return syscall.Flock(int(f.Fd()), syscall.LOCK_UN)
|
||||
}
|
||||
|
||||
// mmap memory maps a DB's data file.
|
||||
func mmap(db *DB, sz int) error {
|
||||
// Truncate and fsync to ensure file size metadata is flushed.
|
||||
// https://github.com/boltdb/bolt/issues/284
|
||||
if !db.NoGrowSync && !db.readOnly {
|
||||
if err := db.file.Truncate(int64(sz)); err != nil {
|
||||
return fmt.Errorf("file resize error: %s", err)
|
||||
}
|
||||
if err := db.file.Sync(); err != nil {
|
||||
return fmt.Errorf("file sync error: %s", err)
|
||||
}
|
||||
}
|
||||
|
||||
// Map the data file to memory.
|
||||
b, err := syscall.Mmap(int(db.file.Fd()), 0, sz, syscall.PROT_READ, syscall.MAP_SHARED|db.MmapFlags)
|
||||
b, err := syscall.Mmap(int(db.file.Fd()), 0, sz, syscall.PROT_READ, syscall.MAP_SHARED)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
24
Godeps/_workspace/src/github.com/boltdb/bolt/bolt_unix_solaris.go
generated
vendored
24
Godeps/_workspace/src/github.com/boltdb/bolt/bolt_unix_solaris.go
generated
vendored
@ -2,16 +2,15 @@ package bolt
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"github.com/coreos/etcd/Godeps/_workspace/src/golang.org/x/sys/unix"
|
||||
"os"
|
||||
"syscall"
|
||||
"time"
|
||||
"unsafe"
|
||||
|
||||
"github.com/coreos/etcd/Godeps/_workspace/src/golang.org/x/sys/unix"
|
||||
)
|
||||
|
||||
// flock acquires an advisory lock on a file descriptor.
|
||||
func flock(db *DB, mode os.FileMode, exclusive bool, timeout time.Duration) error {
|
||||
func flock(f *os.File, exclusive bool, timeout time.Duration) error {
|
||||
var t time.Time
|
||||
for {
|
||||
// If we're beyond our timeout then return an error.
|
||||
@ -32,7 +31,7 @@ func flock(db *DB, mode os.FileMode, exclusive bool, timeout time.Duration) erro
|
||||
} else {
|
||||
lock.Type = syscall.F_RDLCK
|
||||
}
|
||||
err := syscall.FcntlFlock(db.file.Fd(), syscall.F_SETLK, &lock)
|
||||
err := syscall.FcntlFlock(f.Fd(), syscall.F_SETLK, &lock)
|
||||
if err == nil {
|
||||
return nil
|
||||
} else if err != syscall.EAGAIN {
|
||||
@ -45,19 +44,30 @@ func flock(db *DB, mode os.FileMode, exclusive bool, timeout time.Duration) erro
|
||||
}
|
||||
|
||||
// funlock releases an advisory lock on a file descriptor.
|
||||
func funlock(db *DB) error {
|
||||
func funlock(f *os.File) error {
|
||||
var lock syscall.Flock_t
|
||||
lock.Start = 0
|
||||
lock.Len = 0
|
||||
lock.Type = syscall.F_UNLCK
|
||||
lock.Whence = 0
|
||||
return syscall.FcntlFlock(uintptr(db.file.Fd()), syscall.F_SETLK, &lock)
|
||||
return syscall.FcntlFlock(uintptr(f.Fd()), syscall.F_SETLK, &lock)
|
||||
}
|
||||
|
||||
// mmap memory maps a DB's data file.
|
||||
func mmap(db *DB, sz int) error {
|
||||
// Truncate and fsync to ensure file size metadata is flushed.
|
||||
// https://github.com/boltdb/bolt/issues/284
|
||||
if !db.NoGrowSync && !db.readOnly {
|
||||
if err := db.file.Truncate(int64(sz)); err != nil {
|
||||
return fmt.Errorf("file resize error: %s", err)
|
||||
}
|
||||
if err := db.file.Sync(); err != nil {
|
||||
return fmt.Errorf("file sync error: %s", err)
|
||||
}
|
||||
}
|
||||
|
||||
// Map the data file to memory.
|
||||
b, err := unix.Mmap(int(db.file.Fd()), 0, sz, syscall.PROT_READ, syscall.MAP_SHARED|db.MmapFlags)
|
||||
b, err := unix.Mmap(int(db.file.Fd()), 0, sz, syscall.PROT_READ, syscall.MAP_SHARED)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
78
Godeps/_workspace/src/github.com/boltdb/bolt/bolt_windows.go
generated
vendored
78
Godeps/_workspace/src/github.com/boltdb/bolt/bolt_windows.go
generated
vendored
@ -8,39 +8,7 @@ import (
|
||||
"unsafe"
|
||||
)
|
||||
|
||||
// LockFileEx code derived from golang build filemutex_windows.go @ v1.5.1
|
||||
var (
|
||||
modkernel32 = syscall.NewLazyDLL("kernel32.dll")
|
||||
procLockFileEx = modkernel32.NewProc("LockFileEx")
|
||||
procUnlockFileEx = modkernel32.NewProc("UnlockFileEx")
|
||||
)
|
||||
|
||||
const (
|
||||
lockExt = ".lock"
|
||||
|
||||
// see https://msdn.microsoft.com/en-us/library/windows/desktop/aa365203(v=vs.85).aspx
|
||||
flagLockExclusive = 2
|
||||
flagLockFailImmediately = 1
|
||||
|
||||
// see https://msdn.microsoft.com/en-us/library/windows/desktop/ms681382(v=vs.85).aspx
|
||||
errLockViolation syscall.Errno = 0x21
|
||||
)
|
||||
|
||||
func lockFileEx(h syscall.Handle, flags, reserved, locklow, lockhigh uint32, ol *syscall.Overlapped) (err error) {
|
||||
r, _, err := procLockFileEx.Call(uintptr(h), uintptr(flags), uintptr(reserved), uintptr(locklow), uintptr(lockhigh), uintptr(unsafe.Pointer(ol)))
|
||||
if r == 0 {
|
||||
return err
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
func unlockFileEx(h syscall.Handle, reserved, locklow, lockhigh uint32, ol *syscall.Overlapped) (err error) {
|
||||
r, _, err := procUnlockFileEx.Call(uintptr(h), uintptr(reserved), uintptr(locklow), uintptr(lockhigh), uintptr(unsafe.Pointer(ol)), 0)
|
||||
if r == 0 {
|
||||
return err
|
||||
}
|
||||
return nil
|
||||
}
|
||||
var odirect int
|
||||
|
||||
// fdatasync flushes written data to a file descriptor.
|
||||
func fdatasync(db *DB) error {
|
||||
@ -48,49 +16,13 @@ func fdatasync(db *DB) error {
|
||||
}
|
||||
|
||||
// flock acquires an advisory lock on a file descriptor.
|
||||
func flock(db *DB, mode os.FileMode, exclusive bool, timeout time.Duration) error {
|
||||
// Create a separate lock file on windows because a process
|
||||
// cannot share an exclusive lock on the same file. This is
|
||||
// needed during Tx.WriteTo().
|
||||
f, err := os.OpenFile(db.path+lockExt, os.O_CREATE, mode)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
db.lockfile = f
|
||||
|
||||
var t time.Time
|
||||
for {
|
||||
// If we're beyond our timeout then return an error.
|
||||
// This can only occur after we've attempted a flock once.
|
||||
if t.IsZero() {
|
||||
t = time.Now()
|
||||
} else if timeout > 0 && time.Since(t) > timeout {
|
||||
return ErrTimeout
|
||||
}
|
||||
|
||||
var flag uint32 = flagLockFailImmediately
|
||||
if exclusive {
|
||||
flag |= flagLockExclusive
|
||||
}
|
||||
|
||||
err := lockFileEx(syscall.Handle(db.lockfile.Fd()), flag, 0, 1, 0, &syscall.Overlapped{})
|
||||
if err == nil {
|
||||
return nil
|
||||
} else if err != errLockViolation {
|
||||
return err
|
||||
}
|
||||
|
||||
// Wait for a bit and try again.
|
||||
time.Sleep(50 * time.Millisecond)
|
||||
}
|
||||
func flock(f *os.File, _ bool, _ time.Duration) error {
|
||||
return nil
|
||||
}
|
||||
|
||||
// funlock releases an advisory lock on a file descriptor.
|
||||
func funlock(db *DB) error {
|
||||
err := unlockFileEx(syscall.Handle(db.lockfile.Fd()), 0, 1, 0, &syscall.Overlapped{})
|
||||
db.lockfile.Close()
|
||||
os.Remove(db.path+lockExt)
|
||||
return err
|
||||
func funlock(f *os.File) error {
|
||||
return nil
|
||||
}
|
||||
|
||||
// mmap memory maps a DB's data file.
|
||||
|
2
Godeps/_workspace/src/github.com/boltdb/bolt/boltsync_unix.go
generated
vendored
2
Godeps/_workspace/src/github.com/boltdb/bolt/boltsync_unix.go
generated
vendored
@ -2,6 +2,8 @@
|
||||
|
||||
package bolt
|
||||
|
||||
var odirect int
|
||||
|
||||
// fdatasync flushes written data to a file descriptor.
|
||||
func fdatasync(db *DB) error {
|
||||
return db.file.Sync()
|
||||
|
9
Godeps/_workspace/src/github.com/boltdb/bolt/bucket.go
generated
vendored
9
Godeps/_workspace/src/github.com/boltdb/bolt/bucket.go
generated
vendored
@ -11,7 +11,7 @@ const (
|
||||
MaxKeySize = 32768
|
||||
|
||||
// MaxValueSize is the maximum length of a value, in bytes.
|
||||
MaxValueSize = (1 << 31) - 2
|
||||
MaxValueSize = 4294967295
|
||||
)
|
||||
|
||||
const (
|
||||
@ -99,7 +99,6 @@ func (b *Bucket) Cursor() *Cursor {
|
||||
|
||||
// Bucket retrieves a nested bucket by name.
|
||||
// Returns nil if the bucket does not exist.
|
||||
// The bucket instance is only valid for the lifetime of the transaction.
|
||||
func (b *Bucket) Bucket(name []byte) *Bucket {
|
||||
if b.buckets != nil {
|
||||
if child := b.buckets[string(name)]; child != nil {
|
||||
@ -149,7 +148,6 @@ func (b *Bucket) openBucket(value []byte) *Bucket {
|
||||
|
||||
// CreateBucket creates a new bucket at the given key and returns the new bucket.
|
||||
// Returns an error if the key already exists, if the bucket name is blank, or if the bucket name is too long.
|
||||
// The bucket instance is only valid for the lifetime of the transaction.
|
||||
func (b *Bucket) CreateBucket(key []byte) (*Bucket, error) {
|
||||
if b.tx.db == nil {
|
||||
return nil, ErrTxClosed
|
||||
@ -194,7 +192,6 @@ func (b *Bucket) CreateBucket(key []byte) (*Bucket, error) {
|
||||
|
||||
// CreateBucketIfNotExists creates a new bucket if it doesn't already exist and returns a reference to it.
|
||||
// Returns an error if the bucket name is blank, or if the bucket name is too long.
|
||||
// The bucket instance is only valid for the lifetime of the transaction.
|
||||
func (b *Bucket) CreateBucketIfNotExists(key []byte) (*Bucket, error) {
|
||||
child, err := b.CreateBucket(key)
|
||||
if err == ErrBucketExists {
|
||||
@ -273,7 +270,6 @@ func (b *Bucket) Get(key []byte) []byte {
|
||||
|
||||
// Put sets the value for a key in the bucket.
|
||||
// If the key exist then its previous value will be overwritten.
|
||||
// Supplied value must remain valid for the life of the transaction.
|
||||
// Returns an error if the bucket was created from a read-only transaction, if the key is blank, if the key is too large, or if the value is too large.
|
||||
func (b *Bucket) Put(key []byte, value []byte) error {
|
||||
if b.tx.db == nil {
|
||||
@ -350,8 +346,7 @@ func (b *Bucket) NextSequence() (uint64, error) {
|
||||
|
||||
// ForEach executes a function for each key/value pair in a bucket.
|
||||
// If the provided function returns an error then the iteration is stopped and
|
||||
// the error is returned to the caller. The provided function must not modify
|
||||
// the bucket; this will result in undefined behavior.
|
||||
// the error is returned to the caller.
|
||||
func (b *Bucket) ForEach(fn func(k, v []byte) error) error {
|
||||
if b.tx.db == nil {
|
||||
return ErrTxClosed
|
||||
|
1169
Godeps/_workspace/src/github.com/boltdb/bolt/bucket_test.go
generated
vendored
Normal file
1169
Godeps/_workspace/src/github.com/boltdb/bolt/bucket_test.go
generated
vendored
Normal file
File diff suppressed because it is too large
5
Godeps/_workspace/src/github.com/boltdb/bolt/cmd/bolt/main.go
generated
vendored
5
Godeps/_workspace/src/github.com/boltdb/bolt/cmd/bolt/main.go
generated
vendored
@ -825,10 +825,7 @@ func (cmd *StatsCommand) Run(args ...string) error {
|
||||
|
||||
fmt.Fprintln(cmd.Stdout, "Bucket statistics")
|
||||
fmt.Fprintf(cmd.Stdout, "\tTotal number of buckets: %d\n", s.BucketN)
|
||||
percentage = 0
|
||||
if s.BucketN != 0 {
|
||||
percentage = int(float32(s.InlineBucketN) * 100.0 / float32(s.BucketN))
|
||||
}
|
||||
percentage = int(float32(s.InlineBucketN) * 100.0 / float32(s.BucketN))
|
||||
fmt.Fprintf(cmd.Stdout, "\tTotal number on inlined buckets: %d (%d%%)\n", s.InlineBucketN, percentage)
|
||||
percentage = 0
|
||||
if s.LeafInuse != 0 {
|
||||
|
145
Godeps/_workspace/src/github.com/boltdb/bolt/cmd/bolt/main_test.go
generated
vendored
Normal file
145
Godeps/_workspace/src/github.com/boltdb/bolt/cmd/bolt/main_test.go
generated
vendored
Normal file
@ -0,0 +1,145 @@
|
||||
package main_test
|
||||
|
||||
import (
|
||||
"bytes"
|
||||
"io/ioutil"
|
||||
"os"
|
||||
"strconv"
|
||||
"testing"
|
||||
|
||||
"github.com/coreos/etcd/Godeps/_workspace/src/github.com/boltdb/bolt"
|
||||
"github.com/coreos/etcd/Godeps/_workspace/src/github.com/boltdb/bolt/cmd/bolt"
|
||||
)
|
||||
|
||||
// Ensure the "info" command can print information about a database.
|
||||
func TestInfoCommand_Run(t *testing.T) {
|
||||
db := MustOpen(0666, nil)
|
||||
db.DB.Close()
|
||||
defer db.Close()
|
||||
|
||||
// Run the info command.
|
||||
m := NewMain()
|
||||
if err := m.Run("info", db.Path); err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
}
|
||||
|
||||
// Ensure the "stats" command can execute correctly.
|
||||
func TestStatsCommand_Run(t *testing.T) {
|
||||
// Skip systems that don't use a 4KB page size, since the expected stats below assume it.
if os.Getpagesize() != 4096 {
|
||||
t.Skip("system does not use 4KB page size")
|
||||
}
|
||||
|
||||
db := MustOpen(0666, nil)
|
||||
defer db.Close()
|
||||
|
||||
if err := db.Update(func(tx *bolt.Tx) error {
|
||||
// Create "foo" bucket.
|
||||
b, err := tx.CreateBucket([]byte("foo"))
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
for i := 0; i < 10; i++ {
|
||||
if err := b.Put([]byte(strconv.Itoa(i)), []byte(strconv.Itoa(i))); err != nil {
|
||||
return err
|
||||
}
|
||||
}
|
||||
|
||||
// Create "bar" bucket.
|
||||
b, err = tx.CreateBucket([]byte("bar"))
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
for i := 0; i < 100; i++ {
|
||||
if err := b.Put([]byte(strconv.Itoa(i)), []byte(strconv.Itoa(i))); err != nil {
|
||||
return err
|
||||
}
|
||||
}
|
||||
|
||||
// Create "baz" bucket.
|
||||
b, err = tx.CreateBucket([]byte("baz"))
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
if err := b.Put([]byte("key"), []byte("value")); err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
return nil
|
||||
}); err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
db.DB.Close()
|
||||
|
||||
// Generate expected result.
|
||||
exp := "Aggregate statistics for 3 buckets\n\n" +
|
||||
"Page count statistics\n" +
|
||||
"\tNumber of logical branch pages: 0\n" +
|
||||
"\tNumber of physical branch overflow pages: 0\n" +
|
||||
"\tNumber of logical leaf pages: 1\n" +
|
||||
"\tNumber of physical leaf overflow pages: 0\n" +
|
||||
"Tree statistics\n" +
|
||||
"\tNumber of keys/value pairs: 111\n" +
|
||||
"\tNumber of levels in B+tree: 1\n" +
|
||||
"Page size utilization\n" +
|
||||
"\tBytes allocated for physical branch pages: 0\n" +
|
||||
"\tBytes actually used for branch data: 0 (0%)\n" +
|
||||
"\tBytes allocated for physical leaf pages: 4096\n" +
|
||||
"\tBytes actually used for leaf data: 1996 (48%)\n" +
|
||||
"Bucket statistics\n" +
|
||||
"\tTotal number of buckets: 3\n" +
|
||||
"\tTotal number on inlined buckets: 2 (66%)\n" +
|
||||
"\tBytes used for inlined buckets: 236 (11%)\n"
|
||||
|
||||
// Run the command.
|
||||
m := NewMain()
|
||||
if err := m.Run("stats", db.Path); err != nil {
|
||||
t.Fatal(err)
|
||||
} else if m.Stdout.String() != exp {
|
||||
t.Fatalf("unexpected stdout:\n\n%s", m.Stdout.String())
|
||||
}
|
||||
}
|
||||
|
||||
// Main represents a test wrapper for main.Main that records output.
|
||||
type Main struct {
|
||||
*main.Main
|
||||
Stdin bytes.Buffer
|
||||
Stdout bytes.Buffer
|
||||
Stderr bytes.Buffer
|
||||
}
|
||||
|
||||
// NewMain returns a new instance of Main.
|
||||
func NewMain() *Main {
|
||||
m := &Main{Main: main.NewMain()}
|
||||
m.Main.Stdin = &m.Stdin
|
||||
m.Main.Stdout = &m.Stdout
|
||||
m.Main.Stderr = &m.Stderr
|
||||
return m
|
||||
}
|
||||
|
||||
// MustOpen creates a Bolt database in a temporary location.
|
||||
func MustOpen(mode os.FileMode, options *bolt.Options) *DB {
|
||||
// Create temporary path.
|
||||
f, _ := ioutil.TempFile("", "bolt-")
|
||||
f.Close()
|
||||
os.Remove(f.Name())
|
||||
|
||||
db, err := bolt.Open(f.Name(), mode, options)
|
||||
if err != nil {
|
||||
panic(err.Error())
|
||||
}
|
||||
return &DB{DB: db, Path: f.Name()}
|
||||
}
|
||||
|
||||
// DB is a test wrapper for bolt.DB.
|
||||
type DB struct {
|
||||
*bolt.DB
|
||||
Path string
|
||||
}
|
||||
|
||||
// Close closes and removes the database.
|
||||
func (db *DB) Close() error {
|
||||
defer os.Remove(db.Path)
|
||||
return db.DB.Close()
|
||||
}
|
56
Godeps/_workspace/src/github.com/boltdb/bolt/cursor.go
generated
vendored
56
Godeps/_workspace/src/github.com/boltdb/bolt/cursor.go
generated
vendored
@@ -34,13 +34,6 @@ func (c *Cursor) First() (key []byte, value []byte) {
	p, n := c.bucket.pageNode(c.bucket.root)
	c.stack = append(c.stack, elemRef{page: p, node: n, index: 0})
	c.first()

	// If we land on an empty page then move to the next value.
	// https://github.com/boltdb/bolt/issues/450
	if c.stack[len(c.stack)-1].count() == 0 {
		c.next()
	}

	k, v, flags := c.keyValue()
	if (flags & uint32(bucketLeafFlag)) != 0 {
		return k, nil
@@ -216,37 +209,28 @@ func (c *Cursor) last() {
// next moves to the next leaf element and returns the key and value.
// If the cursor is at the last leaf element then it stays there and returns nil.
func (c *Cursor) next() (key []byte, value []byte, flags uint32) {
	for {
		// Attempt to move over one element until we're successful.
		// Move up the stack as we hit the end of each page in our stack.
		var i int
		for i = len(c.stack) - 1; i >= 0; i-- {
			elem := &c.stack[i]
			if elem.index < elem.count()-1 {
				elem.index++
				break
			}
	// Attempt to move over one element until we're successful.
	// Move up the stack as we hit the end of each page in our stack.
	var i int
	for i = len(c.stack) - 1; i >= 0; i-- {
		elem := &c.stack[i]
		if elem.index < elem.count()-1 {
			elem.index++
			break
		}

		// If we've hit the root page then stop and return. This will leave the
		// cursor on the last element of the last page.
		if i == -1 {
			return nil, nil, 0
		}

		// Otherwise start from where we left off in the stack and find the
		// first element of the first leaf page.
		c.stack = c.stack[:i+1]
		c.first()

		// If this is an empty page then restart and move back up the stack.
		// https://github.com/boltdb/bolt/issues/450
		if c.stack[len(c.stack)-1].count() == 0 {
			continue
		}

		return c.keyValue()
	}

	// If we've hit the root page then stop and return. This will leave the
	// cursor on the last element of the last page.
	if i == -1 {
		return nil, nil, 0
	}

	// Otherwise start from where we left off in the stack and find the
	// first element of the first leaf page.
	c.stack = c.stack[:i+1]
	c.first()
	return c.keyValue()
}

// search recursively performs a binary search against a given page/node until it finds a given key.
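The hunk above corresponds to the upstream fix for boltdb/bolt#450: `First` and `next` skip over empty leaf pages (left behind when a page's only elements were deleted) instead of surfacing a bogus key. Calling code does not change; a minimal sketch of the iteration pattern that relies on this behavior, assuming `fmt` and the bolt package are imported and the bucket exists:

```go
// dumpBucket walks a bucket in key order. The fixed cursor guarantees
// that a nil key marks the end of iteration even when the B+tree
// contains empty leaf pages (boltdb/bolt#450).
func dumpBucket(db *bolt.DB, name []byte) error {
	return db.View(func(tx *bolt.Tx) error {
		c := tx.Bucket(name).Cursor()
		for k, v := c.First(); k != nil; k, v = c.Next() {
			fmt.Printf("%s => %s\n", k, v)
		}
		return nil
	})
}
```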
511 Godeps/_workspace/src/github.com/boltdb/bolt/cursor_test.go generated vendored Normal file
@@ -0,0 +1,511 @@
package bolt_test

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"os"
	"sort"
	"testing"
	"testing/quick"

	"github.com/coreos/etcd/Godeps/_workspace/src/github.com/boltdb/bolt"
)

// Ensure that a cursor can return a reference to the bucket that created it.
func TestCursor_Bucket(t *testing.T) {
	db := NewTestDB()
	defer db.Close()
	db.Update(func(tx *bolt.Tx) error {
		b, _ := tx.CreateBucket([]byte("widgets"))
		c := b.Cursor()
		equals(t, b, c.Bucket())
		return nil
	})
}

// Ensure that a Tx cursor can seek to the appropriate keys.
func TestCursor_Seek(t *testing.T) {
	db := NewTestDB()
	defer db.Close()
	db.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucket([]byte("widgets"))
		ok(t, err)
		ok(t, b.Put([]byte("foo"), []byte("0001")))
		ok(t, b.Put([]byte("bar"), []byte("0002")))
		ok(t, b.Put([]byte("baz"), []byte("0003")))
		_, err = b.CreateBucket([]byte("bkt"))
		ok(t, err)
		return nil
	})
	db.View(func(tx *bolt.Tx) error {
		c := tx.Bucket([]byte("widgets")).Cursor()

		// Exact match should go to the key.
		k, v := c.Seek([]byte("bar"))
		equals(t, []byte("bar"), k)
		equals(t, []byte("0002"), v)

		// Inexact match should go to the next key.
		k, v = c.Seek([]byte("bas"))
		equals(t, []byte("baz"), k)
		equals(t, []byte("0003"), v)

		// Low key should go to the first key.
		k, v = c.Seek([]byte(""))
		equals(t, []byte("bar"), k)
		equals(t, []byte("0002"), v)

		// High key should return no key.
		k, v = c.Seek([]byte("zzz"))
		assert(t, k == nil, "")
		assert(t, v == nil, "")

		// Buckets should return their key but no value.
		k, v = c.Seek([]byte("bkt"))
		equals(t, []byte("bkt"), k)
		assert(t, v == nil, "")

		return nil
	})
}

func TestCursor_Delete(t *testing.T) {
	db := NewTestDB()
	defer db.Close()

	var count = 1000

	// Insert every other key between 0 and $count.
	db.Update(func(tx *bolt.Tx) error {
		b, _ := tx.CreateBucket([]byte("widgets"))
		for i := 0; i < count; i += 1 {
			k := make([]byte, 8)
			binary.BigEndian.PutUint64(k, uint64(i))
			b.Put(k, make([]byte, 100))
		}
		b.CreateBucket([]byte("sub"))
		return nil
	})

	db.Update(func(tx *bolt.Tx) error {
		c := tx.Bucket([]byte("widgets")).Cursor()
		bound := make([]byte, 8)
		binary.BigEndian.PutUint64(bound, uint64(count/2))
		for key, _ := c.First(); bytes.Compare(key, bound) < 0; key, _ = c.Next() {
			if err := c.Delete(); err != nil {
				return err
			}
		}
		c.Seek([]byte("sub"))
		err := c.Delete()
		equals(t, err, bolt.ErrIncompatibleValue)
		return nil
	})

	db.View(func(tx *bolt.Tx) error {
		b := tx.Bucket([]byte("widgets"))
		equals(t, b.Stats().KeyN, count/2+1)
		return nil
	})
}

// Ensure that a Tx cursor can seek to the appropriate keys when there are a
// large number of keys. This test also checks that seek will always move
// forward to the next key.
//
// Related: https://github.com/boltdb/bolt/pull/187
func TestCursor_Seek_Large(t *testing.T) {
	db := NewTestDB()
	defer db.Close()

	var count = 10000

	// Insert every other key between 0 and $count.
	db.Update(func(tx *bolt.Tx) error {
		b, _ := tx.CreateBucket([]byte("widgets"))
		for i := 0; i < count; i += 100 {
			for j := i; j < i+100; j += 2 {
				k := make([]byte, 8)
				binary.BigEndian.PutUint64(k, uint64(j))
				b.Put(k, make([]byte, 100))
			}
		}
		return nil
	})

	db.View(func(tx *bolt.Tx) error {
		c := tx.Bucket([]byte("widgets")).Cursor()
		for i := 0; i < count; i++ {
			seek := make([]byte, 8)
			binary.BigEndian.PutUint64(seek, uint64(i))

			k, _ := c.Seek(seek)

			// The last seek is beyond the end of the the range so
			// it should return nil.
			if i == count-1 {
				assert(t, k == nil, "")
				continue
			}

			// Otherwise we should seek to the exact key or the next key.
			num := binary.BigEndian.Uint64(k)
			if i%2 == 0 {
				equals(t, uint64(i), num)
			} else {
				equals(t, uint64(i+1), num)
			}
		}

		return nil
	})
}

// Ensure that a cursor can iterate over an empty bucket without error.
func TestCursor_EmptyBucket(t *testing.T) {
	db := NewTestDB()
	defer db.Close()
	db.Update(func(tx *bolt.Tx) error {
		_, err := tx.CreateBucket([]byte("widgets"))
		return err
	})
	db.View(func(tx *bolt.Tx) error {
		c := tx.Bucket([]byte("widgets")).Cursor()
		k, v := c.First()
		assert(t, k == nil, "")
		assert(t, v == nil, "")
		return nil
	})
}

// Ensure that a Tx cursor can reverse iterate over an empty bucket without error.
func TestCursor_EmptyBucketReverse(t *testing.T) {
	db := NewTestDB()
	defer db.Close()

	db.Update(func(tx *bolt.Tx) error {
		_, err := tx.CreateBucket([]byte("widgets"))
		return err
	})
	db.View(func(tx *bolt.Tx) error {
		c := tx.Bucket([]byte("widgets")).Cursor()
		k, v := c.Last()
		assert(t, k == nil, "")
		assert(t, v == nil, "")
		return nil
	})
}

// Ensure that a Tx cursor can iterate over a single root with a couple elements.
func TestCursor_Iterate_Leaf(t *testing.T) {
	db := NewTestDB()
	defer db.Close()

	db.Update(func(tx *bolt.Tx) error {
		tx.CreateBucket([]byte("widgets"))
		tx.Bucket([]byte("widgets")).Put([]byte("baz"), []byte{})
		tx.Bucket([]byte("widgets")).Put([]byte("foo"), []byte{0})
		tx.Bucket([]byte("widgets")).Put([]byte("bar"), []byte{1})
		return nil
	})
	tx, _ := db.Begin(false)
	c := tx.Bucket([]byte("widgets")).Cursor()

	k, v := c.First()
	equals(t, string(k), "bar")
	equals(t, v, []byte{1})

	k, v = c.Next()
	equals(t, string(k), "baz")
	equals(t, v, []byte{})

	k, v = c.Next()
	equals(t, string(k), "foo")
	equals(t, v, []byte{0})

	k, v = c.Next()
	assert(t, k == nil, "")
	assert(t, v == nil, "")

	k, v = c.Next()
	assert(t, k == nil, "")
	assert(t, v == nil, "")

	tx.Rollback()
}

// Ensure that a Tx cursor can iterate in reverse over a single root with a couple elements.
func TestCursor_LeafRootReverse(t *testing.T) {
	db := NewTestDB()
	defer db.Close()

	db.Update(func(tx *bolt.Tx) error {
		tx.CreateBucket([]byte("widgets"))
		tx.Bucket([]byte("widgets")).Put([]byte("baz"), []byte{})
		tx.Bucket([]byte("widgets")).Put([]byte("foo"), []byte{0})
		tx.Bucket([]byte("widgets")).Put([]byte("bar"), []byte{1})
		return nil
	})
	tx, _ := db.Begin(false)
	c := tx.Bucket([]byte("widgets")).Cursor()

	k, v := c.Last()
	equals(t, string(k), "foo")
	equals(t, v, []byte{0})

	k, v = c.Prev()
	equals(t, string(k), "baz")
	equals(t, v, []byte{})

	k, v = c.Prev()
	equals(t, string(k), "bar")
	equals(t, v, []byte{1})

	k, v = c.Prev()
	assert(t, k == nil, "")
	assert(t, v == nil, "")

	k, v = c.Prev()
	assert(t, k == nil, "")
	assert(t, v == nil, "")

	tx.Rollback()
}

// Ensure that a Tx cursor can restart from the beginning.
func TestCursor_Restart(t *testing.T) {
	db := NewTestDB()
	defer db.Close()

	db.Update(func(tx *bolt.Tx) error {
		tx.CreateBucket([]byte("widgets"))
		tx.Bucket([]byte("widgets")).Put([]byte("bar"), []byte{})
		tx.Bucket([]byte("widgets")).Put([]byte("foo"), []byte{})
		return nil
	})

	tx, _ := db.Begin(false)
	c := tx.Bucket([]byte("widgets")).Cursor()

	k, _ := c.First()
	equals(t, string(k), "bar")

	k, _ = c.Next()
	equals(t, string(k), "foo")

	k, _ = c.First()
	equals(t, string(k), "bar")

	k, _ = c.Next()
	equals(t, string(k), "foo")

	tx.Rollback()
}

// Ensure that a Tx can iterate over all elements in a bucket.
func TestCursor_QuickCheck(t *testing.T) {
	f := func(items testdata) bool {
		db := NewTestDB()
		defer db.Close()

		// Bulk insert all values.
		tx, _ := db.Begin(true)
		tx.CreateBucket([]byte("widgets"))
		b := tx.Bucket([]byte("widgets"))
		for _, item := range items {
			ok(t, b.Put(item.Key, item.Value))
		}
		ok(t, tx.Commit())

		// Sort test data.
		sort.Sort(items)

		// Iterate over all items and check consistency.
		var index = 0
		tx, _ = db.Begin(false)
		c := tx.Bucket([]byte("widgets")).Cursor()
		for k, v := c.First(); k != nil && index < len(items); k, v = c.Next() {
			equals(t, k, items[index].Key)
			equals(t, v, items[index].Value)
			index++
		}
		equals(t, len(items), index)
		tx.Rollback()

		return true
	}
	if err := quick.Check(f, qconfig()); err != nil {
		t.Error(err)
	}
}

// Ensure that a transaction can iterate over all elements in a bucket in reverse.
func TestCursor_QuickCheck_Reverse(t *testing.T) {
	f := func(items testdata) bool {
		db := NewTestDB()
		defer db.Close()

		// Bulk insert all values.
		tx, _ := db.Begin(true)
		tx.CreateBucket([]byte("widgets"))
		b := tx.Bucket([]byte("widgets"))
		for _, item := range items {
			ok(t, b.Put(item.Key, item.Value))
		}
		ok(t, tx.Commit())

		// Sort test data.
		sort.Sort(revtestdata(items))

		// Iterate over all items and check consistency.
		var index = 0
		tx, _ = db.Begin(false)
		c := tx.Bucket([]byte("widgets")).Cursor()
		for k, v := c.Last(); k != nil && index < len(items); k, v = c.Prev() {
			equals(t, k, items[index].Key)
			equals(t, v, items[index].Value)
			index++
		}
		equals(t, len(items), index)
		tx.Rollback()

		return true
	}
	if err := quick.Check(f, qconfig()); err != nil {
		t.Error(err)
	}
}

// Ensure that a Tx cursor can iterate over subbuckets.
func TestCursor_QuickCheck_BucketsOnly(t *testing.T) {
	db := NewTestDB()
	defer db.Close()

	db.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucket([]byte("widgets"))
		ok(t, err)
		_, err = b.CreateBucket([]byte("foo"))
		ok(t, err)
		_, err = b.CreateBucket([]byte("bar"))
		ok(t, err)
		_, err = b.CreateBucket([]byte("baz"))
		ok(t, err)
		return nil
	})
	db.View(func(tx *bolt.Tx) error {
		var names []string
		c := tx.Bucket([]byte("widgets")).Cursor()
		for k, v := c.First(); k != nil; k, v = c.Next() {
			names = append(names, string(k))
			assert(t, v == nil, "")
		}
		equals(t, names, []string{"bar", "baz", "foo"})
		return nil
	})
}

// Ensure that a Tx cursor can reverse iterate over subbuckets.
func TestCursor_QuickCheck_BucketsOnly_Reverse(t *testing.T) {
	db := NewTestDB()
	defer db.Close()

	db.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucket([]byte("widgets"))
		ok(t, err)
		_, err = b.CreateBucket([]byte("foo"))
		ok(t, err)
		_, err = b.CreateBucket([]byte("bar"))
		ok(t, err)
		_, err = b.CreateBucket([]byte("baz"))
		ok(t, err)
		return nil
	})
	db.View(func(tx *bolt.Tx) error {
		var names []string
		c := tx.Bucket([]byte("widgets")).Cursor()
		for k, v := c.Last(); k != nil; k, v = c.Prev() {
			names = append(names, string(k))
			assert(t, v == nil, "")
		}
		equals(t, names, []string{"foo", "baz", "bar"})
		return nil
	})
}

func ExampleCursor() {
	// Open the database.
	db, _ := bolt.Open(tempfile(), 0666, nil)
	defer os.Remove(db.Path())
	defer db.Close()

	// Start a read-write transaction.
	db.Update(func(tx *bolt.Tx) error {
		// Create a new bucket.
		tx.CreateBucket([]byte("animals"))

		// Insert data into a bucket.
		b := tx.Bucket([]byte("animals"))
		b.Put([]byte("dog"), []byte("fun"))
		b.Put([]byte("cat"), []byte("lame"))
		b.Put([]byte("liger"), []byte("awesome"))

		// Create a cursor for iteration.
		c := b.Cursor()

		// Iterate over items in sorted key order. This starts from the
		// first key/value pair and updates the k/v variables to the
		// next key/value on each iteration.
		//
		// The loop finishes at the end of the cursor when a nil key is returned.
		for k, v := c.First(); k != nil; k, v = c.Next() {
			fmt.Printf("A %s is %s.\n", k, v)
		}

		return nil
	})

	// Output:
	// A cat is lame.
	// A dog is fun.
	// A liger is awesome.
}

func ExampleCursor_reverse() {
	// Open the database.
	db, _ := bolt.Open(tempfile(), 0666, nil)
	defer os.Remove(db.Path())
	defer db.Close()

	// Start a read-write transaction.
	db.Update(func(tx *bolt.Tx) error {
		// Create a new bucket.
		tx.CreateBucket([]byte("animals"))

		// Insert data into a bucket.
		b := tx.Bucket([]byte("animals"))
		b.Put([]byte("dog"), []byte("fun"))
		b.Put([]byte("cat"), []byte("lame"))
		b.Put([]byte("liger"), []byte("awesome"))

		// Create a cursor for iteration.
		c := b.Cursor()

		// Iterate over items in reverse sorted key order. This starts
		// from the last key/value pair and updates the k/v variables to
		// the previous key/value on each iteration.
		//
		// The loop finishes at the beginning of the cursor when a nil key
		// is returned.
		for k, v := c.Last(); k != nil; k, v = c.Prev() {
			fmt.Printf("A %s is %s.\n", k, v)
		}

		return nil
	})

	// Output:
	// A liger is awesome.
	// A dog is fun.
	// A cat is lame.
}
221 Godeps/_workspace/src/github.com/boltdb/bolt/db.go generated vendored
@@ -1,10 +1,8 @@
package bolt

import (
	"errors"
	"fmt"
	"hash/fnv"
	"log"
	"os"
	"runtime"
	"runtime/debug"
@@ -26,14 +24,13 @@ const magic uint32 = 0xED0CDAED
// IgnoreNoSync specifies whether the NoSync field of a DB is ignored when
// syncing changes to a file. This is required as some operating systems,
// such as OpenBSD, do not have a unified buffer cache (UBC) and writes
// must be synchronized using the msync(2) syscall.
// must be synchronzied using the msync(2) syscall.
const IgnoreNoSync = runtime.GOOS == "openbsd"

// Default values if not set in a DB instance.
const (
	DefaultMaxBatchSize  int = 1000
	DefaultMaxBatchDelay     = 10 * time.Millisecond
	DefaultAllocSize         = 16 * 1024 * 1024
)

// DB represents a collection of buckets persisted to a file on disk.
@@ -66,10 +63,6 @@ type DB struct {
	// https://github.com/boltdb/bolt/issues/284
	NoGrowSync bool

	// If you want to read the entire database fast, you can set MmapFlag to
	// syscall.MAP_POPULATE on Linux 2.6.23+ for sequential read-ahead.
	MmapFlags int

	// MaxBatchSize is the maximum size of a batch. Default value is
	// copied from DefaultMaxBatchSize in Open.
	//
@@ -86,18 +79,11 @@ type DB struct {
	// Do not change concurrently with calls to Batch.
	MaxBatchDelay time.Duration

	// AllocSize is the amount of space allocated when the database
	// needs to create new pages. This is done to amortize the cost
	// of truncate() and fsync() when growing the data file.
	AllocSize int

	path     string
	file     *os.File
	lockfile *os.File // windows only
	dataref  []byte   // mmap'ed readonly, write throws SEGV
	data     *[maxMapSize]byte
	datasz   int
	filesz   int // current on disk file size
	meta0    *meta
	meta1    *meta
	pageSize int
@@ -150,12 +136,10 @@ func Open(path string, mode os.FileMode, options *Options) (*DB, error) {
		options = DefaultOptions
	}
	db.NoGrowSync = options.NoGrowSync
	db.MmapFlags = options.MmapFlags

	// Set default values for later DB operations.
	db.MaxBatchSize = DefaultMaxBatchSize
	db.MaxBatchDelay = DefaultMaxBatchDelay
	db.AllocSize = DefaultAllocSize

	flag := os.O_RDWR
	if options.ReadOnly {
@@ -178,7 +162,7 @@ func Open(path string, mode os.FileMode, options *Options) (*DB, error) {
	// if !options.ReadOnly.
	// The database file is locked using the shared lock (more than one process may
	// hold a lock at the same time) otherwise (options.ReadOnly is set).
	if err := flock(db, mode, !db.readOnly, options.Timeout); err != nil {
	if err := flock(db.file, !db.readOnly, options.Timeout); err != nil {
		_ = db.close()
		return nil, err
	}
@@ -188,7 +172,7 @@ func Open(path string, mode os.FileMode, options *Options) (*DB, error) {

	// Initialize the database if it doesn't exist.
	if info, err := db.file.Stat(); err != nil {
		return nil, err
		return nil, fmt.Errorf("stat error: %s", err)
	} else if info.Size() == 0 {
		// Initialize new files with meta pages.
		if err := db.init(); err != nil {
@@ -200,14 +184,14 @@ func Open(path string, mode os.FileMode, options *Options) (*DB, error) {
		if _, err := db.file.ReadAt(buf[:], 0); err == nil {
			m := db.pageInBuffer(buf[:], 0).meta()
			if err := m.validate(); err != nil {
				return nil, err
				return nil, fmt.Errorf("meta0 error: %s", err)
			}
			db.pageSize = int(m.pageSize)
		}
	}

	// Memory map the data file.
	if err := db.mmap(options.InitialMmapSize); err != nil {
	if err := db.mmap(0); err != nil {
		_ = db.close()
		return nil, err
	}
@@ -264,10 +248,10 @@ func (db *DB) mmap(minsz int) error {

	// Validate the meta pages.
	if err := db.meta0.validate(); err != nil {
		return err
		return fmt.Errorf("meta0 error: %s", err)
	}
	if err := db.meta1.validate(); err != nil {
		return err
		return fmt.Errorf("meta1 error: %s", err)
	}

	return nil
@@ -282,7 +266,7 @@ func (db *DB) munmap() error {
}

// mmapSize determines the appropriate size for the mmap given the current size
// of the database. The minimum size is 32KB and doubles until it reaches 1GB.
// of the database. The minimum size is 1MB and doubles until it reaches 1GB.
// Returns an error if the new mmap size is greater than the max allowed.
func (db *DB) mmapSize(size int) (int, error) {
	// Double the size from 32KB until 1GB.
@@ -380,10 +364,6 @@ func (db *DB) Close() error {
}

func (db *DB) close() error {
	if !db.opened {
		return nil
	}

	db.opened = false

	db.freelist = nil
@@ -402,9 +382,7 @@ func (db *DB) close() error {
	// No need to unlock read-only file.
	if !db.readOnly {
		// Unlock the file.
		if err := funlock(db); err != nil {
			log.Printf("bolt.Close(): funlock error: %s", err)
		}
		_ = funlock(db.file)
	}

	// Close the file descriptor.
@@ -423,15 +401,11 @@ func (db *DB) close() error {
// will cause the calls to block and be serialized until the current write
// transaction finishes.
//
// Transactions should not be dependent on one another. Opening a read
// Transactions should not be depedent on one another. Opening a read
// transaction and a write transaction in the same goroutine can cause the
// writer to deadlock because the database periodically needs to re-mmap itself
// as it grows and it cannot do that while a read transaction is open.
//
// If a long running read transaction (for example, a snapshot transaction) is
// needed, you might want to set DB.InitialMmapSize to a large enough value
// to avoid potential blocking of write transaction.
//
// IMPORTANT: You must close read-only transactions after you are finished or
// else the database will not reclaim old pages.
func (db *DB) Begin(writable bool) (*Tx, error) {
@@ -615,136 +589,6 @@ func (db *DB) View(fn func(*Tx) error) error {
	return nil
}

// Batch calls fn as part of a batch. It behaves similar to Update,
// except:
//
// 1. concurrent Batch calls can be combined into a single Bolt
// transaction.
//
// 2. the function passed to Batch may be called multiple times,
// regardless of whether it returns error or not.
//
// This means that Batch function side effects must be idempotent and
// take permanent effect only after a successful return is seen in
// caller.
//
// The maximum batch size and delay can be adjusted with DB.MaxBatchSize
// and DB.MaxBatchDelay, respectively.
//
// Batch is only useful when there are multiple goroutines calling it.
func (db *DB) Batch(fn func(*Tx) error) error {
	errCh := make(chan error, 1)

	db.batchMu.Lock()
	if (db.batch == nil) || (db.batch != nil && len(db.batch.calls) >= db.MaxBatchSize) {
		// There is no existing batch, or the existing batch is full; start a new one.
		db.batch = &batch{
			db: db,
		}
		db.batch.timer = time.AfterFunc(db.MaxBatchDelay, db.batch.trigger)
	}
	db.batch.calls = append(db.batch.calls, call{fn: fn, err: errCh})
	if len(db.batch.calls) >= db.MaxBatchSize {
		// wake up batch, it's ready to run
		go db.batch.trigger()
	}
	db.batchMu.Unlock()

	err := <-errCh
	if err == trySolo {
		err = db.Update(fn)
	}
	return err
}

type call struct {
	fn  func(*Tx) error
	err chan<- error
}

type batch struct {
	db    *DB
	timer *time.Timer
	start sync.Once
	calls []call
}

// trigger runs the batch if it hasn't already been run.
func (b *batch) trigger() {
	b.start.Do(b.run)
}

// run performs the transactions in the batch and communicates results
// back to DB.Batch.
func (b *batch) run() {
	b.db.batchMu.Lock()
	b.timer.Stop()
	// Make sure no new work is added to this batch, but don't break
	// other batches.
	if b.db.batch == b {
		b.db.batch = nil
	}
	b.db.batchMu.Unlock()

retry:
	for len(b.calls) > 0 {
		var failIdx = -1
		err := b.db.Update(func(tx *Tx) error {
			for i, c := range b.calls {
				if err := safelyCall(c.fn, tx); err != nil {
					failIdx = i
					return err
				}
			}
			return nil
		})

		if failIdx >= 0 {
			// take the failing transaction out of the batch. it's
			// safe to shorten b.calls here because db.batch no longer
			// points to us, and we hold the mutex anyway.
			c := b.calls[failIdx]
			b.calls[failIdx], b.calls = b.calls[len(b.calls)-1], b.calls[:len(b.calls)-1]
			// tell the submitter re-run it solo, continue with the rest of the batch
			c.err <- trySolo
			continue retry
		}

		// pass success, or bolt internal errors, to all callers
		for _, c := range b.calls {
			if c.err != nil {
				c.err <- err
			}
		}
		break retry
	}
}

// trySolo is a special sentinel error value used for signaling that a
// transaction function should be re-run. It should never be seen by
// callers.
var trySolo = errors.New("batch function returned an error and should be re-run solo")

type panicked struct {
	reason interface{}
}

func (p panicked) Error() string {
	if err, ok := p.reason.(error); ok {
		return err.Error()
	}
	return fmt.Sprintf("panic: %v", p.reason)
}

func safelyCall(fn func(*Tx) error, tx *Tx) (err error) {
	defer func() {
		if p := recover(); p != nil {
			err = panicked{p}
		}
	}()
	return fn(tx)
}

// Sync executes fdatasync() against the database file handle.
//
// This is not necessary under normal operation, however, if you use NoSync
@@ -811,38 +655,6 @@ func (db *DB) allocate(count int) (*page, error) {
	return p, nil
}

// grow grows the size of the database to the given sz.
func (db *DB) grow(sz int) error {
	// Ignore if the new size is less than available file size.
	if sz <= db.filesz {
		return nil
	}

	// If the data is smaller than the alloc size then only allocate what's needed.
	// Once it goes over the allocation size then allocate in chunks.
	if db.datasz < db.AllocSize {
		sz = db.datasz
	} else {
		sz += db.AllocSize
	}

	// Truncate and fsync to ensure file size metadata is flushed.
	// https://github.com/boltdb/bolt/issues/284
	if !db.NoGrowSync && !db.readOnly {
		if runtime.GOOS != "windows" {
			if err := db.file.Truncate(int64(sz)); err != nil {
				return fmt.Errorf("file resize error: %s", err)
			}
		}
		if err := db.file.Sync(); err != nil {
			return fmt.Errorf("file sync error: %s", err)
		}
	}

	db.filesz = sz
	return nil
}

func (db *DB) IsReadOnly() bool {
	return db.readOnly
}
@@ -860,19 +672,6 @@ type Options struct {
	// Open database in read-only mode. Uses flock(..., LOCK_SH |LOCK_NB) to
	// grab a shared lock (UNIX).
	ReadOnly bool

	// Sets the DB.MmapFlags flag before memory mapping the file.
	MmapFlags int

	// InitialMmapSize is the initial mmap size of the database
	// in bytes. Read transactions won't block write transaction
	// if the InitialMmapSize is large enough to hold database mmap
	// size. (See DB.Begin for more information)
	//
	// If <=0, the initial map size is 0.
	// If initialMmapSize is smaller than the previous database size,
	// it takes no effect.
	InitialMmapSize int
}

// DefaultOptions represent the options used if nil options are passed into Open().
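The `Batch` method shown in this hunk deserves a usage note: because the passed function may run more than once (it is retried solo if another call in the same batch fails), its side effects must be idempotent, and it only helps when many goroutines call it concurrently. A minimal sketch of the intended calling pattern (bucket and key names are illustrative; imports as in the surrounding file):

```go
// put stores one key via Batch. Concurrent callers may be coalesced
// into a single Bolt transaction; if another caller's function fails,
// this closure can be re-run, so it must be idempotent.
func put(db *bolt.DB, key, value []byte) error {
	return db.Batch(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucketIfNotExists([]byte("data"))
		if err != nil {
			return err
		}
		return b.Put(key, value)
	})
}
```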
913 Godeps/_workspace/src/github.com/boltdb/bolt/db_test.go generated vendored Normal file
@ -0,0 +1,913 @@
|
||||
package bolt_test
|
||||
|
||||
import (
|
||||
"encoding/binary"
|
||||
"errors"
|
||||
"flag"
|
||||
"fmt"
|
||||
"io/ioutil"
|
||||
"os"
|
||||
"regexp"
|
||||
"runtime"
|
||||
"sort"
|
||||
"strings"
|
||||
"testing"
|
||||
"time"
|
||||
|
||||
"github.com/coreos/etcd/Godeps/_workspace/src/github.com/boltdb/bolt"
|
||||
)
|
||||
|
||||
var statsFlag = flag.Bool("stats", false, "show performance stats")
|
||||
|
||||
// Ensure that opening a database with a bad path returns an error.
|
||||
func TestOpen_BadPath(t *testing.T) {
|
||||
db, err := bolt.Open("", 0666, nil)
|
||||
assert(t, err != nil, "err: %s", err)
|
||||
assert(t, db == nil, "")
|
||||
}
|
||||
|
||||
// Ensure that a database can be opened without error.
|
||||
func TestOpen(t *testing.T) {
|
||||
path := tempfile()
|
||||
defer os.Remove(path)
|
||||
db, err := bolt.Open(path, 0666, nil)
|
||||
assert(t, db != nil, "")
|
||||
ok(t, err)
|
||||
equals(t, db.Path(), path)
|
||||
ok(t, db.Close())
|
||||
}
|
||||
|
||||
// Ensure that opening an already open database file will timeout.
|
||||
func TestOpen_Timeout(t *testing.T) {
|
||||
if runtime.GOOS == "windows" {
|
||||
t.Skip("timeout not supported on windows")
|
||||
}
|
||||
if runtime.GOOS == "solaris" {
|
||||
t.Skip("solaris fcntl locks don't support intra-process locking")
|
||||
}
|
||||
|
||||
path := tempfile()
|
||||
defer os.Remove(path)
|
||||
|
||||
// Open a data file.
|
||||
db0, err := bolt.Open(path, 0666, nil)
|
||||
assert(t, db0 != nil, "")
|
||||
ok(t, err)
|
||||
|
||||
// Attempt to open the database again.
|
||||
start := time.Now()
|
||||
db1, err := bolt.Open(path, 0666, &bolt.Options{Timeout: 100 * time.Millisecond})
|
||||
assert(t, db1 == nil, "")
|
||||
equals(t, bolt.ErrTimeout, err)
|
||||
assert(t, time.Since(start) > 100*time.Millisecond, "")
|
||||
|
||||
db0.Close()
|
||||
}
|
||||
|
||||
// Ensure that opening an already open database file will wait until its closed.
|
||||
func TestOpen_Wait(t *testing.T) {
|
||||
if runtime.GOOS == "windows" {
|
||||
t.Skip("timeout not supported on windows")
|
||||
}
|
||||
if runtime.GOOS == "solaris" {
|
||||
t.Skip("solaris fcntl locks don't support intra-process locking")
|
||||
}
|
||||
|
||||
path := tempfile()
|
||||
defer os.Remove(path)
|
||||
|
||||
// Open a data file.
|
||||
db0, err := bolt.Open(path, 0666, nil)
|
||||
assert(t, db0 != nil, "")
|
||||
ok(t, err)
|
||||
|
||||
// Close it in just a bit.
|
||||
time.AfterFunc(100*time.Millisecond, func() { db0.Close() })
|
||||
|
||||
// Attempt to open the database again.
|
||||
start := time.Now()
|
||||
db1, err := bolt.Open(path, 0666, &bolt.Options{Timeout: 200 * time.Millisecond})
|
||||
assert(t, db1 != nil, "")
|
||||
ok(t, err)
|
||||
assert(t, time.Since(start) > 100*time.Millisecond, "")
|
||||
}
|
||||
|
||||
// Ensure that opening a database does not increase its size.
|
||||
// https://github.com/boltdb/bolt/issues/291
|
||||
func TestOpen_Size(t *testing.T) {
|
||||
// Open a data file.
|
||||
db := NewTestDB()
|
||||
path := db.Path()
|
||||
defer db.Close()
|
||||
|
||||
// Insert until we get above the minimum 4MB size.
|
||||
ok(t, db.Update(func(tx *bolt.Tx) error {
|
||||
b, _ := tx.CreateBucketIfNotExists([]byte("data"))
|
||||
for i := 0; i < 10000; i++ {
|
||||
ok(t, b.Put([]byte(fmt.Sprintf("%04d", i)), make([]byte, 1000)))
|
||||
}
|
||||
return nil
|
||||
}))
|
||||
|
||||
// Close database and grab the size.
|
||||
db.DB.Close()
|
||||
sz := fileSize(path)
|
||||
if sz == 0 {
|
||||
t.Fatalf("unexpected new file size: %d", sz)
|
||||
}
|
||||
|
||||
// Reopen database, update, and check size again.
|
||||
db0, err := bolt.Open(path, 0666, nil)
|
||||
ok(t, err)
|
||||
ok(t, db0.Update(func(tx *bolt.Tx) error { return tx.Bucket([]byte("data")).Put([]byte{0}, []byte{0}) }))
|
||||
ok(t, db0.Close())
|
||||
newSz := fileSize(path)
|
||||
if newSz == 0 {
|
||||
t.Fatalf("unexpected new file size: %d", newSz)
|
||||
}
|
||||
|
||||
// Compare the original size with the new size.
|
||||
if sz != newSz {
|
||||
t.Fatalf("unexpected file growth: %d => %d", sz, newSz)
|
||||
}
|
||||
}
|
||||
|
||||
// Ensure that opening a database beyond the max step size does not increase its size.
|
||||
// https://github.com/boltdb/bolt/issues/303
|
||||
func TestOpen_Size_Large(t *testing.T) {
|
||||
if testing.Short() {
|
||||
t.Skip("short mode")
|
||||
}
|
||||
|
||||
// Open a data file.
|
||||
db := NewTestDB()
|
||||
path := db.Path()
|
||||
defer db.Close()
|
||||
|
||||
// Insert until we get above the minimum 4MB size.
|
||||
var index uint64
|
||||
for i := 0; i < 10000; i++ {
|
||||
ok(t, db.Update(func(tx *bolt.Tx) error {
|
||||
b, _ := tx.CreateBucketIfNotExists([]byte("data"))
|
||||
for j := 0; j < 1000; j++ {
|
||||
ok(t, b.Put(u64tob(index), make([]byte, 50)))
|
||||
index++
|
||||
}
|
||||
return nil
|
||||
}))
|
||||
}
|
||||
|
||||
// Close database and grab the size.
|
||||
db.DB.Close()
|
||||
sz := fileSize(path)
|
||||
if sz == 0 {
|
||||
t.Fatalf("unexpected new file size: %d", sz)
|
||||
} else if sz < (1 << 30) {
|
||||
t.Fatalf("expected larger initial size: %d", sz)
|
||||
}
|
||||
|
||||
// Reopen database, update, and check size again.
|
||||
db0, err := bolt.Open(path, 0666, nil)
|
||||
ok(t, err)
|
||||
ok(t, db0.Update(func(tx *bolt.Tx) error { return tx.Bucket([]byte("data")).Put([]byte{0}, []byte{0}) }))
|
||||
ok(t, db0.Close())
|
||||
newSz := fileSize(path)
|
||||
if newSz == 0 {
|
||||
t.Fatalf("unexpected new file size: %d", newSz)
|
||||
}
|
||||
|
||||
// Compare the original size with the new size.
|
||||
if sz != newSz {
|
||||
t.Fatalf("unexpected file growth: %d => %d", sz, newSz)
|
||||
}
|
||||
}
|
||||
|
||||
// Ensure that a re-opened database is consistent.
|
||||
func TestOpen_Check(t *testing.T) {
|
||||
path := tempfile()
|
||||
defer os.Remove(path)
|
||||
|
||||
db, err := bolt.Open(path, 0666, nil)
|
||||
ok(t, err)
|
||||
ok(t, db.View(func(tx *bolt.Tx) error { return <-tx.Check() }))
|
||||
db.Close()
|
||||
|
||||
db, err = bolt.Open(path, 0666, nil)
|
||||
ok(t, err)
|
||||
ok(t, db.View(func(tx *bolt.Tx) error { return <-tx.Check() }))
|
||||
db.Close()
|
||||
}
|
||||
|
||||
// Ensure that the database returns an error if the file handle cannot be open.
|
||||
func TestDB_Open_FileError(t *testing.T) {
|
||||
path := tempfile()
|
||||
defer os.Remove(path)
|
||||
|
||||
_, err := bolt.Open(path+"/youre-not-my-real-parent", 0666, nil)
|
||||
assert(t, err.(*os.PathError) != nil, "")
|
||||
equals(t, path+"/youre-not-my-real-parent", err.(*os.PathError).Path)
|
||||
equals(t, "open", err.(*os.PathError).Op)
|
||||
}
|
||||
|
||||
// Ensure that write errors to the meta file handler during initialization are returned.
|
||||
func TestDB_Open_MetaInitWriteError(t *testing.T) {
|
||||
t.Skip("pending")
|
||||
}
|
||||
|
||||
// Ensure that a database that is too small returns an error.
|
||||
func TestDB_Open_FileTooSmall(t *testing.T) {
|
||||
path := tempfile()
|
||||
defer os.Remove(path)
|
||||
|
||||
db, err := bolt.Open(path, 0666, nil)
|
||||
ok(t, err)
|
||||
db.Close()
|
||||
|
||||
// corrupt the database
|
||||
ok(t, os.Truncate(path, int64(os.Getpagesize())))
|
||||
|
||||
db, err = bolt.Open(path, 0666, nil)
|
||||
equals(t, errors.New("file size too small"), err)
|
||||
}
|
||||
|
||||
// Ensure that a database can be opened in read-only mode by multiple processes
|
||||
// and that a database can not be opened in read-write mode and in read-only
|
||||
// mode at the same time.
|
||||
func TestOpen_ReadOnly(t *testing.T) {
|
||||
if runtime.GOOS == "solaris" {
|
||||
t.Skip("solaris fcntl locks don't support intra-process locking")
|
||||
}
|
||||
|
||||
bucket, key, value := []byte(`bucket`), []byte(`key`), []byte(`value`)
|
||||
|
||||
path := tempfile()
|
||||
defer os.Remove(path)
|
||||
|
||||
// Open in read-write mode.
|
||||
db, err := bolt.Open(path, 0666, nil)
|
||||
ok(t, db.Update(func(tx *bolt.Tx) error {
|
||||
b, err := tx.CreateBucket(bucket)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
return b.Put(key, value)
|
||||
}))
|
||||
assert(t, db != nil, "")
|
||||
assert(t, !db.IsReadOnly(), "")
|
||||
ok(t, err)
|
||||
ok(t, db.Close())
|
||||
|
||||
// Open in read-only mode.
|
||||
db0, err := bolt.Open(path, 0666, &bolt.Options{ReadOnly: true})
|
||||
ok(t, err)
|
||||
defer db0.Close()
|
||||
|
||||
// Opening in read-write mode should return an error.
|
||||
_, err = bolt.Open(path, 0666, &bolt.Options{Timeout: time.Millisecond * 100})
|
||||
assert(t, err != nil, "")
|
||||
|
||||
// And again (in read-only mode).
|
||||
db1, err := bolt.Open(path, 0666, &bolt.Options{ReadOnly: true})
|
||||
ok(t, err)
|
||||
defer db1.Close()
|
||||
|
||||
// Verify both read-only databases are accessible.
|
||||
for _, db := range []*bolt.DB{db0, db1} {
|
||||
// Verify is is in read only mode indeed.
|
||||
assert(t, db.IsReadOnly(), "")
|
||||
|
||||
// Read-only databases should not allow updates.
|
||||
assert(t,
|
||||
bolt.ErrDatabaseReadOnly == db.Update(func(*bolt.Tx) error {
|
||||
panic(`should never get here`)
|
||||
}),
|
||||
"")
|
||||
|
||||
// Read-only databases should not allow beginning writable txns.
|
||||
_, err = db.Begin(true)
|
||||
assert(t, bolt.ErrDatabaseReadOnly == err, "")
|
||||
|
||||
// Verify the data.
|
||||
ok(t, db.View(func(tx *bolt.Tx) error {
|
||||
b := tx.Bucket(bucket)
|
||||
if b == nil {
|
||||
return fmt.Errorf("expected bucket `%s`", string(bucket))
|
||||
}
|
||||
|
||||
got := string(b.Get(key))
|
||||
expected := string(value)
|
||||
if got != expected {
|
||||
return fmt.Errorf("expected `%s`, got `%s`", expected, got)
|
||||
}
|
||||
return nil
|
||||
}))
|
||||
}
|
||||
}
|
||||
|
||||
// TODO(benbjohnson): Test corruption at every byte of the first two pages.
|
||||
|
||||
// Ensure that a database cannot open a transaction when it's not open.
|
||||
func TestDB_Begin_DatabaseNotOpen(t *testing.T) {
|
||||
var db bolt.DB
|
||||
tx, err := db.Begin(false)
|
||||
assert(t, tx == nil, "")
|
||||
equals(t, err, bolt.ErrDatabaseNotOpen)
|
||||
}
|
||||
|
||||
// Ensure that a read-write transaction can be retrieved.
|
||||
func TestDB_BeginRW(t *testing.T) {
|
||||
db := NewTestDB()
|
||||
defer db.Close()
|
||||
tx, err := db.Begin(true)
|
||||
assert(t, tx != nil, "")
|
||||
ok(t, err)
|
||||
assert(t, tx.DB() == db.DB, "")
|
||||
equals(t, tx.Writable(), true)
|
||||
ok(t, tx.Commit())
|
||||
}
|
||||
|
||||
// Ensure that opening a transaction while the DB is closed returns an error.
|
||||
func TestDB_BeginRW_Closed(t *testing.T) {
|
||||
var db bolt.DB
|
||||
tx, err := db.Begin(true)
|
||||
equals(t, err, bolt.ErrDatabaseNotOpen)
|
||||
assert(t, tx == nil, "")
|
||||
}
|
||||
|
||||
func TestDB_Close_PendingTx_RW(t *testing.T) { testDB_Close_PendingTx(t, true) }
|
||||
func TestDB_Close_PendingTx_RO(t *testing.T) { testDB_Close_PendingTx(t, false) }
|
||||
|
||||
// Ensure that a database cannot close while transactions are open.
|
||||
func testDB_Close_PendingTx(t *testing.T, writable bool) {
|
||||
db := NewTestDB()
|
||||
defer db.Close()
|
||||
|
||||
// Start transaction.
|
||||
tx, err := db.Begin(true)
|
||||
if err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
|
||||
// Open update in separate goroutine.
|
||||
done := make(chan struct{})
|
||||
go func() {
|
||||
db.Close()
|
||||
close(done)
|
||||
}()
|
||||
|
||||
// Ensure database hasn't closed.
|
||||
time.Sleep(100 * time.Millisecond)
|
||||
select {
|
||||
case <-done:
|
||||
t.Fatal("database closed too early")
|
||||
default:
|
||||
}
|
||||
|
||||
// Commit transaction.
|
||||
if err := tx.Commit(); err != nil {
|
||||
t.Fatal(err)
|
||||
}
|
||||
|
||||
// Ensure database closed now.
|
||||
time.Sleep(100 * time.Millisecond)
|
||||
select {
|
||||
case <-done:
|
||||
default:
|
||||
t.Fatal("database did not close")
|
||||
}
|
||||
}
|
||||
|
||||
// Ensure a database can provide a transactional block.
|
||||
func TestDB_Update(t *testing.T) {
|
||||
db := NewTestDB()
|
||||
defer db.Close()
|
||||
err := db.Update(func(tx *bolt.Tx) error {
|
||||
tx.CreateBucket([]byte("widgets"))
|
||||
b := tx.Bucket([]byte("widgets"))
|
||||
b.Put([]byte("foo"), []byte("bar"))
|
||||
b.Put([]byte("baz"), []byte("bat"))
|
||||
b.Delete([]byte("foo"))
|
||||
return nil
|
||||
})
|
||||
ok(t, err)
|
||||
err = db.View(func(tx *bolt.Tx) error {
|
||||
assert(t, tx.Bucket([]byte("widgets")).Get([]byte("foo")) == nil, "")
|
||||
equals(t, []byte("bat"), tx.Bucket([]byte("widgets")).Get([]byte("baz")))
|
||||
return nil
|
||||
})
|
||||
ok(t, err)
|
||||
}
|
||||
|
||||
// Ensure a closed database returns an error while running a transaction block
|
||||
func TestDB_Update_Closed(t *testing.T) {
|
||||
var db bolt.DB
|
||||
err := db.Update(func(tx *bolt.Tx) error {
|
||||
tx.CreateBucket([]byte("widgets"))
|
||||
return nil
|
||||
})
|
||||
equals(t, err, bolt.ErrDatabaseNotOpen)
|
||||
}
|
||||
|
||||
// Ensure a panic occurs while trying to commit a managed transaction.
|
||||
func TestDB_Update_ManualCommit(t *testing.T) {
|
||||
db := NewTestDB()
|
||||
defer db.Close()
|
||||
|
||||
var ok bool
|
||||
db.Update(func(tx *bolt.Tx) error {
|
||||
func() {
|
||||
defer func() {
|
||||
if r := recover(); r != nil {
|
||||
ok = true
|
||||
}
|
||||
}()
|
||||
tx.Commit()
|
||||
}()
|
||||
return nil
|
||||
})
|
||||
assert(t, ok, "expected panic")
|
||||
}
|
||||
|
||||
// Ensure a panic occurs while trying to rollback a managed transaction.
|
||||
func TestDB_Update_ManualRollback(t *testing.T) {
|
||||
db := NewTestDB()
|
||||
defer db.Close()
|
||||
|
||||
var ok bool
|
||||
db.Update(func(tx *bolt.Tx) error {
|
||||
func() {
|
||||
defer func() {
|
||||
if r := recover(); r != nil {
|
||||
ok = true
|
||||
}
|
||||
}()
|
||||
tx.Rollback()
|
||||
}()
|
||||
return nil
|
||||
})
|
||||
assert(t, ok, "expected panic")
|
||||
}
|
||||
|
||||
// Ensure a panic occurs while trying to commit a managed transaction.
|
||||
func TestDB_View_ManualCommit(t *testing.T) {
|
||||
db := NewTestDB()
|
||||
defer db.Close()
|
||||
|
||||
var ok bool
|
||||
db.Update(func(tx *bolt.Tx) error {
|
||||
func() {
|
||||
defer func() {
|
||||
if r := recover(); r != nil {
|
||||
ok = true
|
||||
}
|
||||
}()
|
||||
tx.Commit()
|
||||
}()
|
||||
return nil
|
||||
})
|
||||
assert(t, ok, "expected panic")
|
||||
}
|
||||
|
||||
// Ensure a panic occurs while trying to rollback a managed transaction.
|
||||
func TestDB_View_ManualRollback(t *testing.T) {
|
||||
db := NewTestDB()
|
||||
defer db.Close()
|
||||
|
||||
var ok bool
|
||||
db.Update(func(tx *bolt.Tx) error {
|
||||
func() {
|
||||
defer func() {
|
||||
if r := recover(); r != nil {
|
||||
ok = true
|
||||
}
|
||||
}()
|
||||
tx.Rollback()
|
||||
}()
|
||||
return nil
|
||||
})
|
||||
assert(t, ok, "expected panic")
|
||||
}
|
||||
|
||||
// Ensure a write transaction that panics does not hold open locks.
|
||||
func TestDB_Update_Panic(t *testing.T) {
|
||||
db := NewTestDB()
|
||||
defer db.Close()
|
||||
|
||||
func() {
|
||||
defer func() {
|
||||
if r := recover(); r != nil {
|
||||
t.Log("recover: update", r)
|
||||
}
|
||||
}()
|
||||
db.Update(func(tx *bolt.Tx) error {
|
||||
tx.CreateBucket([]byte("widgets"))
|
||||
panic("omg")
|
||||
})
|
||||
}()
|
||||
|
||||
// Verify we can update again.
|
||||
err := db.Update(func(tx *bolt.Tx) error {
|
||||
_, err := tx.CreateBucket([]byte("widgets"))
|
||||
return err
|
||||
})
|
||||
ok(t, err)
|
||||
|
||||
// Verify that our change persisted.
|
||||
err = db.Update(func(tx *bolt.Tx) error {
|
||||
assert(t, tx.Bucket([]byte("widgets")) != nil, "")
|
||||
return nil
|
||||
})
|
||||
}
|
||||
|
||||
// Ensure a database can return an error through a read-only transactional block.
|
||||
func TestDB_View_Error(t *testing.T) {
|
||||
db := NewTestDB()
|
||||
defer db.Close()
|
||||
err := db.View(func(tx *bolt.Tx) error {
|
||||
return errors.New("xxx")
|
||||
})
|
||||
equals(t, errors.New("xxx"), err)
|
||||
}
|
||||
|
||||
// Ensure a read transaction that panics does not hold open locks.
|
||||
func TestDB_View_Panic(t *testing.T) {
|
||||
db := NewTestDB()
|
||||
defer db.Close()
|
||||
db.Update(func(tx *bolt.Tx) error {
|
||||
tx.CreateBucket([]byte("widgets"))
|
||||
return nil
|
||||
})
|
||||
|
||||
func() {
|
||||
defer func() {
|
||||
if r := recover(); r != nil {
|
||||
t.Log("recover: view", r)
|
||||
}
|
||||
}()
|
||||
db.View(func(tx *bolt.Tx) error {
|
||||
assert(t, tx.Bucket([]byte("widgets")) != nil, "")
|
||||
panic("omg")
|
||||
})
|
||||
}()
|
||||
|
||||
// Verify that we can still use read transactions.
|
||||
db.View(func(tx *bolt.Tx) error {
|
||||
assert(t, tx.Bucket([]byte("widgets")) != nil, "")
|
||||
return nil
|
||||
})
|
||||
}
|
||||
|
||||
// Ensure that an error is returned when a database write fails.
|
||||
func TestDB_Commit_WriteFail(t *testing.T) {
|
||||
t.Skip("pending") // TODO(benbjohnson)
|
||||
}
|
||||
|
||||
// Ensure that DB stats can be returned.
|
||||
func TestDB_Stats(t *testing.T) {
|
||||
db := NewTestDB()
|
||||
defer db.Close()
|
||||
db.Update(func(tx *bolt.Tx) error {
|
||||
_, err := tx.CreateBucket([]byte("widgets"))
|
||||
return err
|
||||
})
|
||||
stats := db.Stats()
|
||||
equals(t, 2, stats.TxStats.PageCount)
|
||||
equals(t, 0, stats.FreePageN)
|
||||
equals(t, 2, stats.PendingPageN)
|
||||
}
|
||||
|
||||
// Ensure that database pages are in expected order and type.
|
||||
func TestDB_Consistency(t *testing.T) {
|
||||
db := NewTestDB()
|
||||
defer db.Close()
|
||||
db.Update(func(tx *bolt.Tx) error {
|
||||
_, err := tx.CreateBucket([]byte("widgets"))
|
||||
return err
|
||||
})
|
||||
|
||||
for i := 0; i < 10; i++ {
|
||||
db.Update(func(tx *bolt.Tx) error {
|
||||
ok(t, tx.Bucket([]byte("widgets")).Put([]byte("foo"), []byte("bar")))
|
||||
return nil
|
||||
})
|
||||
}
|
||||
db.Update(func(tx *bolt.Tx) error {
|
||||
p, _ := tx.Page(0)
|
||||
assert(t, p != nil, "")
|
||||
equals(t, "meta", p.Type)
|
||||
|
||||
p, _ = tx.Page(1)
|
||||
assert(t, p != nil, "")
|
||||
equals(t, "meta", p.Type)
|
||||
|
||||
p, _ = tx.Page(2)
|
||||
assert(t, p != nil, "")
|
||||
equals(t, "free", p.Type)
|
||||
|
||||
p, _ = tx.Page(3)
|
||||
assert(t, p != nil, "")
|
||||
equals(t, "free", p.Type)
|
||||
|
||||
p, _ = tx.Page(4)
|
||||
assert(t, p != nil, "")
|
||||
equals(t, "leaf", p.Type)
|
||||
|
||||
p, _ = tx.Page(5)
|
||||
assert(t, p != nil, "")
|
||||
equals(t, "freelist", p.Type)
|
||||
|
||||
p, _ = tx.Page(6)
|
||||
assert(t, p == nil, "")
|
||||
return nil
|
||||
})
|
||||
}
|
||||
|
||||
// Ensure that DB stats can be substracted from one another.
|
||||
func TestDBStats_Sub(t *testing.T) {
|
||||
var a, b bolt.Stats
|
||||
a.TxStats.PageCount = 3
|
||||
a.FreePageN = 4
|
||||
b.TxStats.PageCount = 10
|
||||
b.FreePageN = 14
|
||||
diff := b.Sub(&a)
|
||||
equals(t, 7, diff.TxStats.PageCount)
|
||||
// free page stats are copied from the receiver and not subtracted
|
||||
equals(t, 14, diff.FreePageN)
|
||||
}
|
||||
|
||||
func ExampleDB_Update() {
|
||||
// Open the database.
|
||||
db, _ := bolt.Open(tempfile(), 0666, nil)
|
||||
defer os.Remove(db.Path())
|
||||
defer db.Close()
|
||||
|
||||
// Execute several commands within a write transaction.
|
||||
err := db.Update(func(tx *bolt.Tx) error {
|
||||
b, err := tx.CreateBucket([]byte("widgets"))
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
if err := b.Put([]byte("foo"), []byte("bar")); err != nil {
|
||||
return err
|
||||
}
|
||||
return nil
|
||||
})
|
||||
|
||||
// If our transactional block didn't return an error then our data is saved.
|
||||
if err == nil {
|
||||
db.View(func(tx *bolt.Tx) error {
|
||||
value := tx.Bucket([]byte("widgets")).Get([]byte("foo"))
|
||||
fmt.Printf("The value of 'foo' is: %s\n", value)
|
||||
return nil
|
||||
})
|
||||
}
|
||||
|
||||
// Output:
|
||||
// The value of 'foo' is: bar
|
||||
}
|
||||
|
||||
func ExampleDB_View() {
|
||||
// Open the database.
|
||||
db, _ := bolt.Open(tempfile(), 0666, nil)
|
||||
defer os.Remove(db.Path())
|
||||
defer db.Close()
|
||||
|
||||
// Insert data into a bucket.
|
||||
db.Update(func(tx *bolt.Tx) error {
|
||||
tx.CreateBucket([]byte("people"))
|
||||
b := tx.Bucket([]byte("people"))
|
||||
b.Put([]byte("john"), []byte("doe"))
|
||||
b.Put([]byte("susy"), []byte("que"))
|
||||
return nil
|
||||
})
|
||||
|
||||
// Access data from within a read-only transactional block.
|
||||
db.View(func(tx *bolt.Tx) error {
|
||||
v := tx.Bucket([]byte("people")).Get([]byte("john"))
|
||||
fmt.Printf("John's last name is %s.\n", v)
|
||||
return nil
|
||||
})
|
||||
|
||||
// Output:
|
||||
// John's last name is doe.
|
||||
}
|
||||
|
||||
func ExampleDB_Begin_ReadOnly() {
|
||||
// Open the database.
|
||||
db, _ := bolt.Open(tempfile(), 0666, nil)
|
||||
defer os.Remove(db.Path())
|
||||
defer db.Close()
|
||||
|
||||
// Create a bucket.
|
||||
db.Update(func(tx *bolt.Tx) error {
|
||||
_, err := tx.CreateBucket([]byte("widgets"))
|
||||
return err
|
||||
})
|
||||
|
||||
// Create several keys in a transaction.
|
||||
tx, _ := db.Begin(true)
|
||||
b := tx.Bucket([]byte("widgets"))
|
||||
b.Put([]byte("john"), []byte("blue"))
|
||||
b.Put([]byte("abby"), []byte("red"))
|
||||
b.Put([]byte("zephyr"), []byte("purple"))
|
||||
tx.Commit()
|
||||
|
||||
// Iterate over the values in sorted key order.
|
||||
tx, _ = db.Begin(false)
|
||||
c := tx.Bucket([]byte("widgets")).Cursor()
|
||||
for k, v := c.First(); k != nil; k, v = c.Next() {
|
||||
fmt.Printf("%s likes %s\n", k, v)
|
||||
}
|
||||
tx.Rollback()
|
||||
|
||||
// Output:
|
||||
// abby likes red
|
||||
// john likes blue
|
||||
// zephyr likes purple
|
||||
}
|
||||
|
||||
// TestDB represents a wrapper around a Bolt DB to handle temporary file
|
||||
// creation and automatic cleanup on close.
|
||||
type TestDB struct {
|
||||
*bolt.DB
|
||||
}
|
||||
|
||||
// NewTestDB returns a new instance of TestDB.
|
||||
func NewTestDB() *TestDB {
|
||||
db, err := bolt.Open(tempfile(), 0666, nil)
|
||||
if err != nil {
|
||||
panic("cannot open db: " + err.Error())
|
||||
}
|
||||
return &TestDB{db}
|
||||
}
|
||||
|
||||
// MustView executes a read-only function. Panic on error.
|
||||
func (db *TestDB) MustView(fn func(tx *bolt.Tx) error) {
|
||||
if err := db.DB.View(func(tx *bolt.Tx) error {
|
||||
return fn(tx)
|
||||
}); err != nil {
|
||||
panic(err.Error())
|
||||
}
|
||||
}
|
||||
|
||||
// MustUpdate executes a read-write function. Panic on error.
|
||||
func (db *TestDB) MustUpdate(fn func(tx *bolt.Tx) error) {
|
||||
if err := db.DB.View(func(tx *bolt.Tx) error {
|
||||
return fn(tx)
|
||||
}); err != nil {
|
||||
panic(err.Error())
|
||||
}
|
||||
}
|
||||
|
||||
// MustCreateBucket creates a new bucket. Panic on error.
|
||||
func (db *TestDB) MustCreateBucket(name []byte) {
|
||||
if err := db.Update(func(tx *bolt.Tx) error {
|
||||
_, err := tx.CreateBucket([]byte(name))
|
||||
return err
|
||||
}); err != nil {
|
||||
panic(err.Error())
|
||||
}
|
||||
}
|
||||
|
||||
// Close closes the database and deletes the underlying file.
|
||||
func (db *TestDB) Close() {
|
||||
// Log statistics.
|
||||
if *statsFlag {
|
||||
db.PrintStats()
|
||||
}
|
||||
|
||||
// Check database consistency after every test.
|
||||
db.MustCheck()
|
||||
|
||||
// Close database and remove file.
|
||||
defer os.Remove(db.Path())
|
||||
db.DB.Close()
|
||||
}
|
||||
|
||||
// PrintStats prints the database stats
|
||||
func (db *TestDB) PrintStats() {
|
||||
var stats = db.Stats()
|
||||
fmt.Printf("[db] %-20s %-20s %-20s\n",
|
||||
fmt.Sprintf("pg(%d/%d)", stats.TxStats.PageCount, stats.TxStats.PageAlloc),
|
||||
fmt.Sprintf("cur(%d)", stats.TxStats.CursorCount),
|
||||
fmt.Sprintf("node(%d/%d)", stats.TxStats.NodeCount, stats.TxStats.NodeDeref),
|
||||
)
|
||||
fmt.Printf(" %-20s %-20s %-20s\n",
|
||||
fmt.Sprintf("rebal(%d/%v)", stats.TxStats.Rebalance, truncDuration(stats.TxStats.RebalanceTime)),
|
||||
fmt.Sprintf("spill(%d/%v)", stats.TxStats.Spill, truncDuration(stats.TxStats.SpillTime)),
|
||||
fmt.Sprintf("w(%d/%v)", stats.TxStats.Write, truncDuration(stats.TxStats.WriteTime)),
|
||||
)
|
||||
}
|
||||
|
||||
// MustCheck runs a consistency check on the database and panics if any errors are found.
|
||||
func (db *TestDB) MustCheck() {
|
||||
db.Update(func(tx *bolt.Tx) error {
|
||||
// Collect all the errors.
|
||||
var errors []error
|
||||
for err := range tx.Check() {
|
||||
errors = append(errors, err)
|
||||
if len(errors) > 10 {
|
||||
break
|
||||
}
|
||||
}
|
||||
|
||||
// If errors occurred, copy the DB and print the errors.
|
||||
if len(errors) > 0 {
|
||||
var path = tempfile()
|
||||
tx.CopyFile(path, 0600)
|
||||
|
||||
// Print errors.
|
||||
fmt.Print("\n\n")
|
||||
fmt.Printf("consistency check failed (%d errors)\n", len(errors))
|
||||
for _, err := range errors {
|
||||
fmt.Println(err)
|
||||
}
|
||||
fmt.Println("")
|
||||
fmt.Println("db saved to:")
|
||||
fmt.Println(path)
|
||||
fmt.Print("\n\n")
|
||||
os.Exit(-1)
|
||||
}
|
||||
|
||||
return nil
|
||||
})
|
||||
}
|
||||
|
||||
// CopyTempFile copies a database to a temporary file.
|
||||
func (db *TestDB) CopyTempFile() {
|
||||
path := tempfile()
|
||||
db.View(func(tx *bolt.Tx) error { return tx.CopyFile(path, 0600) })
|
||||
fmt.Println("db copied to: ", path)
|
||||
}
|
||||
|
||||
// tempfile returns a temporary file path.
|
||||
func tempfile() string {
|
||||
f, _ := ioutil.TempFile("", "bolt-")
|
||||
f.Close()
|
||||
os.Remove(f.Name())
|
||||
return f.Name()
|
||||
}
|
||||
|
||||
// mustContainKeys checks that a bucket contains a given set of keys.
|
||||
func mustContainKeys(b *bolt.Bucket, m map[string]string) {
|
||||
found := make(map[string]string)
|
||||
b.ForEach(func(k, _ []byte) error {
|
||||
found[string(k)] = ""
|
||||
return nil
|
||||
})
|
||||
|
||||
// Check for keys found in bucket that shouldn't be there.
|
||||
var keys []string
|
||||
for k, _ := range found {
|
||||
if _, ok := m[string(k)]; !ok {
|
||||
keys = append(keys, k)
|
||||
}
|
||||
}
|
||||
if len(keys) > 0 {
|
||||
sort.Strings(keys)
|
||||
panic(fmt.Sprintf("keys found(%d): %s", len(keys), strings.Join(keys, ",")))
|
||||
}
|
||||
|
||||
// Check for keys not found in bucket that should be there.
|
||||
for k, _ := range m {
|
||||
if _, ok := found[string(k)]; !ok {
|
||||
keys = append(keys, k)
|
||||
}
|
||||
}
|
||||
if len(keys) > 0 {
|
||||
sort.Strings(keys)
|
||||
panic(fmt.Sprintf("keys not found(%d): %s", len(keys), strings.Join(keys, ",")))
|
||||
}
|
||||
}
|
||||
|
||||
func trunc(b []byte, length int) []byte {
|
||||
if length < len(b) {
|
||||
return b[:length]
|
||||
}
|
||||
return b
|
||||
}
|
||||
|
||||
func truncDuration(d time.Duration) string {
|
||||
return regexp.MustCompile(`^(\d+)(\.\d+)`).ReplaceAllString(d.String(), "$1")
|
||||
}
|
||||
|
func fileSize(path string) int64 {
	fi, err := os.Stat(path)
	if err != nil {
		return 0
	}
	return fi.Size()
}

func warn(v ...interface{})              { fmt.Fprintln(os.Stderr, v...) }
func warnf(msg string, v ...interface{}) { fmt.Fprintf(os.Stderr, msg+"\n", v...) }

// u64tob converts a uint64 into an 8-byte slice.
func u64tob(v uint64) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(b, v)
	return b
}

// btou64 converts an 8-byte slice into an uint64.
func btou64(b []byte) uint64 { return binary.BigEndian.Uint64(b) }
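// Round trip (illustrative): btou64(u64tob(42)) == 42.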
156 Godeps/_workspace/src/github.com/boltdb/bolt/freelist_test.go generated vendored Normal file
@@ -0,0 +1,156 @@
package bolt

import (
	"math/rand"
	"reflect"
	"sort"
	"testing"
	"unsafe"
)

// Ensure that a page is added to a transaction's freelist.
func TestFreelist_free(t *testing.T) {
	f := newFreelist()
	f.free(100, &page{id: 12})
	if !reflect.DeepEqual([]pgid{12}, f.pending[100]) {
		t.Fatalf("exp=%v; got=%v", []pgid{12}, f.pending[100])
	}
}

// Ensure that a page and its overflow is added to a transaction's freelist.
func TestFreelist_free_overflow(t *testing.T) {
	f := newFreelist()
	f.free(100, &page{id: 12, overflow: 3})
	if exp := []pgid{12, 13, 14, 15}; !reflect.DeepEqual(exp, f.pending[100]) {
		t.Fatalf("exp=%v; got=%v", exp, f.pending[100])
	}
}

// Ensure that a transaction's free pages can be released.
func TestFreelist_release(t *testing.T) {
	f := newFreelist()
	f.free(100, &page{id: 12, overflow: 1})
	f.free(100, &page{id: 9})
	f.free(102, &page{id: 39})
	f.release(100)
	f.release(101)
	if exp := []pgid{9, 12, 13}; !reflect.DeepEqual(exp, f.ids) {
		t.Fatalf("exp=%v; got=%v", exp, f.ids)
	}

	f.release(102)
	if exp := []pgid{9, 12, 13, 39}; !reflect.DeepEqual(exp, f.ids) {
		t.Fatalf("exp=%v; got=%v", exp, f.ids)
	}
}

// Ensure that a freelist can find contiguous blocks of pages.
func TestFreelist_allocate(t *testing.T) {
	f := &freelist{ids: []pgid{3, 4, 5, 6, 7, 9, 12, 13, 18}}
	if id := int(f.allocate(3)); id != 3 {
		t.Fatalf("exp=3; got=%v", id)
	}
	if id := int(f.allocate(1)); id != 6 {
		t.Fatalf("exp=6; got=%v", id)
	}
	if id := int(f.allocate(3)); id != 0 {
		t.Fatalf("exp=0; got=%v", id)
	}
	if id := int(f.allocate(2)); id != 12 {
		t.Fatalf("exp=12; got=%v", id)
	}
	if id := int(f.allocate(1)); id != 7 {
		t.Fatalf("exp=7; got=%v", id)
	}
	if id := int(f.allocate(0)); id != 0 {
		t.Fatalf("exp=0; got=%v", id)
	}
	if id := int(f.allocate(0)); id != 0 {
		t.Fatalf("exp=0; got=%v", id)
	}
	if exp := []pgid{9, 18}; !reflect.DeepEqual(exp, f.ids) {
		t.Fatalf("exp=%v; got=%v", exp, f.ids)
	}

	if id := int(f.allocate(1)); id != 9 {
		t.Fatalf("exp=9; got=%v", id)
	}
	if id := int(f.allocate(1)); id != 18 {
		t.Fatalf("exp=18; got=%v", id)
	}
	if id := int(f.allocate(1)); id != 0 {
		t.Fatalf("exp=0; got=%v", id)
	}
	if exp := []pgid{}; !reflect.DeepEqual(exp, f.ids) {
		t.Fatalf("exp=%v; got=%v", exp, f.ids)
	}
}

// Ensure that a freelist can deserialize from a freelist page.
func TestFreelist_read(t *testing.T) {
	// Create a page.
	var buf [4096]byte
	page := (*page)(unsafe.Pointer(&buf[0]))
	page.flags = freelistPageFlag
	page.count = 2

	// Insert 2 page ids.
	ids := (*[3]pgid)(unsafe.Pointer(&page.ptr))
	ids[0] = 23
	ids[1] = 50

	// Deserialize page into a freelist.
	f := newFreelist()
	f.read(page)

	// Ensure that there are two page ids in the freelist.
	if exp := []pgid{23, 50}; !reflect.DeepEqual(exp, f.ids) {
		t.Fatalf("exp=%v; got=%v", exp, f.ids)
	}
}

// Ensure that a freelist can serialize into a freelist page.
func TestFreelist_write(t *testing.T) {
	// Create a freelist and write it to a page.
	var buf [4096]byte
	f := &freelist{ids: []pgid{12, 39}, pending: make(map[txid][]pgid)}
	f.pending[100] = []pgid{28, 11}
	f.pending[101] = []pgid{3}
	p := (*page)(unsafe.Pointer(&buf[0]))
	f.write(p)

	// Read the page back out.
	f2 := newFreelist()
	f2.read(p)

	// Ensure that the freelist is correct.
	// All pages should be present and in reverse order.
	if exp := []pgid{3, 11, 12, 28, 39}; !reflect.DeepEqual(exp, f2.ids) {
		t.Fatalf("exp=%v; got=%v", exp, f2.ids)
	}
}

func Benchmark_FreelistRelease10K(b *testing.B)    { benchmark_FreelistRelease(b, 10000) }
func Benchmark_FreelistRelease100K(b *testing.B)   { benchmark_FreelistRelease(b, 100000) }
func Benchmark_FreelistRelease1000K(b *testing.B)  { benchmark_FreelistRelease(b, 1000000) }
func Benchmark_FreelistRelease10000K(b *testing.B) { benchmark_FreelistRelease(b, 10000000) }

func benchmark_FreelistRelease(b *testing.B, size int) {
	ids := randomPgids(size)
	pending := randomPgids(len(ids) / 400)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		f := &freelist{ids: ids, pending: map[txid][]pgid{1: pending}}
		f.release(1)
	}
}

func randomPgids(n int) []pgid {
	rand.Seed(42)
	pgids := make(pgids, n)
	for i := range pgids {
		pgids[i] = pgid(rand.Int63())
	}
	sort.Sort(pgids)
	return pgids
}
156 Godeps/_workspace/src/github.com/boltdb/bolt/node_test.go generated vendored Normal file
@@ -0,0 +1,156 @@
package bolt

import (
	"testing"
	"unsafe"
)

// Ensure that a node can insert a key/value.
func TestNode_put(t *testing.T) {
	n := &node{inodes: make(inodes, 0), bucket: &Bucket{tx: &Tx{meta: &meta{pgid: 1}}}}
	n.put([]byte("baz"), []byte("baz"), []byte("2"), 0, 0)
	n.put([]byte("foo"), []byte("foo"), []byte("0"), 0, 0)
	n.put([]byte("bar"), []byte("bar"), []byte("1"), 0, 0)
	n.put([]byte("foo"), []byte("foo"), []byte("3"), 0, leafPageFlag)

	if len(n.inodes) != 3 {
		t.Fatalf("exp=3; got=%d", len(n.inodes))
	}
	if k, v := n.inodes[0].key, n.inodes[0].value; string(k) != "bar" || string(v) != "1" {
		t.Fatalf("exp=<bar,1>; got=<%s,%s>", k, v)
	}
	if k, v := n.inodes[1].key, n.inodes[1].value; string(k) != "baz" || string(v) != "2" {
		t.Fatalf("exp=<baz,2>; got=<%s,%s>", k, v)
	}
	if k, v := n.inodes[2].key, n.inodes[2].value; string(k) != "foo" || string(v) != "3" {
		t.Fatalf("exp=<foo,3>; got=<%s,%s>", k, v)
	}
	if n.inodes[2].flags != uint32(leafPageFlag) {
		t.Fatalf("not a leaf: %d", n.inodes[2].flags)
	}
}

// Ensure that a node can deserialize from a leaf page.
func TestNode_read_LeafPage(t *testing.T) {
	// Create a page.
	var buf [4096]byte
	page := (*page)(unsafe.Pointer(&buf[0]))
	page.flags = leafPageFlag
	page.count = 2

	// Insert 2 elements at the beginning. sizeof(leafPageElement) == 16
	nodes := (*[3]leafPageElement)(unsafe.Pointer(&page.ptr))
	nodes[0] = leafPageElement{flags: 0, pos: 32, ksize: 3, vsize: 4}  // pos = sizeof(leafPageElement) * 2
	nodes[1] = leafPageElement{flags: 0, pos: 23, ksize: 10, vsize: 3} // pos = sizeof(leafPageElement) + 3 + 4

	// Write data for the nodes at the end.
	data := (*[4096]byte)(unsafe.Pointer(&nodes[2]))
	copy(data[:], []byte("barfooz"))
	copy(data[7:], []byte("helloworldbye"))

	// Deserialize page into a leaf.
	n := &node{}
	n.read(page)

	// Check that there are two inodes with correct data.
	if !n.isLeaf {
		t.Fatal("expected leaf")
	}
	if len(n.inodes) != 2 {
		t.Fatalf("exp=2; got=%d", len(n.inodes))
	}
	if k, v := n.inodes[0].key, n.inodes[0].value; string(k) != "bar" || string(v) != "fooz" {
		t.Fatalf("exp=<bar,fooz>; got=<%s,%s>", k, v)
	}
	if k, v := n.inodes[1].key, n.inodes[1].value; string(k) != "helloworld" || string(v) != "bye" {
		t.Fatalf("exp=<helloworld,bye>; got=<%s,%s>", k, v)
	}
}

// Ensure that a node can serialize into a leaf page.
func TestNode_write_LeafPage(t *testing.T) {
	// Create a node.
	n := &node{isLeaf: true, inodes: make(inodes, 0), bucket: &Bucket{tx: &Tx{db: &DB{}, meta: &meta{pgid: 1}}}}
	n.put([]byte("susy"), []byte("susy"), []byte("que"), 0, 0)
	n.put([]byte("ricki"), []byte("ricki"), []byte("lake"), 0, 0)
	n.put([]byte("john"), []byte("john"), []byte("johnson"), 0, 0)

	// Write it to a page.
	var buf [4096]byte
	p := (*page)(unsafe.Pointer(&buf[0]))
	n.write(p)

	// Read the page back in.
	n2 := &node{}
	n2.read(p)

	// Check that the two pages are the same.
	if len(n2.inodes) != 3 {
		t.Fatalf("exp=3; got=%d", len(n2.inodes))
	}
	if k, v := n2.inodes[0].key, n2.inodes[0].value; string(k) != "john" || string(v) != "johnson" {
		t.Fatalf("exp=<john,johnson>; got=<%s,%s>", k, v)
	}
	if k, v := n2.inodes[1].key, n2.inodes[1].value; string(k) != "ricki" || string(v) != "lake" {
		t.Fatalf("exp=<ricki,lake>; got=<%s,%s>", k, v)
	}
	if k, v := n2.inodes[2].key, n2.inodes[2].value; string(k) != "susy" || string(v) != "que" {
		t.Fatalf("exp=<susy,que>; got=<%s,%s>", k, v)
	}
}

// Ensure that a node can split into appropriate subgroups.
func TestNode_split(t *testing.T) {
	// Create a node.
	n := &node{inodes: make(inodes, 0), bucket: &Bucket{tx: &Tx{db: &DB{}, meta: &meta{pgid: 1}}}}
	n.put([]byte("00000001"), []byte("00000001"), []byte("0123456701234567"), 0, 0)
	n.put([]byte("00000002"), []byte("00000002"), []byte("0123456701234567"), 0, 0)
	n.put([]byte("00000003"), []byte("00000003"), []byte("0123456701234567"), 0, 0)
	n.put([]byte("00000004"), []byte("00000004"), []byte("0123456701234567"), 0, 0)
	n.put([]byte("00000005"), []byte("00000005"), []byte("0123456701234567"), 0, 0)

	// Split between 2 & 3.
	n.split(100)

	var parent = n.parent
	if len(parent.children) != 2 {
		t.Fatalf("exp=2; got=%d", len(parent.children))
	}
	if len(parent.children[0].inodes) != 2 {
		t.Fatalf("exp=2; got=%d", len(parent.children[0].inodes))
	}
	if len(parent.children[1].inodes) != 3 {
		t.Fatalf("exp=3; got=%d", len(parent.children[1].inodes))
	}
}

// Ensure that a page with the minimum number of inodes just returns a single node.
func TestNode_split_MinKeys(t *testing.T) {
	// Create a node.
	n := &node{inodes: make(inodes, 0), bucket: &Bucket{tx: &Tx{db: &DB{}, meta: &meta{pgid: 1}}}}
	n.put([]byte("00000001"), []byte("00000001"), []byte("0123456701234567"), 0, 0)
	n.put([]byte("00000002"), []byte("00000002"), []byte("0123456701234567"), 0, 0)

	// Split.
	n.split(20)
	if n.parent != nil {
		t.Fatalf("expected nil parent")
	}
}

// Ensure that a node that has keys that all fit on a page just returns one leaf.
func TestNode_split_SinglePage(t *testing.T) {
	// Create a node.
	n := &node{inodes: make(inodes, 0), bucket: &Bucket{tx: &Tx{db: &DB{}, meta: &meta{pgid: 1}}}}
	n.put([]byte("00000001"), []byte("00000001"), []byte("0123456701234567"), 0, 0)
	n.put([]byte("00000002"), []byte("00000002"), []byte("0123456701234567"), 0, 0)
	n.put([]byte("00000003"), []byte("00000003"), []byte("0123456701234567"), 0, 0)
	n.put([]byte("00000004"), []byte("00000004"), []byte("0123456701234567"), 0, 0)
	n.put([]byte("00000005"), []byte("00000005"), []byte("0123456701234567"), 0, 0)

	// Split.
	n.split(4096)
	if n.parent != nil {
		t.Fatalf("expected nil parent")
	}
}
72 Godeps/_workspace/src/github.com/boltdb/bolt/page_test.go generated vendored Normal file
@@ -0,0 +1,72 @@
package bolt

import (
	"reflect"
	"sort"
	"testing"
	"testing/quick"
)

// Ensure that the page type can be returned in human readable format.
func TestPage_typ(t *testing.T) {
	if typ := (&page{flags: branchPageFlag}).typ(); typ != "branch" {
		t.Fatalf("exp=branch; got=%v", typ)
	}
	if typ := (&page{flags: leafPageFlag}).typ(); typ != "leaf" {
		t.Fatalf("exp=leaf; got=%v", typ)
	}
	if typ := (&page{flags: metaPageFlag}).typ(); typ != "meta" {
		t.Fatalf("exp=meta; got=%v", typ)
	}
	if typ := (&page{flags: freelistPageFlag}).typ(); typ != "freelist" {
		t.Fatalf("exp=freelist; got=%v", typ)
	}
	if typ := (&page{flags: 20000}).typ(); typ != "unknown<4e20>" {
		t.Fatalf("exp=unknown<4e20>; got=%v", typ)
	}
}

// Ensure that the hexdump debugging function doesn't blow up.
func TestPage_dump(t *testing.T) {
	(&page{id: 256}).hexdump(16)
}

func TestPgids_merge(t *testing.T) {
	a := pgids{4, 5, 6, 10, 11, 12, 13, 27}
	b := pgids{1, 3, 8, 9, 25, 30}
	c := a.merge(b)
	if !reflect.DeepEqual(c, pgids{1, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 25, 27, 30}) {
		t.Errorf("mismatch: %v", c)
	}

	a = pgids{4, 5, 6, 10, 11, 12, 13, 27, 35, 36}
	b = pgids{8, 9, 25, 30}
	c = a.merge(b)
	if !reflect.DeepEqual(c, pgids{4, 5, 6, 8, 9, 10, 11, 12, 13, 25, 27, 30, 35, 36}) {
		t.Errorf("mismatch: %v", c)
	}
}

func TestPgids_merge_quick(t *testing.T) {
	if err := quick.Check(func(a, b pgids) bool {
		// Sort incoming lists.
		sort.Sort(a)
		sort.Sort(b)

		// Merge the two lists together.
		got := a.merge(b)

		// The expected value should be the two lists combined and sorted.
		exp := append(a, b...)
		sort.Sort(exp)

		if !reflect.DeepEqual(exp, got) {
			t.Errorf("\nexp=%+v\ngot=%+v\n", exp, got)
			return false
		}

		return true
	}, nil); err != nil {
		t.Fatal(err)
	}
}
79 Godeps/_workspace/src/github.com/boltdb/bolt/quick_test.go generated vendored Normal file
@@ -0,0 +1,79 @@
package bolt_test

import (
	"bytes"
	"flag"
	"fmt"
	"math/rand"
	"os"
	"reflect"
	"testing/quick"
	"time"
)

// testing/quick defaults to 5 iterations and a random seed.
// You can override these settings from the command line:
//
//	-quick.count     The number of iterations to perform.
//	-quick.seed      The seed to use for randomizing.
//	-quick.maxitems  The maximum number of items to insert into a DB.
//	-quick.maxksize  The maximum size of a key.
//	-quick.maxvsize  The maximum size of a value.
//
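// For example, an illustrative invocation using the flags defined below:
//
//	go test -quick.count=100 -quick.seed=42 -quick.maxitems=500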

var qcount, qseed, qmaxitems, qmaxksize, qmaxvsize int

func init() {
	flag.IntVar(&qcount, "quick.count", 5, "")
	flag.IntVar(&qseed, "quick.seed", int(time.Now().UnixNano())%100000, "")
	flag.IntVar(&qmaxitems, "quick.maxitems", 1000, "")
	flag.IntVar(&qmaxksize, "quick.maxksize", 1024, "")
	flag.IntVar(&qmaxvsize, "quick.maxvsize", 1024, "")
	flag.Parse()
	fmt.Fprintln(os.Stderr, "seed:", qseed)
	fmt.Fprintf(os.Stderr, "quick settings: count=%v, items=%v, ksize=%v, vsize=%v\n", qcount, qmaxitems, qmaxksize, qmaxvsize)
}

func qconfig() *quick.Config {
	return &quick.Config{
		MaxCount: qcount,
		Rand:     rand.New(rand.NewSource(int64(qseed))),
	}
}

type testdata []testdataitem

func (t testdata) Len() int           { return len(t) }
func (t testdata) Swap(i, j int)      { t[i], t[j] = t[j], t[i] }
func (t testdata) Less(i, j int) bool { return bytes.Compare(t[i].Key, t[j].Key) == -1 }

func (t testdata) Generate(rand *rand.Rand, size int) reflect.Value {
	n := rand.Intn(qmaxitems-1) + 1
	items := make(testdata, n)
	for i := 0; i < n; i++ {
		item := &items[i]
		item.Key = randByteSlice(rand, 1, qmaxksize)
		item.Value = randByteSlice(rand, 0, qmaxvsize)
	}
	return reflect.ValueOf(items)
}

type revtestdata []testdataitem

func (t revtestdata) Len() int           { return len(t) }
func (t revtestdata) Swap(i, j int)      { t[i], t[j] = t[j], t[i] }
func (t revtestdata) Less(i, j int) bool { return bytes.Compare(t[i].Key, t[j].Key) == 1 }

type testdataitem struct {
	Key   []byte
	Value []byte
}

func randByteSlice(rand *rand.Rand, minSize, maxSize int) []byte {
	n := rand.Intn(maxSize-minSize) + minSize
	b := make([]byte, n)
	for i := 0; i < n; i++ {
		b[i] = byte(rand.Intn(255))
	}
	return b
}
327 Godeps/_workspace/src/github.com/boltdb/bolt/simulation_test.go generated vendored Normal file
@@ -0,0 +1,327 @@
package bolt_test

import (
	"bytes"
	"fmt"
	"math/rand"
	"sync"
	"testing"

	"github.com/coreos/etcd/Godeps/_workspace/src/github.com/boltdb/bolt"
)

func TestSimulate_1op_1p(t *testing.T)     { testSimulate(t, 100, 1) }
func TestSimulate_10op_1p(t *testing.T)    { testSimulate(t, 10, 1) }
func TestSimulate_100op_1p(t *testing.T)   { testSimulate(t, 100, 1) }
func TestSimulate_1000op_1p(t *testing.T)  { testSimulate(t, 1000, 1) }
func TestSimulate_10000op_1p(t *testing.T) { testSimulate(t, 10000, 1) }

func TestSimulate_10op_10p(t *testing.T)    { testSimulate(t, 10, 10) }
func TestSimulate_100op_10p(t *testing.T)   { testSimulate(t, 100, 10) }
func TestSimulate_1000op_10p(t *testing.T)  { testSimulate(t, 1000, 10) }
func TestSimulate_10000op_10p(t *testing.T) { testSimulate(t, 10000, 10) }

func TestSimulate_100op_100p(t *testing.T)   { testSimulate(t, 100, 100) }
func TestSimulate_1000op_100p(t *testing.T)  { testSimulate(t, 1000, 100) }
func TestSimulate_10000op_100p(t *testing.T) { testSimulate(t, 10000, 100) }

func TestSimulate_10000op_1000p(t *testing.T) { testSimulate(t, 10000, 1000) }

// Randomly generate operations on a given database with multiple clients to ensure consistency and thread safety.
func testSimulate(t *testing.T, threadCount, parallelism int) {
	if testing.Short() {
		t.Skip("skipping test in short mode.")
	}

	rand.Seed(int64(qseed))

	// A list of operations that readers and writers can perform.
	var readerHandlers = []simulateHandler{simulateGetHandler}
	var writerHandlers = []simulateHandler{simulateGetHandler, simulatePutHandler}

	var versions = make(map[int]*QuickDB)
	versions[1] = NewQuickDB()

	db := NewTestDB()
	defer db.Close()

	var mutex sync.Mutex

	// Run n threads in parallel, each with their own operation.
	var wg sync.WaitGroup
	var threads = make(chan bool, parallelism)
	var i int
	for {
		threads <- true
		wg.Add(1)
		writable := ((rand.Int() % 100) < 20) // 20% writers

		// Choose an operation to execute.
		var handler simulateHandler
		if writable {
			handler = writerHandlers[rand.Intn(len(writerHandlers))]
		} else {
			handler = readerHandlers[rand.Intn(len(readerHandlers))]
		}

		// Execute a thread for the given operation.
		go func(writable bool, handler simulateHandler) {
			defer wg.Done()

			// Start transaction.
			tx, err := db.Begin(writable)
			if err != nil {
				t.Fatal("tx begin: ", err)
			}

			// Obtain current state of the dataset.
			mutex.Lock()
			var qdb = versions[tx.ID()]
			if writable {
				qdb = versions[tx.ID()-1].Copy()
			}
			mutex.Unlock()

			// Make sure we commit/rollback the tx at the end and update the state.
			if writable {
				defer func() {
					mutex.Lock()
					versions[tx.ID()] = qdb
					mutex.Unlock()

					ok(t, tx.Commit())
				}()
			} else {
				defer tx.Rollback()
			}

			// Ignore operation if we don't have data yet.
			if qdb == nil {
				return
			}

			// Execute handler.
			handler(tx, qdb)

			// Release a thread back to the scheduling loop.
			<-threads
		}(writable, handler)

		i++
		if i > threadCount {
			break
		}
	}

	// Wait until all threads are done.
	wg.Wait()
}

type simulateHandler func(tx *bolt.Tx, qdb *QuickDB)

// Retrieves a key from the database and verifies that it is what is expected.
func simulateGetHandler(tx *bolt.Tx, qdb *QuickDB) {
	// Randomly retrieve an existing key.
	keys := qdb.Rand()
	if len(keys) == 0 {
		return
	}

	// Retrieve root bucket.
	b := tx.Bucket(keys[0])
	if b == nil {
		panic(fmt.Sprintf("bucket[0] expected: %08x\n", trunc(keys[0], 4)))
	}

	// Drill into nested buckets.
	for _, key := range keys[1 : len(keys)-1] {
		b = b.Bucket(key)
		if b == nil {
			panic(fmt.Sprintf("bucket[n] expected: %v -> %v\n", keys, key))
		}
	}

	// Verify key/value on the final bucket.
	expected := qdb.Get(keys)
	actual := b.Get(keys[len(keys)-1])
	if !bytes.Equal(actual, expected) {
		fmt.Println("=== EXPECTED ===")
		fmt.Println(expected)
		fmt.Println("=== ACTUAL ===")
		fmt.Println(actual)
		fmt.Println("=== END ===")
		panic("value mismatch")
	}
}

// Inserts a key into the database.
func simulatePutHandler(tx *bolt.Tx, qdb *QuickDB) {
	var err error
	keys, value := randKeys(), randValue()

	// Retrieve root bucket.
	b := tx.Bucket(keys[0])
	if b == nil {
		b, err = tx.CreateBucket(keys[0])
		if err != nil {
			panic("create bucket: " + err.Error())
		}
	}

	// Create nested buckets, if necessary.
	for _, key := range keys[1 : len(keys)-1] {
		child := b.Bucket(key)
		if child != nil {
			b = child
		} else {
			b, err = b.CreateBucket(key)
			if err != nil {
				panic("create bucket: " + err.Error())
			}
		}
	}

	// Insert into database.
	if err := b.Put(keys[len(keys)-1], value); err != nil {
		panic("put: " + err.Error())
	}

	// Insert into in-memory database.
	qdb.Put(keys, value)
}

// QuickDB is an in-memory database that replicates the functionality of the
// Bolt DB type except that it is entirely in-memory. It is meant for testing
// that the Bolt database is consistent.
type QuickDB struct {
	sync.RWMutex
	m map[string]interface{}
}

// NewQuickDB returns an instance of QuickDB.
func NewQuickDB() *QuickDB {
	return &QuickDB{m: make(map[string]interface{})}
}
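// Illustrative use of QuickDB as the in-memory oracle (hypothetical keys):
//
//	qdb := NewQuickDB()
//	qdb.Put([][]byte{[]byte("bucket"), []byte("key")}, []byte("value"))
//	_ = qdb.Get([][]byte{[]byte("bucket"), []byte("key")})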

// Get retrieves the value at a key path.
func (db *QuickDB) Get(keys [][]byte) []byte {
	db.RLock()
	defer db.RUnlock()

	m := db.m
	for _, key := range keys[:len(keys)-1] {
		value := m[string(key)]
		if value == nil {
			return nil
		}
		switch value := value.(type) {
		case map[string]interface{}:
			m = value
		case []byte:
			return nil
		}
	}

	// Only return if it's a simple value.
	if value, ok := m[string(keys[len(keys)-1])].([]byte); ok {
		return value
	}
	return nil
}

// Put inserts a value into a key path.
func (db *QuickDB) Put(keys [][]byte, value []byte) {
	db.Lock()
	defer db.Unlock()

	// Build buckets all the way down the key path.
	m := db.m
	for _, key := range keys[:len(keys)-1] {
		if _, ok := m[string(key)].([]byte); ok {
			return // Keypath intersects with a simple value. Do nothing.
		}

		if m[string(key)] == nil {
			m[string(key)] = make(map[string]interface{})
		}
		m = m[string(key)].(map[string]interface{})
	}

	// Insert value into the last key.
	m[string(keys[len(keys)-1])] = value
}

// Rand returns a random key path that points to a simple value.
func (db *QuickDB) Rand() [][]byte {
	db.RLock()
	defer db.RUnlock()
	if len(db.m) == 0 {
		return nil
	}
	var keys [][]byte
	db.rand(db.m, &keys)
	return keys
}

func (db *QuickDB) rand(m map[string]interface{}, keys *[][]byte) {
	i, index := 0, rand.Intn(len(m))
	for k, v := range m {
		if i == index {
			*keys = append(*keys, []byte(k))
			if v, ok := v.(map[string]interface{}); ok {
				db.rand(v, keys)
			}
			return
		}
		i++
	}
	panic("quickdb rand: out-of-range")
}

// Copy copies the entire database.
func (db *QuickDB) Copy() *QuickDB {
	db.RLock()
	defer db.RUnlock()
	return &QuickDB{m: db.copy(db.m)}
}

func (db *QuickDB) copy(m map[string]interface{}) map[string]interface{} {
	clone := make(map[string]interface{}, len(m))
	for k, v := range m {
		switch v := v.(type) {
		case map[string]interface{}:
			clone[k] = db.copy(v)
		default:
			clone[k] = v
		}
	}
	return clone
}

func randKey() []byte {
	var min, max = 1, 1024
	n := rand.Intn(max-min) + min
	b := make([]byte, n)
	for i := 0; i < n; i++ {
		b[i] = byte(rand.Intn(255))
	}
	return b
}

func randKeys() [][]byte {
	var keys [][]byte
	var count = rand.Intn(2) + 2
	for i := 0; i < count; i++ {
		keys = append(keys, randKey())
	}
	return keys
}

func randValue() []byte {
	n := rand.Intn(8192)
	b := make([]byte, n)
	for i := 0; i < n; i++ {
		b[i] = byte(rand.Intn(255))
	}
	return b
}
79 Godeps/_workspace/src/github.com/boltdb/bolt/tx.go generated vendored
@@ -29,14 +29,6 @@ type Tx struct {
	pages          map[pgid]*page
	stats          TxStats
	commitHandlers []func()

	// WriteFlag specifies the flag for write-related methods like WriteTo().
	// Tx opens the database file with the specified flag to copy the data.
	//
	// By default, the flag is unset, which works well for mostly in-memory
	// workloads. For databases that are much larger than available RAM,
	// set the flag to syscall.O_DIRECT to avoid trashing the page cache.
	WriteFlag int
}
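// Illustrative sketch of the WriteFlag knob documented above (assumes a
// *bolt.DB handle `db`, an io.Writer `w`, and the syscall package):
//
//	tx, _ := db.Begin(false)
//	tx.WriteFlag = syscall.O_DIRECT // bypass the page cache for very large DBs
//	_, _ = tx.WriteTo(w)
//	_ = tx.Rollback()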

// init initializes the transaction.
@@ -95,21 +87,18 @@ func (tx *Tx) Stats() TxStats {

// Bucket retrieves a bucket by name.
// Returns nil if the bucket does not exist.
// The bucket instance is only valid for the lifetime of the transaction.
func (tx *Tx) Bucket(name []byte) *Bucket {
	return tx.root.Bucket(name)
}

// CreateBucket creates a new bucket.
// Returns an error if the bucket already exists, if the bucket name is blank, or if the bucket name is too long.
// The bucket instance is only valid for the lifetime of the transaction.
func (tx *Tx) CreateBucket(name []byte) (*Bucket, error) {
	return tx.root.CreateBucket(name)
}

// CreateBucketIfNotExists creates a new bucket if it doesn't already exist.
// Returns an error if the bucket name is blank, or if the bucket name is too long.
// The bucket instance is only valid for the lifetime of the transaction.
func (tx *Tx) CreateBucketIfNotExists(name []byte) (*Bucket, error) {
	return tx.root.CreateBucketIfNotExists(name)
}
@@ -168,8 +157,6 @@ func (tx *Tx) Commit() error {
	// Free the old root bucket.
	tx.meta.root.root = tx.root.root

	opgid := tx.meta.pgid

	// Free the freelist and allocate new pages for it. This will overestimate
	// the size of the freelist but not underestimate the size (which would be bad).
	tx.db.freelist.free(tx.meta.txid, tx.db.page(tx.meta.freelist))
@@ -184,14 +171,6 @@
	}
	tx.meta.freelist = p.id

	// If the high water mark has moved up then attempt to grow the database.
	if tx.meta.pgid > opgid {
		if err := tx.db.grow(int(tx.meta.pgid+1) * tx.db.pageSize); err != nil {
			tx.rollback()
			return err
		}
	}

	// Write dirty pages to disk.
	startTime = time.Now()
	if err := tx.write(); err != nil {
@@ -257,8 +236,7 @@ func (tx *Tx) close() {
	var freelistPendingN = tx.db.freelist.pending_count()
	var freelistAlloc = tx.db.freelist.size()

	// Remove transaction ref & writer lock.
	tx.db.rwtx = nil
	// Remove writer lock.
	tx.db.rwlock.Unlock()

	// Merge statistics.
@@ -272,16 +250,11 @@
	} else {
		tx.db.removeTx(tx)
	}

	// Clear all references.
	tx.db = nil
	tx.meta = nil
	tx.root = Bucket{tx: tx}
	tx.pages = nil
}

// Copy writes the entire database to a writer.
// This function exists for backwards compatibility. Use WriteTo() instead.
// This function exists for backwards compatibility. Use WriteTo() in
func (tx *Tx) Copy(w io.Writer) error {
	_, err := tx.WriteTo(w)
	return err
@@ -290,47 +263,29 @@ func (tx *Tx) Copy(w io.Writer) error {
// WriteTo writes the entire database to a writer.
// If err == nil then exactly tx.Size() bytes will be written into the writer.
func (tx *Tx) WriteTo(w io.Writer) (n int64, err error) {
	// Attempt to open reader with WriteFlag
	f, err := os.OpenFile(tx.db.path, os.O_RDONLY|tx.WriteFlag, 0)
	if err != nil {
		return 0, err
	}
	defer func() { _ = f.Close() }()

	// Generate a meta page. We use the same page data for both meta pages.
	buf := make([]byte, tx.db.pageSize)
	page := (*page)(unsafe.Pointer(&buf[0]))
	page.flags = metaPageFlag
	*page.meta() = *tx.meta

	// Write meta 0.
	page.id = 0
	page.meta().checksum = page.meta().sum64()
	nn, err := w.Write(buf)
	n += int64(nn)
	if err != nil {
		return n, fmt.Errorf("meta 0 copy: %s", err)
	// Attempt to open reader directly.
	var f *os.File
	if f, err = os.OpenFile(tx.db.path, os.O_RDONLY|odirect, 0); err != nil {
		// Fallback to a regular open if that doesn't work.
		if f, err = os.OpenFile(tx.db.path, os.O_RDONLY, 0); err != nil {
			return 0, err
		}
	}

	// Write meta 1 with a lower transaction id.
	page.id = 1
	page.meta().txid -= 1
	page.meta().checksum = page.meta().sum64()
	nn, err = w.Write(buf)
	n += int64(nn)
	// Copy the meta pages.
	tx.db.metalock.Lock()
	n, err = io.CopyN(w, f, int64(tx.db.pageSize*2))
	tx.db.metalock.Unlock()
	if err != nil {
		return n, fmt.Errorf("meta 1 copy: %s", err)
	}

	// Move past the meta pages in the file.
	if _, err := f.Seek(int64(tx.db.pageSize*2), os.SEEK_SET); err != nil {
		return n, fmt.Errorf("seek: %s", err)
		_ = f.Close()
		return n, fmt.Errorf("meta copy: %s", err)
	}

	// Copy data pages.
	wn, err := io.CopyN(w, f, tx.Size()-int64(tx.db.pageSize*2))
	n += wn
	if err != nil {
		_ = f.Close()
		return n, err
	}

@@ -537,7 +492,7 @@ func (tx *Tx) writeMeta() error {
}

// page returns a reference to the page with a given id.
// If page has been written to then a temporary buffered page is returned.
// If page has been written to then a temporary bufferred page is returned.
func (tx *Tx) page(id pgid) *page {
	// Check the dirty pages first.
	if tx.pages != nil {

456 Godeps/_workspace/src/github.com/boltdb/bolt/tx_test.go generated vendored Normal file
@@ -0,0 +1,456 @@
package bolt_test

import (
	"errors"
	"fmt"
	"os"
	"testing"

	"github.com/coreos/etcd/Godeps/_workspace/src/github.com/boltdb/bolt"
)

// Ensure that committing a closed transaction returns an error.
func TestTx_Commit_Closed(t *testing.T) {
	db := NewTestDB()
	defer db.Close()
	tx, _ := db.Begin(true)
	tx.CreateBucket([]byte("foo"))
	ok(t, tx.Commit())
	equals(t, tx.Commit(), bolt.ErrTxClosed)
}

// Ensure that rolling back a closed transaction returns an error.
func TestTx_Rollback_Closed(t *testing.T) {
	db := NewTestDB()
	defer db.Close()
	tx, _ := db.Begin(true)
	ok(t, tx.Rollback())
	equals(t, tx.Rollback(), bolt.ErrTxClosed)
}

// Ensure that committing a read-only transaction returns an error.
func TestTx_Commit_ReadOnly(t *testing.T) {
	db := NewTestDB()
	defer db.Close()
	tx, _ := db.Begin(false)
	equals(t, tx.Commit(), bolt.ErrTxNotWritable)
}

// Ensure that a transaction can retrieve a cursor on the root bucket.
func TestTx_Cursor(t *testing.T) {
	db := NewTestDB()
	defer db.Close()
	db.Update(func(tx *bolt.Tx) error {
		tx.CreateBucket([]byte("widgets"))
		tx.CreateBucket([]byte("woojits"))
		c := tx.Cursor()

		k, v := c.First()
		equals(t, "widgets", string(k))
		assert(t, v == nil, "")

		k, v = c.Next()
		equals(t, "woojits", string(k))
		assert(t, v == nil, "")

		k, v = c.Next()
		assert(t, k == nil, "")
		assert(t, v == nil, "")

		return nil
	})
}

// Ensure that creating a bucket with a read-only transaction returns an error.
func TestTx_CreateBucket_ReadOnly(t *testing.T) {
	db := NewTestDB()
	defer db.Close()
	db.View(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucket([]byte("foo"))
		assert(t, b == nil, "")
		equals(t, bolt.ErrTxNotWritable, err)
		return nil
	})
}

// Ensure that creating a bucket on a closed transaction returns an error.
func TestTx_CreateBucket_Closed(t *testing.T) {
	db := NewTestDB()
	defer db.Close()
	tx, _ := db.Begin(true)
	tx.Commit()
	b, err := tx.CreateBucket([]byte("foo"))
	assert(t, b == nil, "")
	equals(t, bolt.ErrTxClosed, err)
}

// Ensure that a Tx can retrieve a bucket.
func TestTx_Bucket(t *testing.T) {
	db := NewTestDB()
	defer db.Close()
	db.Update(func(tx *bolt.Tx) error {
		tx.CreateBucket([]byte("widgets"))
		b := tx.Bucket([]byte("widgets"))
		assert(t, b != nil, "")
		return nil
	})
}

// Ensure that a Tx retrieving a non-existent key returns nil.
func TestTx_Get_Missing(t *testing.T) {
	db := NewTestDB()
	defer db.Close()
	db.Update(func(tx *bolt.Tx) error {
		tx.CreateBucket([]byte("widgets"))
		tx.Bucket([]byte("widgets")).Put([]byte("foo"), []byte("bar"))
		value := tx.Bucket([]byte("widgets")).Get([]byte("no_such_key"))
		assert(t, value == nil, "")
		return nil
	})
}

// Ensure that a bucket can be created and retrieved.
func TestTx_CreateBucket(t *testing.T) {
	db := NewTestDB()
	defer db.Close()

	// Create a bucket.
	db.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucket([]byte("widgets"))
		assert(t, b != nil, "")
		ok(t, err)
		return nil
	})

	// Read the bucket through a separate transaction.
	db.View(func(tx *bolt.Tx) error {
		b := tx.Bucket([]byte("widgets"))
		assert(t, b != nil, "")
		return nil
	})
}

// Ensure that a bucket can be created if it doesn't already exist.
func TestTx_CreateBucketIfNotExists(t *testing.T) {
	db := NewTestDB()
	defer db.Close()
	db.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucketIfNotExists([]byte("widgets"))
		assert(t, b != nil, "")
		ok(t, err)

		b, err = tx.CreateBucketIfNotExists([]byte("widgets"))
		assert(t, b != nil, "")
		ok(t, err)

		b, err = tx.CreateBucketIfNotExists([]byte{})
		assert(t, b == nil, "")
		equals(t, bolt.ErrBucketNameRequired, err)

		b, err = tx.CreateBucketIfNotExists(nil)
		assert(t, b == nil, "")
		equals(t, bolt.ErrBucketNameRequired, err)
		return nil
	})

	// Read the bucket through a separate transaction.
	db.View(func(tx *bolt.Tx) error {
		b := tx.Bucket([]byte("widgets"))
		assert(t, b != nil, "")
		return nil
	})
}

// Ensure that a bucket cannot be created twice.
func TestTx_CreateBucket_Exists(t *testing.T) {
	db := NewTestDB()
	defer db.Close()
	// Create a bucket.
	db.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucket([]byte("widgets"))
		assert(t, b != nil, "")
		ok(t, err)
		return nil
	})

	// Create the same bucket again.
	db.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucket([]byte("widgets"))
		assert(t, b == nil, "")
		equals(t, bolt.ErrBucketExists, err)
		return nil
	})
}

// Ensure that a bucket is created with a non-blank name.
func TestTx_CreateBucket_NameRequired(t *testing.T) {
	db := NewTestDB()
	defer db.Close()
	db.Update(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucket(nil)
		assert(t, b == nil, "")
		equals(t, bolt.ErrBucketNameRequired, err)
		return nil
	})
}

// Ensure that a bucket can be deleted.
func TestTx_DeleteBucket(t *testing.T) {
	db := NewTestDB()
	defer db.Close()

	// Create a bucket and add a value.
	db.Update(func(tx *bolt.Tx) error {
		tx.CreateBucket([]byte("widgets"))
		tx.Bucket([]byte("widgets")).Put([]byte("foo"), []byte("bar"))
		return nil
	})

	// Delete the bucket and make sure we can't get the value.
	db.Update(func(tx *bolt.Tx) error {
		ok(t, tx.DeleteBucket([]byte("widgets")))
		assert(t, tx.Bucket([]byte("widgets")) == nil, "")
		return nil
	})

	db.Update(func(tx *bolt.Tx) error {
		// Create the bucket again and make sure there's not a phantom value.
		b, err := tx.CreateBucket([]byte("widgets"))
		assert(t, b != nil, "")
		ok(t, err)
		assert(t, tx.Bucket([]byte("widgets")).Get([]byte("foo")) == nil, "")
		return nil
	})
}

// Ensure that deleting a bucket on a closed transaction returns an error.
func TestTx_DeleteBucket_Closed(t *testing.T) {
	db := NewTestDB()
	defer db.Close()
	tx, _ := db.Begin(true)
	tx.Commit()
	equals(t, tx.DeleteBucket([]byte("foo")), bolt.ErrTxClosed)
}

// Ensure that deleting a bucket with a read-only transaction returns an error.
func TestTx_DeleteBucket_ReadOnly(t *testing.T) {
	db := NewTestDB()
	defer db.Close()
	db.View(func(tx *bolt.Tx) error {
		equals(t, tx.DeleteBucket([]byte("foo")), bolt.ErrTxNotWritable)
		return nil
	})
}

// Ensure that nothing happens when deleting a bucket that doesn't exist.
func TestTx_DeleteBucket_NotFound(t *testing.T) {
	db := NewTestDB()
	defer db.Close()
	db.Update(func(tx *bolt.Tx) error {
		equals(t, bolt.ErrBucketNotFound, tx.DeleteBucket([]byte("widgets")))
		return nil
	})
}

// Ensure that no error is returned when a tx.ForEach function does not return
// an error.
func TestTx_ForEach_NoError(t *testing.T) {
	db := NewTestDB()
	defer db.Close()
	db.Update(func(tx *bolt.Tx) error {
		tx.CreateBucket([]byte("widgets"))
		tx.Bucket([]byte("widgets")).Put([]byte("foo"), []byte("bar"))

		equals(t, nil, tx.ForEach(func(name []byte, b *bolt.Bucket) error {
			return nil
		}))
		return nil
	})
}

// Ensure that an error is returned when a tx.ForEach function returns an error.
func TestTx_ForEach_WithError(t *testing.T) {
	db := NewTestDB()
	defer db.Close()
	db.Update(func(tx *bolt.Tx) error {
		tx.CreateBucket([]byte("widgets"))
		tx.Bucket([]byte("widgets")).Put([]byte("foo"), []byte("bar"))

		err := errors.New("foo")
		equals(t, err, tx.ForEach(func(name []byte, b *bolt.Bucket) error {
			return err
		}))
		return nil
	})
}

// Ensure that Tx commit handlers are called after a transaction successfully commits.
func TestTx_OnCommit(t *testing.T) {
	var x int
	db := NewTestDB()
	defer db.Close()
	db.Update(func(tx *bolt.Tx) error {
		tx.OnCommit(func() { x += 1 })
		tx.OnCommit(func() { x += 2 })
		_, err := tx.CreateBucket([]byte("widgets"))
		return err
	})
	equals(t, 3, x)
}

// Ensure that Tx commit handlers are NOT called after a transaction rolls back.
func TestTx_OnCommit_Rollback(t *testing.T) {
	var x int
	db := NewTestDB()
	defer db.Close()
	db.Update(func(tx *bolt.Tx) error {
		tx.OnCommit(func() { x += 1 })
		tx.OnCommit(func() { x += 2 })
		tx.CreateBucket([]byte("widgets"))
		return errors.New("rollback this commit")
	})
	equals(t, 0, x)
}

// Ensure that the database can be copied to a file path.
func TestTx_CopyFile(t *testing.T) {
	db := NewTestDB()
	defer db.Close()
	var dest = tempfile()
	db.Update(func(tx *bolt.Tx) error {
		tx.CreateBucket([]byte("widgets"))
		tx.Bucket([]byte("widgets")).Put([]byte("foo"), []byte("bar"))
		tx.Bucket([]byte("widgets")).Put([]byte("baz"), []byte("bat"))
		return nil
	})

	ok(t, db.View(func(tx *bolt.Tx) error { return tx.CopyFile(dest, 0600) }))

	db2, err := bolt.Open(dest, 0600, nil)
	ok(t, err)
	defer db2.Close()

	db2.View(func(tx *bolt.Tx) error {
		equals(t, []byte("bar"), tx.Bucket([]byte("widgets")).Get([]byte("foo")))
		equals(t, []byte("bat"), tx.Bucket([]byte("widgets")).Get([]byte("baz")))
		return nil
	})
}

type failWriterError struct{}

func (failWriterError) Error() string {
	return "error injected for tests"
}

type failWriter struct {
	// fail after this many bytes
	After int
}

func (f *failWriter) Write(p []byte) (n int, err error) {
	n = len(p)
	if n > f.After {
		n = f.After
		err = failWriterError{}
	}
	f.After -= n
	return n, err
}

// Ensure that Copy handles write errors right.
func TestTx_CopyFile_Error_Meta(t *testing.T) {
	db := NewTestDB()
	defer db.Close()
	db.Update(func(tx *bolt.Tx) error {
		tx.CreateBucket([]byte("widgets"))
		tx.Bucket([]byte("widgets")).Put([]byte("foo"), []byte("bar"))
		tx.Bucket([]byte("widgets")).Put([]byte("baz"), []byte("bat"))
		return nil
	})

	err := db.View(func(tx *bolt.Tx) error { return tx.Copy(&failWriter{}) })
	equals(t, err.Error(), "meta copy: error injected for tests")
}

// Ensure that Copy handles write errors right.
func TestTx_CopyFile_Error_Normal(t *testing.T) {
	db := NewTestDB()
	defer db.Close()
	db.Update(func(tx *bolt.Tx) error {
		tx.CreateBucket([]byte("widgets"))
		tx.Bucket([]byte("widgets")).Put([]byte("foo"), []byte("bar"))
		tx.Bucket([]byte("widgets")).Put([]byte("baz"), []byte("bat"))
		return nil
	})

	err := db.View(func(tx *bolt.Tx) error { return tx.Copy(&failWriter{3 * db.Info().PageSize}) })
	equals(t, err.Error(), "error injected for tests")
}

func ExampleTx_Rollback() {
	// Open the database.
	db, _ := bolt.Open(tempfile(), 0666, nil)
	defer os.Remove(db.Path())
	defer db.Close()

	// Create a bucket.
	db.Update(func(tx *bolt.Tx) error {
		_, err := tx.CreateBucket([]byte("widgets"))
		return err
	})

	// Set a value for a key.
	db.Update(func(tx *bolt.Tx) error {
		return tx.Bucket([]byte("widgets")).Put([]byte("foo"), []byte("bar"))
	})

	// Update the key but rollback the transaction so it never saves.
	tx, _ := db.Begin(true)
	b := tx.Bucket([]byte("widgets"))
	b.Put([]byte("foo"), []byte("baz"))
	tx.Rollback()

	// Ensure that our original value is still set.
	db.View(func(tx *bolt.Tx) error {
		value := tx.Bucket([]byte("widgets")).Get([]byte("foo"))
		fmt.Printf("The value for 'foo' is still: %s\n", value)
		return nil
	})

	// Output:
	// The value for 'foo' is still: bar
}

func ExampleTx_CopyFile() {
	// Open the database.
	db, _ := bolt.Open(tempfile(), 0666, nil)
	defer os.Remove(db.Path())
	defer db.Close()

	// Create a bucket and a key.
	db.Update(func(tx *bolt.Tx) error {
		tx.CreateBucket([]byte("widgets"))
		tx.Bucket([]byte("widgets")).Put([]byte("foo"), []byte("bar"))
		return nil
	})

	// Copy the database to another file.
	toFile := tempfile()
	db.View(func(tx *bolt.Tx) error { return tx.CopyFile(toFile, 0666) })
	defer os.Remove(toFile)

	// Open the cloned database.
	db2, _ := bolt.Open(toFile, 0666, nil)
	defer db2.Close()

	// Ensure that the key exists in the copy.
	db2.View(func(tx *bolt.Tx) error {
		value := tx.Bucket([]byte("widgets")).Get([]byte("foo"))
		fmt.Printf("The value for 'foo' in the clone is: %s\n", value)
		return nil
	})

	// Output:
	// The value for 'foo' in the clone is: bar
}
1 Godeps/_workspace/src/github.com/bradfitz/http2/.gitignore generated vendored Normal file
@@ -0,0 +1 @@
*~
19 Godeps/_workspace/src/github.com/bradfitz/http2/AUTHORS generated vendored Normal file
@@ -0,0 +1,19 @@
# This file is like Go's AUTHORS file: it lists Copyright holders.
# The list of humans who have contributed is in the CONTRIBUTORS file.
#
|
||||
# To contribute to this project, because it will eventually be folded
|
||||
# back in to Go itself, you need to submit a CLA:
|
||||
#
|
||||
# http://golang.org/doc/contribute.html#copyright
|
||||
#
|
||||
# Then you get added to CONTRIBUTORS and you or your company get added
|
||||
# to the AUTHORS file.
|
||||
|
||||
Blake Mizerany <blake.mizerany@gmail.com> github=bmizerany
|
||||
Daniel Morsing <daniel.morsing@gmail.com> github=DanielMorsing
|
||||
Gabriel Aszalos <gabriel.aszalos@gmail.com> github=gbbr
|
||||
Google, Inc.
|
||||
Keith Rarick <kr@xph.us> github=kr
|
||||
Matthew Keenan <tank.en.mate@gmail.com> <github@mattkeenan.net> github=mattkeenan
|
||||
Matt Layher <mdlayher@gmail.com> github=mdlayher
|
||||
Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com> github=tatsuhiro-t
|
19 Godeps/_workspace/src/github.com/bradfitz/http2/CONTRIBUTORS generated vendored Normal file
@@ -0,0 +1,19 @@
# This file is like Go's CONTRIBUTORS file: it lists humans.
# The list of copyright holders (which may be companies) are in the AUTHORS file.
#
# To contribute to this project, because it will eventually be folded
# back in to Go itself, you need to submit a CLA:
#
# http://golang.org/doc/contribute.html#copyright
#
# Then you get added to CONTRIBUTORS and you or your company get added
# to the AUTHORS file.

Blake Mizerany <blake.mizerany@gmail.com> github=bmizerany
Brad Fitzpatrick <bradfitz@golang.org> github=bradfitz
Daniel Morsing <daniel.morsing@gmail.com> github=DanielMorsing
Gabriel Aszalos <gabriel.aszalos@gmail.com> github=gbbr
Keith Rarick <kr@xph.us> github=kr
Matthew Keenan <tank.en.mate@gmail.com> <github@mattkeenan.net> github=mattkeenan
Matt Layher <mdlayher@gmail.com> github=mdlayher
Tatsuhiro Tsujikawa <tatsuhiro.t@gmail.com> github=tatsuhiro-t
@@ -17,15 +17,8 @@ RUN apt-get install -y --no-install-recommends \
  libcunit1-dev libssl-dev libxml2-dev libevent-dev \
  automake autoconf

# The list of packages nghttp2 recommends for h2load:
RUN apt-get install -y --no-install-recommends make binutils \
  autoconf automake autotools-dev \
  libtool pkg-config zlib1g-dev libcunit1-dev libssl-dev libxml2-dev \
  libev-dev libevent-dev libjansson-dev libjemalloc-dev \
  cython python3.4-dev python-setuptools

# Note: setting NGHTTP2_VER before the git clone, so an old git clone isn't cached:
ENV NGHTTP2_VER 895da9a
ENV NGHTTP2_VER af24f8394e43f4
RUN cd /root && git clone https://github.com/tatsuhiro-t/nghttp2.git

WORKDIR /root/nghttp2
@@ -38,9 +31,9 @@ RUN make
RUN make install

WORKDIR /root
RUN wget http://curl.haxx.se/download/curl-7.45.0.tar.gz
RUN tar -zxvf curl-7.45.0.tar.gz
WORKDIR /root/curl-7.45.0
RUN wget http://curl.haxx.se/download/curl-7.40.0.tar.gz
RUN tar -zxvf curl-7.40.0.tar.gz
WORKDIR /root/curl-7.40.0
RUN ./configure --with-ssl --with-nghttp2=/usr/local
RUN make
RUN make install
5 Godeps/_workspace/src/github.com/bradfitz/http2/HACKING generated vendored Normal file
@@ -0,0 +1,5 @@
We only accept contributions from users who have gone through Go's
contribution process (signed a CLA).

Please acknowledge whether you have (and use the same email) if
sending a pull request.
7 Godeps/_workspace/src/github.com/bradfitz/http2/LICENSE generated vendored Normal file
@@ -0,0 +1,7 @@
Copyright 2014 Google & the Go AUTHORS

Go AUTHORS are:
See https://code.google.com/p/go/source/browse/AUTHORS

Licensed under the terms of Go itself:
https://code.google.com/p/go/source/browse/LICENSE
@@ -10,11 +10,8 @@ Status:
* The client work has just started but shares a lot of code
  is coming along much quicker.

Docs are at https://godoc.org/golang.org/x/net/http2
Docs are at https://godoc.org/github.com/bradfitz/http2

Demo test server at https://http2.golang.org/

Help & bug reports welcome!

Contributing: https://golang.org/doc/contribute.html
Bugs: https://golang.org/issue/new?title=x/net/http2:+
Help & bug reports welcome.
75 Godeps/_workspace/src/github.com/bradfitz/http2/buffer.go generated vendored Normal file
@@ -0,0 +1,75 @@
// Copyright 2014 The Go Authors.
// See https://code.google.com/p/go/source/browse/CONTRIBUTORS
// Licensed under the same terms as Go itself:
// https://code.google.com/p/go/source/browse/LICENSE

package http2

import (
	"errors"
)

// buffer is an io.ReadWriteCloser backed by a fixed size buffer.
// It never allocates, but moves old data as new data is written.
type buffer struct {
	buf    []byte
	r, w   int
	closed bool
	err    error // err to return to reader
}

var (
	errReadEmpty = errors.New("read from empty buffer")
	errWriteFull = errors.New("write on full buffer")
)

// Read copies bytes from the buffer into p.
// It is an error to read when no data is available.
func (b *buffer) Read(p []byte) (n int, err error) {
	n = copy(p, b.buf[b.r:b.w])
	b.r += n
	if b.closed && b.r == b.w {
		err = b.err
	} else if b.r == b.w && n == 0 {
		err = errReadEmpty
	}
	return n, err
}

// Len returns the number of bytes of the unread portion of the buffer.
func (b *buffer) Len() int {
	return b.w - b.r
}

// Write copies bytes from p into the buffer.
// It is an error to write more data than the buffer can hold.
func (b *buffer) Write(p []byte) (n int, err error) {
	if b.closed {
		return 0, errors.New("closed")
	}

	// Slide existing data to beginning.
	if b.r > 0 && len(p) > len(b.buf)-b.w {
		copy(b.buf, b.buf[b.r:b.w])
		b.w -= b.r
		b.r = 0
	}

	// Write new data.
	n = copy(b.buf[b.w:], p)
	b.w += n
	if n < len(p) {
		err = errWriteFull
	}
	return n, err
}

// Close marks the buffer as closed. Future calls to Write will
// return an error. Future calls to Read, once the buffer is
// empty, will return err.
func (b *buffer) Close(err error) {
	if !b.closed {
		b.closed = true
		b.err = err
	}
}
73 Godeps/_workspace/src/github.com/bradfitz/http2/buffer_test.go generated vendored Normal file
@@ -0,0 +1,73 @@
// Copyright 2014 The Go Authors.
// See https://code.google.com/p/go/source/browse/CONTRIBUTORS
// Licensed under the same terms as Go itself:
// https://code.google.com/p/go/source/browse/LICENSE

package http2

import (
    "io"
    "reflect"
    "testing"
)

var bufferReadTests = []struct {
    buf      buffer
    read, wn int
    werr     error
    wp       []byte
    wbuf     buffer
}{
    {
        buffer{[]byte{'a', 0}, 0, 1, false, nil},
        5, 1, nil, []byte{'a'},
        buffer{[]byte{'a', 0}, 1, 1, false, nil},
    },
    {
        buffer{[]byte{'a', 0}, 0, 1, true, io.EOF},
        5, 1, io.EOF, []byte{'a'},
        buffer{[]byte{'a', 0}, 1, 1, true, io.EOF},
    },
    {
        buffer{[]byte{0, 'a'}, 1, 2, false, nil},
        5, 1, nil, []byte{'a'},
        buffer{[]byte{0, 'a'}, 2, 2, false, nil},
    },
    {
        buffer{[]byte{0, 'a'}, 1, 2, true, io.EOF},
        5, 1, io.EOF, []byte{'a'},
        buffer{[]byte{0, 'a'}, 2, 2, true, io.EOF},
    },
    {
        buffer{[]byte{}, 0, 0, false, nil},
        5, 0, errReadEmpty, []byte{},
        buffer{[]byte{}, 0, 0, false, nil},
    },
    {
        buffer{[]byte{}, 0, 0, true, io.EOF},
        5, 0, io.EOF, []byte{},
        buffer{[]byte{}, 0, 0, true, io.EOF},
    },
}

func TestBufferRead(t *testing.T) {
    for i, tt := range bufferReadTests {
        read := make([]byte, tt.read)
        n, err := tt.buf.Read(read)
        if n != tt.wn {
            t.Errorf("#%d: wn = %d want %d", i, n, tt.wn)
            continue
        }
        if err != tt.werr {
            t.Errorf("#%d: werr = %v want %v", i, err, tt.werr)
            continue
        }
        read = read[:n]
        if !reflect.DeepEqual(read, tt.wp) {
            t.Errorf("#%d: read = %+v want %+v", i, read, tt.wp)
        }
        if !reflect.DeepEqual(tt.buf, tt.wbuf) {
            t.Errorf("#%d: buf = %+v want %+v", i, tt.buf, tt.wbuf)
        }
    }
}
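A readability note on the table above: the buffer literals are positional, listing the struct's fields in declaration order (buf, r, w, closed, err). Restated with field names, the first case's input looks like this; a sketch, assuming it sits in the same package, with firstCase an illustrative name only.

```
package http2

// The first test case's input: an open buffer whose two-byte backing
// array holds one unread byte, 'a'.
var firstCase = buffer{buf: []byte{'a', 0}, r: 0, w: 1, closed: false, err: nil}
```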
@ -1,13 +1,11 @@
// Copyright 2014 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// Copyright 2014 The Go Authors.
// See https://code.google.com/p/go/source/browse/CONTRIBUTORS
// Licensed under the same terms as Go itself:
// https://code.google.com/p/go/source/browse/LICENSE

package http2

import (
    "errors"
    "fmt"
)
import "fmt"

// An ErrCode is an unsigned 32-bit error code as defined in the HTTP/2 spec.
type ErrCode uint32
@ -78,45 +76,3 @@ func (e StreamError) Error() string {
type goAwayFlowError struct{}

func (goAwayFlowError) Error() string { return "connection exceeded flow control window size" }

// connErrorReason wraps a ConnectionError with an informative error about why it occurs.
//
// Errors of this type are only returned by the frame parser functions
// and converted into ConnectionError(ErrCodeProtocol).
type connError struct {
    Code   ErrCode
    Reason string
}

func (e connError) Error() string {
    return fmt.Sprintf("http2: connection error: %v: %v", e.Code, e.Reason)
}

type pseudoHeaderError string

func (e pseudoHeaderError) Error() string {
    return fmt.Sprintf("invalid pseudo-header %q", string(e))
}

type duplicatePseudoHeaderError string

func (e duplicatePseudoHeaderError) Error() string {
    return fmt.Sprintf("duplicate pseudo-header %q", string(e))
}

type headerFieldNameError string

func (e headerFieldNameError) Error() string {
    return fmt.Sprintf("invalid header field name %q", string(e))
}

type headerFieldValueError string

func (e headerFieldValueError) Error() string {
    return fmt.Sprintf("invalid header field value %q", string(e))
}

var (
    errMixPseudoHeaderTypes = errors.New("mix of request and response pseudo headers")
    errPseudoAfterRegular   = errors.New("pseudo header field after regular")
)
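The comment on connError above says errors of this type are only returned by the frame parser functions and are converted into ConnectionError(ErrCodeProtocol). A minimal sketch of that conversion, assuming ConnectionError is the package's ErrCode-based connection-level error as in x/net/http2; terminalError itself is a hypothetical helper, not part of the vendored file.

```
package http2

// terminalError is a hypothetical helper illustrating the documented
// conversion: the detailed Reason is only useful for logging, and the
// caller is handed a generic protocol-level connection error.
func terminalError(err error) error {
    if ce, ok := err.(connError); ok {
        _ = ce.Reason // keep for debugging/logging before the downgrade
        return ConnectionError(ErrCodeProtocol)
    }
    return err
}
```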
Some files were not shown because too many files have changed in this diff.