-
Notifications
You must be signed in to change notification settings - Fork 32
Allow TP to read by address prefix #34
base: main
Are you sure you want to change the base?
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,111 @@ | ||
- Feature Name: read_by_prefix | ||
- Start Date: 2018-12-19 | ||
- RFC PR: | ||
- Sawtooth Issue: | ||
|
||
# Summary | ||
[summary]: #summary | ||
|
||
This RFC proposes an additional API to transaction context, | ||
context.listAddresses(address_prefix), which will get address prefix as input | ||
and return list of all existing addresses that starts with the prefix. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we allow the case of address_prefix being empty / none? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think that better to follow logic as in get_state: if TP is requesting read of addresses that is not in 'input' address list then return failure. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks |
||
|
||
This capability exists when reading state from rest-api but is missing when | ||
transaction processor reads state. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is misleading, as it is not "missing" in the sense that we forgot to add it, but rather, it was excluded as uninteresting and not performant. Suggest "This capability exists in the Sawtooth REST API, but is not currently available via the transaction processor protocol or SDKs (for reasons covered in 'Drawbacks' below)." |
||
|
||
# Motivation | ||
[motivation]: #motivation | ||
|
||
The motivation came when looking for a way to improve processing transaction | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The optimization here is actually reduces the number of messages between the validator and a transaction processor in exchange for potentially larger messages. |
||
logic by making better use optimization that comes from using the radix tree. | ||
|
||
Since addresses are stored in a radix tree, reading many addresses by | ||
one prefix is much faster than reading the same amount of addresses one by one. | ||
Further more, read by prefix doesn't require transaction processor to know | ||
which addresses exist, removing the need to store list of existing addresses | ||
in a separate address. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the surface, this could be explained away by bad TP and state address scheme. It would help if this section had an illustrative example to show when we would take this approach vs. simply using a different addressing scheme. |
||
|
||
Having this capability allows a better and smarter address mapping this a | ||
better transaction processor logic. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is not a well-formed sentence. |
||
|
||
# Guide-level explanation | ||
[guide-level-explanation]: #guide-level-explanation | ||
|
||
Get state by prefix is a feature available for reading via sawtooth rest-api. | ||
This RFC adds method to expose this capability of reading address prefix | ||
to transaction processor. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This section is very light on content. Please provide concrete examples of why this is needed. Please check the Guide-Level explanation section in the template for other information that should be in this section. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @yoni-wolf you have more thoughts in here: https://jira.hyperledger.org/browse/STL-1375 like the example of deleting all leaves under a certain node. As a use case maybe that's deleting all accounts owned by a certain user? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the main usage for this feature is this:
now, it is very fast to find a certain piece of information, you just calculate it again and read that address, however, it is very hard and inefficient to clean it up! with 'read by prefix' TP can support a 'clean transaction' that will:
how would it be implemented w/o 'read by prefix'? (assuming you did implement efficient lookup like this): one way is to try and read every possible suffix (which might be a huge list and take forever), another way is to start keeping track of every suffix you used in another address, which you will have to read,modify,write for every sufix you add, which will slow you down. both are much worse i think. |
||
|
||
No change is recommended to the existing methods, so the implementation of this | ||
RFC should not cause any breakage for pre-existing transaction processors. | ||
|
||
# Reference-level explanation | ||
[reference-level-explanation]: #reference-level-explanation | ||
|
||
Example of implementation can be found here: | ||
Sawtooth-core + Python SDK changes: | ||
https://github.com/arsulegai/sawtooth-core/tree/jira_1375 | ||
(commits: 4225dc7 and 2c3a223) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please explain the actual implementation suggested for the validator and the sdks in this section as well, besides just the links to the commits. |
||
|
||
Sawtooth-sdk-cxx changes: | ||
https://github.com/arsulegai/sawtooth-sdk-cxx/tree/jira_1375 (commit: 23697e8) | ||
|
||
In the suggested implementation there are two proto changes: | ||
|
||
Addition to state_context.proto: | ||
|
||
|
||
// A request from a handler/tp for list of addresses with valid data | ||
message TpStateAddressesListRequest { | ||
// The context id that references a context in the contextmanager | ||
string context_id = 1; | ||
repeated string addresses = 2; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What if this request was for an address space with an exceptionally large number of addresses? Should they just be returned, or is there an upper limit, where the validator will send an error back with "request denied". If there should be an upper limit, how would we manage it? Another approach may be a paging approach, where the request specifies the maximum number of addresses to return (with a max page size, perhaps configurable per-validator). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same as regular read, there is no limitation of data size in one address, it is up the the TP developer to make sure he doesn't write too much data to one address. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is not the same type of problem, because read/write are relatively efficient. Returning a list of addresses from the state tree will not be efficient as it will require a large number of underlying database reads to accomplish. Adding an on-chain setting for maximum size of address storage would probably be a good feature too. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why is this a list? If it remains a list, we should specify in this section how the validator should process the request. |
||
} | ||
|
||
// A response from the contextmanager/validator with a series of State | ||
// entries having valid data | ||
message TpStateAddressesListResponse { | ||
enum Status { | ||
STATUS_UNSET = 0; | ||
OK = 1; | ||
AUTHORIZATION_ERROR = 2; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This document should describe whether this occurs based on a comparison to the request, or whether it occurs if the list would return an address outside the allowed range. It should probably be the former. |
||
} | ||
|
||
repeated string addresses = 1; | ||
Status status = 2; | ||
} | ||
|
||
Addition to validator.proto: | ||
|
||
// State list request to get all addresses having non empty data | ||
TP_STATE_LIST_REQUEST = 17; | ||
// State list response to get all addresses having non empty data | ||
TP_STATE_LIST_RESPONSE = 18; | ||
|
||
|
||
# Drawbacks | ||
[drawbacks]: #drawbacks | ||
|
||
This will need to be implemented in every sawtooth SDK and will increase | ||
the complexity of the validator slightly. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A few other drawbacks should be mentioned here: a) this will perform very poorly when the address list is large; b) this will cause large I/O and CPU on the validator when the address list is large (it is potentially very expensive, unlike the rest of the TP API); c) once implemented it will be hard/difficult to deprecate because of our commitment to backward compatibility; d) this seems like an anti-pattern in TP design, and may encourage others to implement poorly designed TPs. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't know how this is handled for the rest-api, but one salient difference between the situation for the rest-api and a transaction processor, is that only one node is impacted by the query overhead for serving the rest-api request whereas all nodes are impacted by a transaction using a similar query. If not implemented carefully, a significant performance cost could be exploited as a DOS attack on the network. |
||
|
||
# Rationale and alternatives | ||
[alternatives]: #alternatives | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Another alternative is to redesign the TP to not need the list of addresses at all. |
||
|
||
An alternative could be in the transaction processor logic, store an address | ||
that contains list of existing addresses. When need to manipulate multiple | ||
addresses transaction processor will first read the special address, then | ||
read/write/delete each address in the list. | ||
This has a performance downside that each time transaction processor needs to | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There is a performance penalty for maintaining a list like this, but it is probably substantially faster than the approach contained in this RFC. |
||
write a new address, it will need to modify the special address containing | ||
list of all addresses. This address could be huge, when the need to actually | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This argument - that the address could be huge - is also a flaw of this approach generally, since the returned address list could be that large. This should be mentioned in Drawbacks. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added to drawbacks. |
||
access all the addresses in the list can be rare. | ||
|
||
# Prior art | ||
[prior-art]: #prior-art | ||
|
||
Read state by address prefix is available from sawtooth-rest-api, and is | ||
one of the key capability that comes from using the radix tree. | ||
vaporos marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
|
||
# Unresolved questions | ||
[unresolved]: #unresolved-questions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably return an iterator so not all addresses must be obtained at once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.