Skip to content
This repository has been archived by the owner on Feb 1, 2024. It is now read-only.

Allow TP to read by address prefix #34

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
111 changes: 111 additions & 0 deletions text/0000-read-by-prefix.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,111 @@
- Feature Name: read_by_prefix
- Start Date: 2018-12-19
- RFC PR:
- Sawtooth Issue:

# Summary
[summary]: #summary

This RFC proposes an additional API to transaction context,
context.listAddresses(address_prefix), which will get address prefix as input
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably return an iterator so not all addresses must be obtained at once.

Copy link
Contributor

@peterschwarz peterschwarz Jan 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

and return list of all existing addresses that starts with the prefix.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we allow the case of address_prefix being empty / none?
If so, the reference implementation lists all addresses with non-empty data at prefixes which are in input address list of transaction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that better to follow logic as in get_state: if TP is requesting read of addresses that is not in 'input' address list then return failure.
It is up to the transaction family developer to check return value of the api and to make sure that the 'input' address list is valid, since the input list can accept address prefix i don't see this as a change of behavior.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks


This capability exists when reading state from rest-api but is missing when
transaction processor reads state.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is misleading, as it is not "missing" in the sense that we forgot to add it, but rather, it was excluded as uninteresting and not performant. Suggest "This capability exists in the Sawtooth REST API, but is not currently available via the transaction processor protocol or SDKs (for reasons covered in 'Drawbacks' below)."


# Motivation
[motivation]: #motivation

The motivation came when looking for a way to improve processing transaction
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The optimization here is actually reduces the number of messages between the validator and a transaction processor in exchange for potentially larger messages.

logic by making better use optimization that comes from using the radix tree.

Since addresses are stored in a radix tree, reading many addresses by
one prefix is much faster than reading the same amount of addresses one by one.
Further more, read by prefix doesn't require transaction processor to know
which addresses exist, removing the need to store list of existing addresses
in a separate address.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the surface, this could be explained away by bad TP and state address scheme. It would help if this section had an illustrative example to show when we would take this approach vs. simply using a different addressing scheme.


Having this capability allows a better and smarter address mapping this a
better transaction processor logic.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a well-formed sentence.


# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

Get state by prefix is a feature available for reading via sawtooth rest-api.
This RFC adds method to expose this capability of reading address prefix
to transaction processor.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section is very light on content. Please provide concrete examples of why this is needed. Please check the Guide-Level explanation section in the template for other information that should be in this section.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yoni-wolf you have more thoughts in here: https://jira.hyperledger.org/browse/STL-1375 like the example of deleting all leaves under a certain node. As a use case maybe that's deleting all accounts owned by a certain user?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the main usage for this feature is this:

  1. you allocate some prefix for a member or usage
  2. the TP logic has some algorithm for translating data to sufix
  3. TP writes the data to the calculated prefix+sufix

now, it is very fast to find a certain piece of information, you just calculate it again and read that address, however, it is very hard and inefficient to clean it up!

with 'read by prefix' TP can support a 'clean transaction' that will:

  1. use this feature to acquire a list of all used addresses under that prefix
  2. make a list of them and call the delete once, or call delete for each address there

how would it be implemented w/o 'read by prefix'? (assuming you did implement efficient lookup like this):

one way is to try and read every possible suffix (which might be a huge list and take forever), another way is to start keeping track of every suffix you used in another address, which you will have to read,modify,write for every sufix you add, which will slow you down. both are much worse i think.


No change is recommended to the existing methods, so the implementation of this
RFC should not cause any breakage for pre-existing transaction processors.

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

Example of implementation can be found here:
Sawtooth-core + Python SDK changes:
https://github.com/arsulegai/sawtooth-core/tree/jira_1375
(commits: 4225dc7 and 2c3a223)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explain the actual implementation suggested for the validator and the sdks in this section as well, besides just the links to the commits.


Sawtooth-sdk-cxx changes:
https://github.com/arsulegai/sawtooth-sdk-cxx/tree/jira_1375 (commit: 23697e8)

In the suggested implementation there are two proto changes:

Addition to state_context.proto:


// A request from a handler/tp for list of addresses with valid data
message TpStateAddressesListRequest {
// The context id that references a context in the contextmanager
string context_id = 1;
repeated string addresses = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if this request was for an address space with an exceptionally large number of addresses? Should they just be returned, or is there an upper limit, where the validator will send an error back with "request denied". If there should be an upper limit, how would we manage it?

Another approach may be a paging approach, where the request specifies the maximum number of addresses to return (with a max page size, perhaps configurable per-validator).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as regular read, there is no limitation of data size in one address, it is up the the TP developer to make sure he doesn't write too much data to one address.
Since only requesting address list without addresses data this won't get too big that easily.
No objection to adding paging and max, only that it is not done in regular read address, added it as to unresolved questions section.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not the same type of problem, because read/write are relatively efficient. Returning a list of addresses from the state tree will not be efficient as it will require a large number of underlying database reads to accomplish.

Adding an on-chain setting for maximum size of address storage would probably be a good feature too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a list? If it remains a list, we should specify in this section how the validator should process the request.

}

// A response from the contextmanager/validator with a series of State
// entries having valid data
message TpStateAddressesListResponse {
enum Status {
STATUS_UNSET = 0;
OK = 1;
AUTHORIZATION_ERROR = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This document should describe whether this occurs based on a comparison to the request, or whether it occurs if the list would return an address outside the allowed range. It should probably be the former.

}

repeated string addresses = 1;
Status status = 2;
}

Addition to validator.proto:

// State list request to get all addresses having non empty data
TP_STATE_LIST_REQUEST = 17;
// State list response to get all addresses having non empty data
TP_STATE_LIST_RESPONSE = 18;


# Drawbacks
[drawbacks]: #drawbacks

This will need to be implemented in every sawtooth SDK and will increase
the complexity of the validator slightly.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few other drawbacks should be mentioned here: a) this will perform very poorly when the address list is large; b) this will cause large I/O and CPU on the validator when the address list is large (it is potentially very expensive, unlike the rest of the TP API); c) once implemented it will be hard/difficult to deprecate because of our commitment to backward compatibility; d) this seems like an anti-pattern in TP design, and may encourage others to implement poorly designed TPs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how this is handled for the rest-api, but one salient difference between the situation for the rest-api and a transaction processor, is that only one node is impacted by the query overhead for serving the rest-api request whereas all nodes are impacted by a transaction using a similar query. If not implemented carefully, a significant performance cost could be exploited as a DOS attack on the network.


# Rationale and alternatives
[alternatives]: #alternatives
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another alternative is to redesign the TP to not need the list of addresses at all.


An alternative could be in the transaction processor logic, store an address
that contains list of existing addresses. When need to manipulate multiple
addresses transaction processor will first read the special address, then
read/write/delete each address in the list.
This has a performance downside that each time transaction processor needs to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a performance penalty for maintaining a list like this, but it is probably substantially faster than the approach contained in this RFC.

write a new address, it will need to modify the special address containing
list of all addresses. This address could be huge, when the need to actually
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This argument - that the address could be huge - is also a flaw of this approach generally, since the returned address list could be that large. This should be mentioned in Drawbacks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added to drawbacks.
The main different is that the access to a huge set of data can be limited to only when actually need to change all the data. with 'old' approach of keeping an address that tracks existing addresses, TP needs to modify this address during almost every transaction.

access all the addresses in the list can be rare.

# Prior art
[prior-art]: #prior-art

Read state by address prefix is available from sawtooth-rest-api, and is
one of the key capability that comes from using the radix tree.
vaporos marked this conversation as resolved.
Show resolved Hide resolved


# Unresolved questions
[unresolved]: #unresolved-questions