EtherExtractor Concepts

Umesh Prabushitha Jayasinghe edited this page Aug 11, 2018 · 1 revision

Concepts

Accessing leveldb

The Ethereum blockchain is stored in a LevelDB database. Once connected to LevelDB, you can iterate through the keys and values to see how the data is actually stored. Keys follow a specific format, so we need some byte manipulation in order to build the required keys.

Generation of Keys

For example, the byte string of the block hash key for block number 28 is:

Block Hash Key = h\000\000\000\000\000\000\000\034n

The byte string above should be passed as the key to LevelDB. Converted to its hex representation, the hash key looks like this:

Hex(Block Hash Key) = 68000000000000001c6e

where 68 = hex for ‘h’, 000000000000001c = 8-byte big-endian hex for 28 (\034 in the byte string is the octal escape for 28), and 6e = hex for ‘n’.
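The key layout described above can be reproduced in a few lines of Python. This is a minimal sketch; the function name `create_hash_key` is borrowed from the pseudocode on this page, and its body is an illustration rather than the project's actual implementation.

```python
import struct


def create_hash_key(block_number):
    """Build the block hash key: 'h' + 8-byte big-endian block number + 'n'."""
    return b"h" + struct.pack(">Q", block_number) + b"n"


key = create_hash_key(28)
print(key)        # b'h\x00\x00\x00\x00\x00\x00\x00\x1cn'
print(key.hex())  # 68000000000000001c6e
```

`struct.pack(">Q", ...)` encodes an unsigned 64-bit integer in big-endian order, which matches the 8-byte number in the middle of the key.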

Below are some keys and the data each one retrieves:

  • block hash key: returns the block hash.
  • block header key: returns the RLP encoded header of the block.
  • block body key: returns the RLP encoded body (transactions and ommers) of the block.

To get a block's data, we need the block number as well as the block's hash. LevelDB holds both a hash->number and a number->hash mapping. LevelDB values related to block headers, block bodies, transactions etc. are encoded using Recursive Length Prefix (RLP) encoding.
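The hash->number direction can be sketched as follows. The `H` prefix is an assumption taken from the go-ethereum database schema (`H` + block hash -> block number); it is not stated on this page, so verify it against the client version that wrote your database. Both function names are hypothetical.

```python
import struct


def create_number_key(block_hash):
    """Key for the hash->number lookup (assumed geth layout: 'H' + 32-byte hash)."""
    if len(block_hash) != 32:
        raise ValueError("block hash must be 32 bytes")
    return b"H" + block_hash


def decode_block_number(value):
    """Decode the stored value, an 8-byte big-endian block number."""
    return struct.unpack(">Q", value)[0]
```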

This pseudocode shows how we can get the data related to a block.

block_number = 28
hash_key = create_hash_key(block_number)
block_hash = leveldb_get(hash_key)

header_key = create_header_key(block_number, block_hash)
rlp_header = leveldb_get(header_key) 

body_key = create_body_key(block_number, block_hash)
rlp_body = leveldb_get(body_key)
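The pseudocode above could be turned into Python roughly as follows. Only the hash key format (`h` + number + `n`) comes from this page. The header key (`h` + number + hash) and body key (`b` + number + hash) layouts are assumptions based on the go-ethereum schema. `read_block` is a hypothetical helper, and `leveldb_get` stands for any callable that returns the value for a key, or `None` when the key is missing.

```python
import struct


def encode_block_number(block_number):
    """Encode a block number as 8 big-endian bytes."""
    return struct.pack(">Q", block_number)


def create_hash_key(block_number):
    # 'h' + number + 'n' -> block hash
    return b"h" + encode_block_number(block_number) + b"n"


def create_header_key(block_number, block_hash):
    # Assumed geth layout: 'h' + number + hash -> RLP encoded header
    return b"h" + encode_block_number(block_number) + block_hash


def create_body_key(block_number, block_hash):
    # Assumed geth layout: 'b' + number + hash -> RLP encoded body
    return b"b" + encode_block_number(block_number) + block_hash


def read_block(leveldb_get, block_number):
    """Return (block_hash, rlp_header, rlp_body), or None if the block is unknown."""
    block_hash = leveldb_get(create_hash_key(block_number))
    if block_hash is None:
        return None
    rlp_header = leveldb_get(create_header_key(block_number, block_hash))
    rlp_body = leveldb_get(create_body_key(block_number, block_hash))
    return block_hash, rlp_header, rlp_body
```

With a LevelDB binding such as plyvel, the `get` method of an opened database object could be passed as `leveldb_get`. For experiments, a plain dictionary's `get` method works just as well.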

RLP Decoding

The RLP decoded block header contains 15 elements:

  1. parentHash
  2. sha3Uncles
  3. beneficiary
  4. stateRoot
  5. transactionsRoot
  6. receiptsRoot
  7. logsBloom
  8. difficulty
  9. number
  10. gasLimit
  11. gasUsed
  12. timestamp
  13. extraData
  14. mixHash
  15. nonce
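To illustrate how a header value read from LevelDB maps to the 15 fields above, here is a minimal, unoptimized RLP decoder written from the RLP encoding rules. It is a sketch, not the decoder used by this project, and it skips the strictness checks a production decoder needs. `decode_header` and `HEADER_FIELDS` are hypothetical names.

```python
HEADER_FIELDS = [
    "parentHash", "sha3Uncles", "beneficiary", "stateRoot", "transactionsRoot",
    "receiptsRoot", "logsBloom", "difficulty", "number", "gasLimit",
    "gasUsed", "timestamp", "extraData", "mixHash", "nonce",
]


def _decode_item(data, pos):
    """Decode one RLP item starting at pos; return (item, next_pos)."""
    prefix = data[pos]
    if prefix < 0x80:                       # a single byte is its own encoding
        return data[pos:pos + 1], pos + 1
    if prefix < 0xB8:                       # short string, 0-55 bytes
        start, length = pos + 1, prefix - 0x80
        return data[start:start + length], start + length
    if prefix < 0xC0:                       # long string, length stored in next bytes
        start = pos + 1 + (prefix - 0xB7)
        length = int.from_bytes(data[pos + 1:start], "big")
        return data[start:start + length], start + length
    if prefix < 0xF8:                       # short list, payload of 0-55 bytes
        start, length = pos + 1, prefix - 0xC0
    else:                                   # long list, length stored in next bytes
        start = pos + 1 + (prefix - 0xF7)
        length = int.from_bytes(data[pos + 1:start], "big")
    end = start + length
    items = []
    while start < end:
        item, start = _decode_item(data, start)
        items.append(item)
    return items, end


def rlp_decode(data):
    """Decode a complete RLP value into nested lists of bytes."""
    item, end = _decode_item(bytes(data), 0)
    if end != len(data):
        raise ValueError("trailing bytes after RLP value")
    return item


def decode_header(rlp_header):
    """Map the 15 decoded header items to their field names."""
    items = rlp_decode(rlp_header)
    if len(items) != len(HEADER_FIELDS):
        raise ValueError("expected 15 header fields")
    return dict(zip(HEADER_FIELDS, items))
```

All decoded values are raw bytes. Numeric fields such as number or gasUsed can be converted with `int.from_bytes(value, "big")`.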

The RLP decoded block body contains 2 elements:

  1. list of transactions
  2. list of ommers

A single transaction has 9 elements:

  1. nonce
  2. gasPrice
  3. gasLimit
  4. to
  5. value
  6. init | data
  7. v
  8. r
  9. s

You won’t find the sender’s address among these elements. Instead, you’ll find the signature values (v, r, s), which you can use to recover the address.
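Once a transaction has been RLP decoded into its 9 byte strings, the items can be labeled as sketched below. `label_transaction` is a hypothetical helper that expects the already decoded list. It relies on two facts about the encoding: integers are stored as big-endian bytes without leading zeros (so zero is an empty string), and an empty `to` field marks a contract creation, in which case the sixth element holds the init code rather than call data.

```python
TX_FIELDS = ["nonce", "gasPrice", "gasLimit", "to", "value", "data", "v", "r", "s"]
INT_FIELDS = {"nonce", "gasPrice", "gasLimit", "value", "v", "r", "s"}


def label_transaction(items):
    """Turn the 9 decoded transaction items into a dictionary of named fields."""
    if len(items) != len(TX_FIELDS):
        raise ValueError("expected 9 transaction fields")
    tx = {}
    for name, raw in zip(TX_FIELDS, items):
        if name in INT_FIELDS:
            tx[name] = int.from_bytes(raw, "big")  # empty bytes decode to 0
        else:
            tx[name] = bytes(raw)
    # An empty 'to' means the transaction creates a contract.
    tx["is_contract_creation"] = len(tx["to"]) == 0
    return tx
```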

Generating Sender Ethereum Address from a Transaction

Ethereum uses Keccak-256 for hashing. To recover the public key of the sender (tx signer), the Elliptic Curve Digital Signature Algorithm (ECDSA) over the secp256k1 curve is used. Here’s how we can get the sender's public key, and then the sender's address, from a transaction. (Ref)

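  1. Take v, r and s by RLP decoding the transaction
  2. Get the hash that was signed: the Keccak-256 hash of the RLP encoded transaction without its signature (for EIP-155 transactions, the chain ID and two empty values take the place of v, r and s). Note that this differs from the hash that identifies the transaction, which covers the signature as well
  3. Use ECDSA with secp256k1’s curve to recover the public key from this hash and the v, r, s values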
  4. Take the Keccak-256 hash of the public key (the 64 bytes of the X and Y coordinates, without the 0x04 prefix)
  5. Take the last 40 hex characters / 20 bytes of this Keccak-256 hash. In other words, drop the first 24 hex characters / 12 bytes. These 40 characters / 20 bytes are the address. When prefixed with 0x, it becomes 42 characters long.
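Steps 4 and 5 can be sketched in Python as below. Note that `hashlib.sha3_256` from the standard library is not Keccak-256, because the standardized SHA-3 uses different padding. This sketch therefore carries a small, slow, pure-Python Keccak-256 so that it runs without dependencies; in practice a maintained library would be preferable. The ECDSA public key recovery of step 3 is not shown, and `address_from_public_key` is a hypothetical helper that expects the already recovered 64-byte public key.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def _rol64(value, shift):
    """Rotate a 64-bit integer left by shift bits."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def _keccak_f1600(lanes):
    """Apply the 24-round Keccak-f[1600] permutation to a 5x5 matrix of lanes."""
    lfsr = 1
    for _ in range(24):
        # theta: mix each column's parity into its neighbours
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane and move it to a new position
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], _rol64(current, (t + 1) * (t + 2) // 2)
        # chi: non-linear step applied along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: add the round constant, generated by an 8-bit LFSR
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes


def keccak256(data):
    """Keccak-256 as used by Ethereum (rate 136 bytes, padding 0x01 ... 0x80)."""
    rate = 136
    padded = bytearray(data) + bytes(rate - len(data) % rate)
    padded[len(data)] ^= 0x01
    padded[-1] ^= 0x80
    lanes = [[0] * 5 for _ in range(5)]
    for offset in range(0, len(padded), rate):
        block = padded[offset:offset + rate]
        for i in range(rate // 8):  # absorb 17 little-endian lanes per block
            lanes[i % 5][i // 5] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        lanes = _keccak_f1600(lanes)
    # The 32-byte digest is the first four lanes of the state.
    return b"".join(lanes[x][0].to_bytes(8, "little") for x in range(4))


def address_from_public_key(public_key):
    """Derive the address from a 64-byte public key (X || Y, no 0x04 prefix)."""
    if len(public_key) != 64:
        raise ValueError("expected 64 bytes: X and Y coordinates")
    digest = keccak256(public_key)     # step 4
    return "0x" + digest[-20:].hex()   # step 5: keep the last 20 bytes
```

The returned string is lowercase hex and 42 characters long including the 0x prefix.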