The objective of this repository is to help you parse the Bitcoin blockchain and convert it into a property graph model.
The raw Bitcoin blocks are stored in 1350 blk*.dat files, each ~128MB. These raw files are the input to our parser.
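To illustrate what the parser has to read, here is a minimal sketch of walking the records inside one blk*.dat file. It relies on the standard on-disk layout used by Bitcoin Core (4-byte network magic, 4-byte little-endian length, then the serialized block). The function names `iter_raw_blocks` and `block_hash` are hypothetical and are not taken from this repository, and stopping at the first non-magic bytes is a simplification.

```python
import hashlib
import struct
from typing import BinaryIO, Iterator

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")  # mainnet network magic bytes
HEADER_SIZE = 80  # a Bitcoin block header is always 80 bytes


def iter_raw_blocks(stream: BinaryIO) -> Iterator[bytes]:
    """Yield each serialized block found in a blk*.dat style stream.

    Hypothetical helper: stops at end of data or at the first record
    that does not start with the mainnet magic (e.g. zero padding).
    """
    while True:
        prefix = stream.read(8)
        if len(prefix) < 8:
            return  # end of file
        magic = prefix[:4]
        (size,) = struct.unpack("<I", prefix[4:])  # little-endian length
        if magic != MAINNET_MAGIC:
            return  # padding or unknown data; a simplification
        block = stream.read(size)
        if len(block) < size:
            return  # truncated record
        yield block


def block_hash(raw_block: bytes) -> str:
    """Return the block hash: double SHA-256 of the header, byte-reversed, as hex."""
    header = raw_block[:HEADER_SIZE]
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return digest[::-1].hex()
```

A caller would open a file in binary mode and loop, for example `for raw in iter_raw_blocks(open(path, "rb")): print(block_hash(raw))`, where `path` is whichever blk*.dat file you want to inspect.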
Processing is divided into 2 phases.
GCP Dataproc cluster for block-extractor with 1 hr execution time
- 1 x Master N1-standard-2 (2 vCPU, 7.5GB RAM)
- 4 x Worker Nodes N1-standard-4 (each 4 vCPU, 15GB RAM)
GCP Kubernetes cluster for block-parser with 14 hrs execution time
- 40 x Nodes each 4 vCPU 8GB RAM
The entire blockchain can be parsed in under 16 hours. This is a scale-out architecture, so adding more nodes can reduce parsing time.
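As a rough back-of-the-envelope aid, the figures above can be turned into an estimate. The sketch below assumes perfectly linear scaling, which is optimistic: real runs have scheduling, I/O, and coordination overhead, so treat the result as a lower bound. The function name `ideal_parse_hours` is hypothetical.

```python
def ideal_parse_hours(baseline_hours: float, baseline_nodes: int, nodes: int) -> float:
    """Estimate run time on `nodes` nodes, assuming perfectly linear scaling.

    This is an optimistic assumption, not a measured result.
    """
    if baseline_nodes <= 0 or nodes <= 0:
        raise ValueError("node counts must be positive")
    return baseline_hours * baseline_nodes / nodes


# Input size from the figures above: 1350 files of ~128MB each.
TOTAL_INPUT_GB = 1350 * 128 / 1000  # about 172.8 GB

# block-parser baseline from above: 14 hrs on 40 nodes.
print(ideal_parse_hours(14, 40, 80))  # 7.0
```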
The Spark version of the parser depends on bitcoinj, which has this known issue: bitcoinj fails to deserialize the new 0.13+ block format from bitcoind.