Scale-out parsing of the entire Bitcoin blockchain in hours

chinkitp/bitcoinparser

Parsing the Bitcoin Blockchain at Scale

Background

The objective of this repository is to help you parse the Bitcoin blockchain and convert it into a property graph model.

The raw Bitcoin blocks are stored in 1,350 blk*.dat files of ~128 MB each. These raw files are the input to our parser.
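As a rough illustration of what the parser reads, the sketch below walks the records in one blk*.dat file. It assumes Bitcoin Core's on-disk layout for mainnet: each block is preceded by the 4-byte network magic (f9 be b4 d9) and a 4-byte little-endian length, and a file may end in zero padding left over from preallocation. The name `iter_raw_blocks` is hypothetical and not part of this repository.

```python
import io
import struct
from typing import BinaryIO, Iterator

# Mainnet network magic as it is stored on disk (bytes f9 be b4 d9).
MAINNET_MAGIC = bytes.fromhex("f9beb4d9")


def iter_raw_blocks(stream: BinaryIO, magic: bytes = MAINNET_MAGIC) -> Iterator[bytes]:
    """Yield each raw serialized block from a blk*.dat stream.

    Each record is framed as: 4-byte network magic, 4-byte little-endian
    block length, then the block bytes themselves.
    """
    while True:
        offset = stream.tell()
        prefix = stream.read(8)
        if len(prefix) < 8:
            return  # end of file
        if prefix[:4] == bytes(4):
            return  # zero-filled, preallocated tail of the file
        if prefix[:4] != magic:
            raise ValueError(f"unexpected magic {prefix[:4].hex()} at offset {offset}")
        (length,) = struct.unpack("<I", prefix[4:])
        block = stream.read(length)
        if len(block) < length:
            raise ValueError(f"truncated block record at offset {offset}")
        yield block


if __name__ == "__main__":
    # Build a tiny synthetic "blk file" in memory instead of reading from disk.
    record = MAINNET_MAGIC + struct.pack("<I", 3) + b"abc"
    fake_file = io.BytesIO(record * 2 + bytes(16))
    print([raw.hex() for raw in iter_raw_blocks(fake_file)])
```

Because Bitcoin Core downloads blocks out of order (headers-first sync), records in a blk*.dat file are not guaranteed to be in height order, so the blocks yielded here still have to be linked by their previous-block hash later on.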

Processing is divided into two phases.

Quick Setup using Google Cloud Platform (GCP)

Phase 1: Extract each block from the *.dat files

GCP Dataproc cluster for the block-extractor, with ~1 hr execution time:

  • 1 x master node: n1-standard-2 (2 vCPU, 7.5 GB RAM)
  • 4 x worker nodes: n1-standard-4 (4 vCPU, 15 GB RAM each)
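One natural key for each block extracted in Phase 1 is its hash. A minimal sketch, assuming the standard 80-byte block header layout (version, previous block hash, Merkle root, timestamp, bits, nonce), decodes the header and derives the block hash as the double SHA-256 of those 80 bytes, shown in the usual byte-reversed hex form. `parse_header` and `BlockHeader` are illustrative names, not the block-extractor's actual API.

```python
import hashlib
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockHeader:
    version: int
    prev_hash: str    # display (byte-reversed) hex
    merkle_root: str  # display (byte-reversed) hex
    timestamp: int    # Unix seconds
    bits: int         # compact difficulty target
    nonce: int
    block_hash: str   # display hex of double SHA-256 over the 80-byte header


def parse_header(raw_block: bytes) -> BlockHeader:
    """Decode the fixed 80-byte header at the start of a serialized block."""
    if len(raw_block) < 80:
        raise ValueError("block is shorter than an 80-byte header")
    header = raw_block[:80]
    # <: little-endian, no padding; 4 + 32 + 32 + 4 + 4 + 4 = 80 bytes.
    version, prev, merkle, timestamp, bits, nonce = struct.unpack("<i32s32sIII", header)
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    # Hashes are serialized in internal byte order; reverse them for display.
    return BlockHeader(
        version=version,
        prev_hash=prev[::-1].hex(),
        merkle_root=merkle[::-1].hex(),
        timestamp=timestamp,
        bits=bits,
        nonce=nonce,
        block_hash=digest[::-1].hex(),
    )
```

Keying blocks by hash, with the previous-block hash stored alongside, lets a later phase reassemble the chain regardless of the order in which blocks were extracted.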

Phase 2: Parse each block and convert it to a property graph

GCP Kubernetes cluster for the block-parser, with ~14 hrs execution time:

  • 40 x nodes (4 vCPU, 8 GB RAM each)
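The README does not spell out the property graph schema, so the following uses a hypothetical one purely for illustration: `Block`, `Transaction`, `Output`, and `Address` nodes, connected by `CHAINED_TO`, `CONTAINS`, `OUTPUTS`, `LOCKED_TO`, and `SPENT_BY` edges. The sketch assumes a block has already been decoded (txids, output values, addresses) into the small dataclasses shown; `block_to_graph` and all of these names are assumptions, not the block-parser's actual output format.

```python
from dataclasses import dataclass
from typing import Any, Optional

NULL_HASH = "00" * 32  # prev hash of the genesis block and prev txid of coinbase inputs


@dataclass
class TxIn:
    prev_txid: str  # txid of the output being spent
    prev_vout: int  # index of that output


@dataclass
class TxOut:
    value_sat: int
    address: Optional[str]  # decoded address, or None for non-standard scripts


@dataclass
class Tx:
    txid: str
    inputs: list[TxIn]
    outputs: list[TxOut]


@dataclass
class Block:
    block_hash: str
    prev_hash: str
    height: int
    txs: list[Tx]


def block_to_graph(block: Block) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Convert one decoded block into (nodes, edges) of a hypothetical property graph."""
    nodes: list[dict[str, Any]] = [{"id": block.block_hash, "label": "Block", "height": block.height}]
    edges: list[dict[str, Any]] = []
    if block.prev_hash != NULL_HASH:
        edges.append({"src": block.block_hash, "dst": block.prev_hash, "type": "CHAINED_TO"})
    seen_addresses: set[str] = set()
    for tx in block.txs:
        nodes.append({"id": tx.txid, "label": "Transaction"})
        edges.append({"src": block.block_hash, "dst": tx.txid, "type": "CONTAINS"})
        for n, out in enumerate(tx.outputs):
            out_id = f"{tx.txid}:{n}"  # outputs are identified by txid:index
            nodes.append({"id": out_id, "label": "Output", "value_sat": out.value_sat})
            edges.append({"src": tx.txid, "dst": out_id, "type": "OUTPUTS"})
            if out.address is not None:
                if out.address not in seen_addresses:  # one Address node per block
                    seen_addresses.add(out.address)
                    nodes.append({"id": out.address, "label": "Address"})
                edges.append({"src": out_id, "dst": out.address, "type": "LOCKED_TO"})
        for txin in tx.inputs:
            if txin.prev_txid == NULL_HASH:
                continue  # coinbase input: it spends no earlier output
            edges.append({"src": f"{txin.prev_txid}:{txin.prev_vout}", "dst": tx.txid, "type": "SPENT_BY"})
    return nodes, edges
```

Output ids of the form `txid:index` let `SPENT_BY` edges point at outputs created in earlier blocks; a loader would then merge nodes by `id` (for example with an upsert) so that addresses and outputs seen in several blocks end up as a single vertex.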

Performance

The entire blockchain can be parsed in under 16 hrs (about 1 hr for Phase 1 plus 14 hrs for Phase 2). Because the architecture scales out, adding more nodes can reduce parsing time.
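The scale-out claim can be made concrete with the Phase 2 figures: 40 nodes running for 14 hrs is about 560 node-hours of work. The sketch below applies an Amdahl-style model, assuming the parallel part divides evenly across nodes and that `serial_fraction` (a hypothetical parameter, not measured for this project) captures everything that does not speed up. `estimate_hours` is an illustrative helper, not part of the repository.

```python
def estimate_hours(baseline_hours: float, baseline_nodes: int, nodes: int,
                   serial_fraction: float = 0.0) -> float:
    """Amdahl-style wall-clock estimate for re-running a measured job on `nodes` nodes.

    serial_fraction is the assumed share of the baseline time that does not
    speed up with more nodes; the rest is treated as perfectly divisible work.
    """
    if baseline_nodes <= 0 or nodes <= 0:
        raise ValueError("node counts must be positive")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    serial_hours = baseline_hours * serial_fraction
    # Total parallel work expressed in node-hours, then spread over the new node count.
    parallel_node_hours = baseline_hours * (1.0 - serial_fraction) * baseline_nodes
    return serial_hours + parallel_node_hours / nodes
```

With no serial part, `estimate_hours(14, 40, 80)` returns 7.0, i.e. doubling the Kubernetes cluster would ideally halve Phase 2; any real serial overhead (cluster start-up, loading the graph) makes the actual gain smaller.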

Why can't everything be done in Spark?

The Spark version of the parser depends on bitcoinj, which has a known issue: bitcoinj fails to deserialize the new 0.13+ block format from bitcoind.
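One plausible source of this kind of incompatibility, assumed here rather than confirmed from the bitcoinj issue, is the segregated witness serialization from BIP 144, which arrived with the 0.13 release line. A parser that predates it reads the marker byte 0x00 after the version as an input count of zero. The hypothetical helpers below show the check a block parser has to make before decoding inputs, together with the CompactSize integer encoding used for counts and lengths throughout blocks and transactions.

```python
def read_compact_size(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a CompactSize integer starting at buf[pos]; return (value, next_pos)."""
    first = buf[pos]
    if first < 0xFD:
        return first, pos + 1  # values below 0xFD are stored in the byte itself
    width = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]  # size of the little-endian payload
    end = pos + 1 + width
    if end > len(buf):
        raise ValueError("truncated CompactSize")
    return int.from_bytes(buf[pos + 1:end], "little"), end


def uses_witness_serialization(raw_tx: bytes) -> bool:
    """True if the transaction uses the BIP 144 extended (segwit) serialization.

    After the 4-byte version, the extended format has marker 0x00 and flag 0x01
    where the legacy format has the input-count CompactSize.
    """
    if len(raw_tx) < 6:
        raise ValueError("too short to be a serialized transaction")
    return raw_tx[4] == 0x00 and raw_tx[5] == 0x01


def input_count(raw_tx: bytes) -> int:
    """Number of inputs, skipping the segwit marker and flag when present."""
    pos = 4  # skip the 4-byte version
    if uses_witness_serialization(raw_tx):
        pos += 2
    count, _ = read_compact_size(raw_tx, pos)
    return count
```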

Acknowledgements

  • NBitcoin - Comprehensive Bitcoin library for the .NET framework.
  • bitcoinj - A library for working with Bitcoin in Java.
  • Terry M
