Home
Softono
b

blockchain-etl

Professional software vendor delivering innovative solutions on the Softono platform. Specialized in both open-source and proprietary software development.

Total Products
4

Software by blockchain-etl

bitcoin-etl
Open Source

bitcoin-etl

# Bitcoin ETL [![Join the chat at https://gitter.im/ethereum-eth](https://badges.gitter.im/ethereum-etl.svg)](https://gitter.im/ethereum-etl/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) [![Build Status](https://img.shields.io/badge/build-passing-green.svg)](https://travis-ci.com/blockchain-etl/bitcoin-etl) [![Join Telegram Group](https://img.shields.io/badge/telegram-join%20chat-blue.svg)](https://t.me/joinchat/GsMpbA3mv1OJ6YMp3T5ORQ) Install Bitcoin ETL: ```bash pip install bitcoin-etl ``` Export blocks and transactions ([Schema](#blocksjson), [Reference](#export_blocks_and_transactions)): ```bash > bitcoinetl export_blocks_and_transactions --start-block 0 --end-block 500000 \ --provider-uri http://user:pass@localhost:8332 --chain bitcoin \ --blocks-output blocks.json --transactions-output transactions.json ``` Supported chains: - bitcoin - bitcoin_cash - bitcoin_gold - dogecoin - litecoin - dash - zcash Stream blockchain data continually to console ([Reference](#stream)): ```bash > pip install bitcoin-etl[streaming] > bitcoinetl stream -p http://user:pass@localhost:8332 --start-block 500000 ``` Stream blockchain data continually to Google Pub/Sub ([Reference](#stream)): ```bash > export GOOGLE_APPLICATION_CREDENTIALS=/path_to_credentials_file.json > bitcoinetl stream -p http://user:pass@localhost:8332 --start-block 500000 --output projects/your-project/topics/crypto_bitcoin ``` For the latest version, check out the repo and call ```bash > pip install -e .[streaming] > python bitcoinetl.py ``` ## Table of Contents - [Bitcoin ETL](#bitcoin-etl) - [Table of Contents](#table-of-contents) - [Schema](#schema) - [blocks.json](#blocksjson) - [transactions.json](#transactionsjson) - [transaction_input](#transactioninput) - [transaction_output](#transactionoutput) - [Exporting the Blockchain](#exporting-the-blockchain) - [Running in Docker](#running-in-docker) - [Command Reference](#command-reference) - [export_blocks_and_transactions](#exportblocksandtransactions) - [enrich_transactions](#enrichtransactions) - [get_block_range_for_date](#getblockrangefordate) - [export_all](#exportall) - [stream](#stream) - [Running Tests](#running-tests) - [Running Tox Tests](#running-tox-tests) - [Public Datasets in BigQuery](#public-datasets-in-bigquery) ## Schema ### blocks.json Field | Type | --------------------|-----------------| hash | hex_string | size | bigint | stripped_size | bigint | weight | bigint | number | bigint | version | bigint | merkle_root | hex_string | timestamp | bigint | nonce | hex_string | bits | hex_string | coinbase_param | hex_string | transaction_count | bigint | ### transactions.json Field | Type | ------------------------|-----------------------| hash | hex_string | size | bigint | virtual_size | bigint | version | bigint | lock_time | bigint | block_number | bigint | block_hash | hex_string | block_timestamp | bigint | is_coinbase | boolean | index | bigint | inputs | []transaction_input | outputs | []transaction_output | input_count | bigint | output_count | bigint | input_value | bigint | output_value | bigint | fee | bigint | ### transaction_input Field | Type | ------------------------|-----------------------| index | bigint | spent_transaction_hash | hex_string | spent_output_index | bigint | script_asm | string | script_hex | hex_string | sequence | bigint | required_signatures | bigint | type | string | addresses | []string | value | bigint | ### transaction_output Field | Type | ------------------------|-----------------------| index | bigint | script_asm | string | script_hex | hex_string | required_signatures | bigint | type | string | addresses | []string | value | bigint | You can find column descriptions in [schemas](https://github.com/blockchain-etl/bitcoin-etl-airflow/tree/master/dags/resources/stages/enrich/schemas) **Notes**: 1. Output values returned by Dogecoin API had precision loss in the clients prior to version 1.14. It's caused by this issue https://github.com/dogecoin/dogecoin/issues/1558 The explorers that used older versions to export the data may show incorrect address balances and transaction amounts. 1. For Zcash, `vjoinsplit` and `valueBalance` fields are converted to inputs and outputs with type 'shielded' https://zcash-rpc.github.io/getrawtransaction.html, https://zcash.readthedocs.io/en/latest/rtd_pages/zips/zip-0243.html ## Exporting the Blockchain 1. Install python 3.5.3+ https://www.python.org/downloads/ 1. Install Bitcoin node https://hackernoon.com/a-complete-beginners-guide-to-installing-a-bitcoin-full-node-on-linux-2018-edition-cb8e384479ea 1. Start Bitcoin. Make sure it downloaded the blocks that you need by executing `$ bitcoin-cli getblockchaininfo` in the terminal. You can export blocks below `blocks`, there is no need to wait until the full sync 1. Install Bitcoin ETL: ```bash > pip install bitcoin-etl ``` 1. Export blocks & transactions: ```bash > bitcoinetl export_all --start 0 --end 499999 \ --partition-batch-size 100 \ --provider-uri http://user:pass@localhost:8332 --chain bitcoin ``` The result will be in the `output` subdirectory, partitioned in Hive style: ```bash output/blocks/start_block=00000000/end_block=00000099/blocks_00000000_00000099.csv output/blocks/start_block=00000100/end_block=00000199/blocks_00000100_=00000199.csv ... output/transactions/start_block=00000000/end_block=00000099/transactions_00000000_00000099.csv ... ``` In case `bitcoinetl` command is not available in PATH, use `python -m bitcoinetl` instead. ### Running in Docker 1. Install Docker https://docs.docker.com/install/ 1. Build a docker image ```bash > docker build --platform linux/x86_64 -t bitcoin-etl:latest . > docker image ls ``` 1. Run a container out of the image ```bash > docker run --platform linux/x86_64 -v $HOME/output:/bitcoin-etl/output bitcoin-etl:latest export_blocks_and_transactions --start-block 0 --end-block 500000 \ --provider-uri http://user:pass@localhost:8332 --blocks-output output/blocks.json --transactions-output output/transactions.json ``` 1. Run streaming to console or Pub/Sub ```bash > docker build --platform linux/x86_64 -t bitcoin-etl:latest-streaming -f Dockerfile_with_streaming . > echo "Stream to console" > docker run --platform linux/x86_64 bitcoin-etl:latest-streaming stream -p http://user:pass@localhost:8332 --start-block 500000 > echo "Stream to Pub/Sub" > docker run --platform linux/x86_64 -v /path_to_credentials_file/:/bitcoin-etl/ --env GOOGLE_APPLICATION_CREDENTIALS=/bitcoin-etl/credentials_file.json bitcoin-etl:latest-streaming stream -p http://user:pass@localhost:8332 --start-block 500000 --output projects/your-project/topics/crypto_bitcoin ``` 1. Refer to https://github.com/blockchain-etl/bitcoin-etl-streaming for deploying the streaming app to Google Kubernetes Engine. ### Command Reference - [export_blocks_and_transactions](#export_blocks_and_transactions) - [enrich_transactions](#enrich_transactions) - [get_block_range_for_date](#get_block_range_for_date) - [export_all](#export_all) - [stream](#stream) All the commands accept `-h` parameter for help, e.g.: ```bash > bitcoinetl export_blocks_and_transactions --help Usage: bitcoinetl.py export_blocks_and_transactions [OPTIONS] Export blocks and transactions. Options: -s, --start-block INTEGER Start block -e, --end-block INTEGER End block [required] -b, --batch-size INTEGER The number of blocks to export at a time. -p, --provider-uri TEXT The URI of the remote Bitcoin node -w, --max-workers INTEGER The maximum number of workers. --blocks-output TEXT The output file for blocks. If not provided blocks will not be exported. Use "-" for stdout --transactions-output TEXT The output file for transactions. If not provided transactions will not be exported. Use "-" for stdout --help Show this message and exit. ``` For the `--output` parameters the supported type is json. The format type is inferred from the output file name. #### export_blocks_and_transactions ```bash > bitcoinetl export_blocks_and_transactions --start-block 0 --end-block 500000 \ --provider-uri http://user:pass@localhost:8332 \ --blocks-output blocks.json --transactions-output transactions.json ``` Omit `--blocks-output` or `--transactions-output` options if you want to export only transactions/blocks. You can tune `--batch-size`, `--max-workers` for performance. Note that `required_signatures`, `type`, `addresses`, and `value` fields will be empty in transactions inputs. Use [enrich_transactions](#enrich_transactions) to populate those fields. #### enrich_transactions You need to run bitcoin daemon with option `txindex=1` for this command to work. ```bash > bitcoinetl enrich_transactions \ --provider-uri http://user:pass@localhost:8332 \ --transactions-input transactions.json --transactions-output enriched_transactions.json ``` You can tune `--batch-size`, `--max-workers` for performance. #### get_block_range_for_date ```bash > bitcoinetl get_block_range_for_date --provider-uri http://user:pass@localhost:8332 --date=2017-03-01 ``` This command is guaranteed to return the block range that covers all blocks with `block.time` on the specified date. However the returned block range may also contain blocks outside the specified date, because block times are not monotonic https://twitter.com/EvgeMedvedev/status/1073844856009576448. You can filter `blocks.json`/`transactions.json` with the below command: ```bash > bitcoinetl filter_items -i blocks.json -o blocks_filtered.json \ -p "datetime.datetime.fromtimestamp(item['timestamp']).astimezone(datetime.timezone.utc).strftime('%Y-%m-%d') == '2017-03-01'" ``` #### export_all ```bash > bitcoinetl export_all --provider-uri http://user:pass@localhost:8332 --start 2018-01-01 --end 2018-01-02 ``` You can tune `--export-batch-size`, `--max-workers` for performance. #### stream ```bash > bitcoinetl stream --provider-uri http://user:pass@localhost:8332 --start-block 500000 ``` - This command outputs blocks and transactions to the console by default. - Use `--output` option to specify the Google Pub/Sub topic where to publish blockchain data, e.g. `projects/your-project/topics/crypto_bitcoin`. Blocks and transactions will be pushed to `projects/your-project/topics/crypto_bitcoin.blocks` and `projects/your-project/topics/crypto_bitcoin.transactions` topics. - The command saves its state to `last_synced_block.txt` file where the last synced block number is saved periodically. - Specify either `--start-block` or `--last-synced-block-file` option. `--last-synced-block-file` should point to the file where the block number, from which to start streaming the blockchain data, is saved. - Use the `--lag` option to specify how many blocks to lag behind the head of the blockchain. It's the simplest way to handle chain reorganizations - they are less likely the further a block from the head. - Use the `--chain` option to specify the type of the chain, e.g. `bitcoin`, `litecoin`, `dash`, `zcash`, etc. - You can tune `--period-seconds`, `--batch-size`, `--max-workers` for performance. ### Running Tests ```bash > pip install -e .[dev] > echo "The below variables are optional" > export BITCOINETL_BITCOIN_PROVIDER_URI=http://user:pass@localhost:8332 > export BITCOINETL_LITECOIN_PROVIDER_URI=http://user:pass@localhost:8331 > export BITCOINETL_DOGECOIN_PROVIDER_URI=http://user:pass@localhost:8330 > export BITCOINETL_BITCOIN_CASH_PROVIDER_URI=http://user:pass@localhost:8329 > export BITCOINETL_DASH_PROVIDER_URI=http://user:pass@localhost:8328 > export BITCOINETL_ZCASH_PROVIDER_URI=http://user:pass@localhost:8327 > pytest -vv ``` ### Running Tox Tests ```bash > pip install tox > tox ``` ### Public Datasets in BigQuery https://cloud.google.com/blog/products/data-analytics/introducing-six-new-cryptocurrencies-in-bigquery-public-datasets-and-how-to-analyze-them

Crypto & Blockchain Data Pipelines & ETL
458 Github Stars
ethereum-etl-airflow
Open Source

ethereum-etl-airflow

# Ethereum ETL Airflow Read this article: https://cloud.google.com/blog/products/data-analytics/ethereum-bigquery-how-we-built-dataset ## Local Development Prerequisites - direnv - pyenv We are using direnv to automatically set up and load the correct python version. We also create a venv in the root folder, that is automatically activated when entering the project folder. ## Setting up Airflow DAGs using Google Cloud Composer ### Create BigQuery Datasets - Sign in to BigQuery https://bigquery.cloud.google.com/ - Create new datasets called `crypto_ethereum`, `crypto_ethereum_raw`, `crypto_ethereum_temp` ### Create Google Cloud Storage bucket - Create a new Google Storage bucket to store exported files https://console.cloud.google.com/storage/browser ### Create Google Cloud Composer (version 2) environment Create a new Cloud Composer environment: ```bash export ENVIRONMENT_NAME=ethereum-etl-0 AIRFLOW_CONFIGS_ARR=( "celery-worker_concurrency=8" "scheduler-dag_dir_list_interval=300" "scheduler-min_file_process_interval=120" ) export AIRFLOW_CONFIGS=$(IFS=, ; echo "${AIRFLOW_CONFIGS_ARR[*]}") gcloud composer environments create \ $ENVIRONMENT_NAME \ --location=us-central1 \ --image-version=composer-2.1.14-airflow-2.5.1 \ --environment-size=medium \ --scheduler-cpu=2 \ --scheduler-memory=13 \ --scheduler-storage=1 \ --scheduler-count=1 \ --web-server-cpu=1 \ --web-server-memory=2 \ --web-server-storage=512MB \ --worker-cpu=2 \ --worker-memory=13 \ --worker-storage=10 \ --min-workers=1 \ --max-workers=8 \ --airflow-configs=$AIRFLOW_CONFIGS gcloud composer environments update \ $ENVIRONMENT_NAME \ --location=us-central1 \ --update-pypi-packages-from-file=requirements_airflow.txt ``` Create variables in Airflow (**Admin > Variables** in the UI): | Variable | Description | |-----------------------------------------|-----------------------------------------| | ethereum_output_bucket | GCS bucket to store exported files | | ethereum_provider_uris | Comma separated URIs of Ethereum nodes | | ethereum_destination_dataset_project_id | Project ID of BigQuery datasets | | notification_emails | email for notifications | Check other variables in `dags/ethereumetl_airflow/variables.py`. ### Updating package requirements Suggested package requirements for Composer are stored in `requirements_airflow.txt`. You can update the Composer environment using the following script: ```bash ENVIRONMENT_NAME="ethereum-etl-0" LOCAL_REQUIREMENTS_PATH="$(mktemp)" # grep pattern removes comments and whitespace: cat "./requirements_airflow.txt" | grep -o '^[^#| ]*' > "$LOCAL_REQUIREMENTS_PATH" gcloud composer environments update \ "$ENVIRONMENT_NAME" \ --location="us-central1" \ --update-pypi-packages-from-file="$LOCAL_REQUIREMENTS_PATH" ``` **Note:** Composer can be _very_ pedantic about conflicts in additional packages. You may have to fix dependency conflicts where you had no issues testing locally (when updating dependencies, Composer does something "cleverer" than just `pip install -r requirements.txt`). This is why `eth-hash` is currently pinned in `requirements_airflow.txt`. Typically we have found that pinning `eth-hash` and/or `eth-rlp` may make things work, though Your Mileage May Vary. See [this issue](https://github.com/blockchain-etl/ethereum-etl-airflow/issues/481#issuecomment-1332878533) for further ideas on how to unblock problems you may encounter. ### Upload DAGs ```bash > ./upload_dags.sh <airflow_bucket> ``` ### Running Tests ```bash pip install \ -r requirements_test.txt \ -r requirements_local.txt \ -r requirements_airflow.txt pytest -vv -s ``` ### Running locally A docker compose definition has been provided to easily spin up a local Airflow instance. To build the required image: ```bash docker compose build ``` To start Airflow: ```bash docker compose up airflow ``` The instance requires the `CLOUDSDK_CORE_PROJECT` environment variable to be set in most cases. Airflow Variables can be defined in [variables.json](./docker/variables.json). ### Creating Table Definition Files for Parsing Events and Function Calls Read this article: https://medium.com/@medvedev1088/query-ens-and-0x-events-with-sql-in-google-bigquery-4d197206e644 ### Debugging Table Defenition Files A utility script for debugging and verifying contract parsing in Ethereum data processing pipelines is available. You can simply run ``` python3 generate_parse_sql.py <path_to_table_definition_file> <date> ``` This will output some example SQL that can be used to debug if the generated json files from the contract parser are correct. NOTE: certain files may not have the `contract_address` field specified as a valid address [ERC20Pool_event_TransferLP](dags/resources/stages/parse/table_definitions/ajna_v2/ERC20Pool_event_TransferLP.json) but use a select statement on another table instead. For these you can simply pass the contract address yourself like below: ``` python3 generate_parse_sql.py <path_to_table_definition_file> <date> --contract_address <contract_address> ``` ### More Information You can follow the instructions here for Polygon DAGs https://github.com/blockchain-etl/polygon-etl. The architecture there is very similar to Ethereum so in most case substituting `polygon` for `ethereum` will work. Contributions to this README file for porting documentation from Polygon to Ethereum are welcome.

Crypto & Blockchain Data Pipelines & ETL
441 Github Stars
public-datasets
Open Source

public-datasets

# Public Blockchain Datasets | Network | BigQuery | Lag | PubSub | Node repo(s) | Indexer repo(s) | --- | --- | --- | --- | --- | --- | Band | [πŸ”](https://console.cloud.google.com/bigquery?page=table&t=transactions&d=crypto_band&p=public-data-finance) ([Examples](https://console.cloud.google.com/marketplace/product/public-data-finance/crypto-band-dataset)) | 0 | [πŸ’§](pubsub.md) | [band-kubernetes](https://github.com/blockchain-etl/band-kubernetes) | [band-etl](https://github.com/blockchain-etl/band-etl) | Bitcoin | [πŸ”](https://console.cloud.google.com/bigquery?page=table&t=transactions&d=crypto_bitcoin&p=bigquery-public-data) ([Examples](https://console.cloud.google.com/marketplace/product/bitcoin/crypto-bitcoin)) | 3 | [πŸ’§](pubsub.md) | [docker-bitcoind](https://github.com/blockchain-etl/docker-bitcoind) | [bitcoin-etl](https://github.com/blockchain-etl/bitcoin-etl), [bitcoin-etl-airflow](https://github.com/blockchain-etl/bitcoin-etl-airflow), [bitcoin-etl-airflow-neo4j](https://github.com/blockchain-etl/bitcoin-etl-airflow-neo4j), [bitcoin-etl-streaming](https://github.com/blockchain-etl/bitcoin-etl-streaming) | Bitcoin Cash | [πŸ”](https://console.cloud.google.com/bigquery?page=table&t=transactions&d=crypto_bitcoin_cash&p=bigquery-public-data) ([Examples](https://console.cloud.google.com/marketplace/product/bitcoin-cash/crypto-bitcoin-cash))| 6 | [πŸ’§](pubsub.md) | [docker-bitcoin-cashd](https://github.com/blockchain-etl/docker-bitcoincashd) | see Bitcoin | Celo | [πŸ”](https://console.cloud.google.com/bigquery?page=table&d=crypto_celo&p=nansen-public-data&t=transactions) | 10 | [πŸ’§](pubsub.md) | ? | [celo-etl](https://github.com/nansen-ai/celo-etl) | Dash | [πŸ”](https://console.cloud.google.com/bigquery?page=table&t=transactions&d=crypto_dash&p=bigquery-public-data) ([Examples](https://console.cloud.google.com/marketplace/product/dash/crypto-dash)) | 24 | [πŸ’§](pubsub.md) | [docker-dashd](https://github.com/blockchain-etl/docker-dashd) | see Bitcoin | Dogecoin | [πŸ”](https://console.cloud.google.com/bigquery?page=table&t=transactions&d=crypto_dogecoin&p=bigquery-public-data) ([Examples](https://console.cloud.google.com/marketplace/product/dogecoin/crypto-dogecoin)) | 60 | [πŸ’§](pubsub.md) | [docker-dogecoind](https://github.com/blockchain-etl/docker-dogecoind) | see Bitcoin | Ethereum | [πŸ”](https://console.cloud.google.com/bigquery?page=table&t=transactions&d=crypto_ethereum&p=bigquery-public-data) ([Examples](https://console.cloud.google.com/marketplace/product/ethereum/crypto-ethereum-blockchain)) | 18 | [πŸ’§](pubsub.md) | ? | [ethereum-etl](https://github.com/blockchain-etl/ethereum-etl), [ethereum-etl-airflow](https://github.com/blockchain-etl/ethereum-etl-airflow), [ethereum-etl-neo4j](https://github.com/blockchain-etl/ethereum-etl-neo4j), [ethereum-etl-postgres](https://github.com/blockchain-etl/ethereum-etl-postgres) | Ethereum 2 | [πŸ”](https://console.cloud.google.com/bigquery?page=table&t=beacon_blocks&d=crypto_ethereum2&p=public-data-finance) | ? | ❌ | ? | [ethereum2-etl](https://github.com/blockchain-etl/ethereum2-etl), [ethereum2-etl-airflow](ethereum2-etl-airflow) | Ethereum Classic| [πŸ”](https://console.cloud.google.com/bigquery?page=table&t=transactions&d=crypto_ethereum_classic&p=bigquery-public-data) ([Examples](https://console.cloud.google.com/marketplace/product/ethereum-classic/crypto-ethereum-classic)) | ? | ❌ | see Ethereum | see Ethereum | Fantom | [πŸ”](https://console.cloud.google.com/bigquery?page=table&d=crypto_fantom&p=nansen-public-data&t=transactions) | ? | [πŸ’§](pubsub.md) | ? | ? | Hedera Hashgraph | [πŸ”](https://console.cloud.google.com/bigquery?page=table&t=transactions&d=mainnet&p=hedera-etl) | ? | ? | ? | [hedera-etl](https://github.com/blockchain-etl/hedera-etl) | ? | ? | IoTeX | [πŸ”](https://console.cloud.google.com/bigquery?page=table&t=transactions&d=crypto_iotex&p=public-data-finance) ([Examples](https://console.cloud.google.com/marketplace/product/public-data-finance/crypto-iotex-dataset)) | 10 | [πŸ’§](pubsub.md) | [iotex-kubernetes](https://github.com/blockchain-etl/iotex-kubernetes) | [iotex-etl](https://github.com/blockchain-etl/iotex-etl) | Litecoin | [πŸ”](https://console.cloud.google.com/bigquery?page=table&t=transactions&d=crypto_litecoin&p=bigquery-public-data) ([Examples](https://console.cloud.google.com/marketplace/product/litecoin/crypto-litecoin)) | 12 | [πŸ’§](pubsub.md) | [docker-litecoind](https://github.com/blockchain-etl/docker-litecoind) | see Bitcoin | MultiversX | [πŸ”](https://console.cloud.google.com/bigquery?page=table&t=transactions&d=crypto_multiversx_mainnet_eu&p=bigquery-public-data) ([Examples](https://console.cloud.google.com/marketplace/product/bigquery-public-data/blockchain-analytics-multiversx-mainnet-eu)) | ? | ❌ | [mx-chain-go](https://github.com/multiversx/mx-chain-go) | [multiversx-etl](https://github.com/multiversx/multiversx-etl) | Polygon | [πŸ”](https://console.cloud.google.com/bigquery?page=table&t=transactions&d=crypto_polygon&p=public-data-finance) ([Examples](https://console.cloud.google.com/marketplace/product/public-data-finance/crypto-polygon-dataset)) | 80 | [πŸ’§](pubsub.md) | ? | [polygon-etl](https://github.com/blockchain-etl/polygon-etl) | Solana | [πŸ”](https://console.cloud.google.com/bigquery?page=table&t=Transactions&d=crypto_solana_mainnet_us&p=solana-data-sandbox) ([Examples](https://console.cloud.google.com/marketplace/product/bigquery-public-data/crypto-solana-mainnet-us)) | 2-5 min | ? | [etl-rust](https://github.com/blockchain-etl/etl-rust) | [solana-etl](https://github.com/blockchain-etl/solana-etl) | Tezos | [πŸ”](https://console.cloud.google.com/bigquery?page=table&t=transactions&d=crypto_tezos&p=public-data-finance) ([Examples](https://console.cloud.google.com/marketplace/product/public-data-finance/crypto-tezos-dataset)) | ? | ? | [tezos-kubernetes](https://github.com/blockchain-etl/tezos-kubernetes) | [tezos-etl](https://github.com/blockchain-etl/tezos-etl), [tezos-etl-airflow](https://github.com/blockchain-etl/tezos-etl-airflow) | Theta | [πŸ”](https://console.cloud.google.com/bigquery?page=table&t=transactions&d=crypto_theta&p=public-data-finance) ([Examples](https://console.cloud.google.com/marketplace/product/public-data-finance/crypto-theta-dataset)) | ? | ? | ? | [theta-etl](https://github.com/blockchain-etl/theta-etl) | XRP | [πŸ”](https://console.cloud.google.com/bigquery?page=table&t=transactions&d=fullhistory&p=xrpledgerdata) | ? | ? | ? | [fetch-xrpl-transactions](https://github.com/WietseWind/fetch-xrpl-transactions) | Zcash | [πŸ”](https://console.cloud.google.com/bigquery?page=table&t=transactions&d=crypto_zcash&p=bigquery-public-data) ([Examples](https://console.cloud.google.com/marketplace/product/zcash/crypto-zcash)) | 24 | [πŸ’§](pubsub.md) | [docker-zcashd](https://github.com/blockchain-etl/docker-zcashd) | see Bitcoin | Zilliqa | [πŸ”](https://console.cloud.google.com/bigquery?page=table&t=transactions&d=crypto_zilliqa&p=public-data-finance) ([Examples](https://console.cloud.google.com/marketplace/product/public-data-finance/crypto-zilliqa-dataset)) | ? | ? | ? | [zilliqa-etl](https://github.com/blockchain-etl/zilliqa-etl) # Planning / Building | Network | BigQuery | Lag | PubSub | Node repo(s) | Indexer repo(s) | Notes | --- | --- | --- | --- | --- | --- | --- | Algorand | ⏳ | ? | ⏳ | ? | ? | Needs help | Aptos | ⏳ | ? | ⏳ | ? | ? | In progress | Arbitrum | ⏳ | ? | ⏳ | ? | ? | Needs help | Avalanche | ⏳ | ? | ⏳ | ? | ? | Needs help | BNB Chain | ⏳ | ? | ⏳ | ? | ? | Needs help | Cronos | ⏳ | ? | ⏳ | ? | ? | Needs help | Eos | ❌ | ? | ❌ | ? | [eos-etl](https://github.com/blockchain-etl/eos-etl), [eos-etl-airflow](https://github.com/blockchain-etl/eos-etl-airflow) | Needs help | Icon | ❌ | ? | ❌ | ? | [icon-etl](https://github.com/blockchain-etl/icon-etl), [icon-etl-airflow](https://github.com/blockchain-etl/icon-etl-airflow) | Needs help | Klaytn | πŸ‘· | ? | πŸ‘· | ? | [klaytn-etl](https://github.com/klaytn/klaytn-etl) | In progress | MultiversX | ⏳ | ? | ⏳ | ? | ? | In progress | Optimism | ⏳ | ? | ⏳ | ? | ? | Needs help | Sui | ⏳ | ? | ⏳ | ? | ? | In progress

Search Engines Data Warehouses
256 Github Stars
awesome-bigquery-views
Open Source

awesome-bigquery-views

# Awesome BigQuery Views Here are some examples of how to derive insights from on-chain crypto data. Not all networks have examples here - you can find the complete list of crypto datasets in [blockchain-etl/public-datasets](https://github.com/blockchain-etl/public-datasets) ## Top Ethereum Balances ```sql WITH double_entry_book AS ( -- debits SELECT to_address AS address, value AS value FROM `bigquery-public-data.crypto_ethereum.traces` WHERE to_address IS NOT NULL AND status = 1 AND (call_type NOT IN ('delegatecall', 'callcode', 'staticcall') OR call_type IS NULL) UNION ALL -- credits SELECT from_address AS address, -value AS value FROM `bigquery-public-data.crypto_ethereum.traces` WHERE from_address IS NOT NULL AND status = 1 AND (call_type NOT IN ('delegatecall', 'callcode', 'staticcall') OR call_type IS NULL) UNION ALL -- transaction fees debits SELECT miner AS address, SUM(CAST(receipt_gas_used AS numeric) * CAST((receipt_effective_gas_price - COALESCE(base_fee_per_gas, 0)) as numeric)) AS value FROM `bigquery-public-data.crypto_ethereum.transactions` AS transactions join `bigquery-public-data.crypto_ethereum.blocks` AS blocks ON blocks.number = transactions.block_number GROUP BY blocks.number, blocks.miner UNION ALL -- transaction fees credits SELECT from_address AS address, -(CAST(receipt_gas_used AS numeric) * CAST(receipt_effective_gas_price AS numeric)) AS value FROM `bigquery-public-data.crypto_ethereum.transactions` ) SELECT address, SUM(value) AS balance FROM double_entry_book GROUP BY address ORDER BY balance DESC LIMIT 1000 ``` Alternatively query `bigquery-public-data.crypto_ethereum.balances` (updated daily), e.g.: ```sql SELECT * FROM `bigquery-public-data.crypto_ethereum.balances` WHERE SEARCH(address, '0x0cfb686e114d478b055ce8614621f8bb62f70360', analyzer=>'NO_OP_ANALYZER'); ``` ## Every Ethereum Balance on Every Day ```sql WITH double_entry_book AS ( -- debits SELECT to_address AS address, value AS value, block_timestamp FROM `bigquery-public-data.crypto_ethereum.traces` WHERE to_address IS NOT NULL AND status = 1 AND (call_type NOT IN ('delegatecall', 'callcode', 'staticcall') OR call_type IS NULL) UNION ALL -- credits SELECT from_address AS address, -value AS value, block_timestamp FROM `bigquery-public-data.crypto_ethereum.traces` WHERE from_address IS NOT NULL AND status = 1 AND (call_type NOT IN ('delegatecall', 'callcode', 'staticcall') OR call_type IS NULL) UNION ALL -- transaction fees debits SELECT miner AS address, SUM(CAST(receipt_gas_used AS numeric) * CAST((receipt_effective_gas_price - COALESCE(base_fee_per_gas, 0)) AS numeric)) AS value, block_timestamp FROM `bigquery-public-data.crypto_ethereum.transactions` AS transactions JOIN `bigquery-public-data.crypto_ethereum.blocks` AS blocks ON blocks.number = transactions.block_number GROUP BY blocks.number, blocks.miner, block_timestamp UNION ALL -- transaction fees credits SELECT from_address AS address, -(CAST(receipt_gas_used AS numeric) * CAST(receipt_effective_gas_price AS numeric)) AS value, block_timestamp FROM `bigquery-public-data.crypto_ethereum.transactions` ), double_entry_book_grouped_by_date AS ( SELECT address, SUM(value) AS balance_increment, DATE(block_timestamp) AS date FROM double_entry_book GROUP BY address, date ), daily_balances_with_gaps AS ( SELECT address, date, SUM(balance_increment) OVER (PARTITION BY address ORDER BY date) AS balance, LEAD(date, 1, CURRENT_DATE()) OVER (PARTITION BY address ORDER BY date) AS next_date FROM double_entry_book_grouped_by_date ), calendar AS ( SELECT date FROM UNNEST(GENERATE_DATE_ARRAY('2015-07-30', CURRENT_DATE())) AS date ), daily_balances AS ( SELECT address, calendar.date, balance FROM daily_balances_with_gaps JOIN calendar ON daily_balances_with_gaps.date <= calendar.date AND calendar.date < daily_balances_with_gaps.next_date ) SELECT address, date, balance FROM daily_balances ``` Related article: https://medium.com/google-cloud/plotting-ethereum-address-growth-chart-55cc0e7207b2 ## Transaction Throughput Comparison ```sql WITH bitcoin_throughput AS ( -- takes transactions count in every block and divides it by average block time on that day SELECT 'bitcoin' AS chain, count(*) / (24 * 60 * 60 / count(*) OVER (PARTITION BY DATE(block_timestamp))) AS throughput, block_timestamp AS time FROM `bigquery-public-data.crypto_bitcoin.transactions` AS transactions GROUP BY transactions.block_number, transactions.block_timestamp ORDER BY throughput DESC LIMIT 1 ), bitcoin_cash_throughput AS ( SELECT 'bitcoin_cash' AS chain, count(*) / (24 * 60 * 60 / count(*) OVER (PARTITION BY DATE(block_timestamp))) AS throughput, block_timestamp AS time FROM `bigquery-public-data.crypto_bitcoin_cash.transactions` AS transactions GROUP BY transactions.block_number, transactions.block_timestamp ORDER BY throughput DESC LIMIT 1 ), ethereum_throughput AS ( SELECT 'ethereum' AS chain, count(*) / (24 * 60 * 60 / count(*) OVER (PARTITION BY DATE(block_timestamp))) AS throughput, block_timestamp AS time FROM `bigquery-public-data.crypto_ethereum.transactions` AS transactions GROUP BY transactions.block_number, transactions.block_timestamp ORDER BY throughput DESC LIMIT 1 ), ethereum_classic_throughput AS ( SELECT 'ethereum_classic' AS chain, count(*) / (24 * 60 * 60 / count(*) OVER (PARTITION BY DATE(block_timestamp))) AS throughput, block_timestamp AS time FROM `bigquery-public-data.crypto_ethereum_classic.transactions` AS transactions GROUP BY transactions.block_number, transactions.block_timestamp ORDER BY throughput DESC LIMIT 1 ), dogecoin_throughput AS ( SELECT 'dogecoin' AS chain, count(*) / (24 * 60 * 60 / count(*) OVER (PARTITION BY DATE(block_timestamp))) AS throughput, block_timestamp AS time FROM `bigquery-public-data.crypto_dogecoin.transactions` AS transactions GROUP BY transactions.block_number, transactions.block_timestamp ORDER BY throughput DESC LIMIT 1 ), litecoin_throughput AS ( SELECT 'litecoin' AS chain, count(*) / (24 * 60 * 60 / count(*) OVER (PARTITION BY DATE(block_timestamp))) AS throughput, block_timestamp AS time FROM `bigquery-public-data.crypto_litecoin.transactions` AS transactions GROUP BY transactions.block_number, transactions.block_timestamp ORDER BY throughput DESC LIMIT 1 ), dash_throughput AS ( SELECT 'dash' AS chain, count(*) / (24 * 60 * 60 / count(*) OVER (PARTITION BY DATE(block_timestamp))) AS throughput, block_timestamp AS time FROM `bigquery-public-data.crypto_dash.transactions` AS transactions GROUP BY transactions.block_number, transactions.block_timestamp ORDER BY throughput DESC LIMIT 1 ), zcash_throughput AS ( SELECT 'zcash' AS chain, count(*) / (24 * 60 * 60 / count(*) OVER (PARTITION BY DATE(block_timestamp))) AS throughput, block_timestamp AS time FROM `bigquery-public-data.crypto_zcash.transactions` AS transactions GROUP BY transactions.block_number, transactions.block_timestamp ORDER BY throughput DESC LIMIT 1 ) SELECT * FROM bitcoin_throughput UNION ALL SELECT * FROM bitcoin_cash_throughput UNION ALL SELECT * FROM ethereum_throughput UNION ALL SELECT * FROM ethereum_classic_throughput UNION ALL SELECT * FROM dogecoin_throughput UNION ALL SELECT * FROM litecoin_throughput UNION ALL SELECT * FROM dash_throughput UNION ALL SELECT * FROM zcash_throughput ORDER BY throughput DESC ``` Related article: https://medium.com/@medvedev1088/comparing-transaction-throughputs-for-8-blockchains-in-google-bigquery-with-google-data-studio-edbabb75b7f1 ## More Queries | Network | Description | Query | Screenshot | BigQuery | DataStudio | Notes | --- | --- | --- | --- | --- | --- | --- | Band | Latest oracle prices | [πŸ“](band/latest-prices.sql) | | [πŸ”](https://console.cloud.google.com/bigquery?sq=896878822558:9d41f5f621fe4deea11ed3be32ed0a5d) | | | | Band | Log types by transaction | [πŸ“](band/log-types-by-transaction.sql) | | [πŸ”](https://console.cloud.google.com/bigquery?sq=896878822558:4643d2cc218d497aa2bf4173c39cbce8) | Bitcoin | Top 1K addresses, by balance | [πŸ“](bitcoin/top-bitcoin-balances.sql) | | [πŸ”](https://console.cloud.google.com/bigquery?sq=896878822558:9bd85ce4d6174e909cfc89c09cb1cc55) | [πŸ“Š](https://datastudio.google.com/u/1/reporting/c61d1ee3-0e67-4f19-a322-4aed82a21e1b/page/p_a72nk0pzzc) | | | Bitcoin | Bitcoin Gini index, by day | [πŸ“](bitcoin/gini-index-by-day.sql) | | [πŸ”](https://console.cloud.google.com/bigquery?sq=896878822558:531f2d1edf614723b2120a839e5df04b) | [πŸ“Š](https://datastudio.google.com/u/1/reporting/c61d1ee3-0e67-4f19-a322-4aed82a21e1b/page/p_a72nk0pzzc) | [[1](https://cloud.google.com/blog/products/data-analytics/introducing-six-new-cryptocurrencies-in-bigquery-public-datasets-and-how-to-analyze-them)] | Ethereum | Every account balance on every day | [πŸ“](ethereum/every-balance-every-day.sql)| | [πŸ”](https://console.cloud.google.com/bigquery?sq=896878822558:c5323064f9fb45529ebdd65fb4091374) | [πŸ“Š](https://datastudio.google.com/u/1/reporting/c61d1ee3-0e67-4f19-a322-4aed82a21e1b/page/9tC6C) | [[1](https://medium.com/google-cloud/plotting-ethereum-address-growth-chart-55cc0e7207b2)] | Ethereum | Ether supply by day | [πŸ“](ethereum/ether-supply-by-day.sql)| [πŸ–ΌοΈ](ethereum/ether-supply-by-day.png) | [πŸ”](https://console.cloud.google.com/bigquery?sq=896878822558:7bd873dec1cd417b89552495cad09e56) | [πŸ“Š](https://datastudio.google.com/u/1/reporting/c61d1ee3-0e67-4f19-a322-4aed82a21e1b/page/9tC6C) | [[1](https://medium.com/google-cloud/how-to-query-ether-supply-in-bigquery-90f8ae795a8)] | Ethereum | Shortest path between addresses | [πŸ“](ethereum/shortest-path-via-traces.sql) | | [πŸ”](https://console.cloud.google.com/bigquery?sq=896878822558:2d202e496bf343a0aa1060f4ef35ffff) | ❌ | Zilliqa | Shortest path between addresses v2 | [πŸ“](zilliqa/shortest-path-via-traces-v2.sql) | | [πŸ”](https://console.cloud.google.com/bigquery?sq=896878822558:c4c9b9294acb42b183233b158cc67074) | ❌ Check out this awesome repository: https://github.com/RokoMijic/awesome-bigquery-views

Crypto & Blockchain Data Pipelines & ETL
211 Github Stars