
A Brief Review of Bitcoin Locking Scripts and Ordinals

08 June 2023

by Nicolas Bidron

This article is an attempt at cataloging all the types of bitcoin transaction locking scripts, their prevalence and their security implications. The data presented in this article was extracted directly from the bitcoin blockchain, which required custom code to quickly iterate over the entire blockchain (over 450 GB at the time of writing). The tool is available on GitHub: https://github.com/nccgroup/FastBTCParser.

Note: in the rest of this article, Bitcoin and Satoshi will be used interchangeably to refer to an amount of currency in a transaction (1 Bitcoin = 100,000,000 Satoshis).

Anatomy of a Bitcoin Block

Bitcoin relies on the trust-less dissemination of a ledger called the Bitcoin blockchain, which holds a record of all transactions since the inception of Bitcoin. Each block on the chain contains one or more transactions, each made of:

  • One or more inputs (prior transaction outputs): all Bitcoins “spent” in a transaction, except for coinbase transactions, come from the outputs of prior transactions.
  • Unlocking scripts (sometimes referred to as ScriptSig): each input needs a valid unlocking script to authorize the spending of all the proceeds of the prior transaction output it refers to.
  • A list of transaction outputs: each transaction output receives an arbitrary amount from the total of the inputs’ proceeds, and each of these outputs needs a locking script (sometimes referred to as ScriptPubKey).

Here is an example of a simple transaction data taken from the blockchain:

010000000151f36afbb502ff5dd7507845c79cb07e44edc86add4ffd068b3e7b
4017bd290b000000008a4730440220591e3186aa579cd299eb27584a8f929eac
d8b4f810ba402b80f33a153fa8f3c1022024e0c4bc4710294a56ccf9e43c2714
88139e73ae1dbcd22bf4e6c2194b489a2e014104bd117a74f353dfc60809c1c8
f7d57ddbb2bae869fb8bc3d863cb3e8ecab5af6816729494fb0687b298e67be8
75bafb5da82966394805611d0b1ef0f947025c1cffffffff02fe8ed485050000
001976a9147549ddbffcab3fbbb07c52adbe9476351c42f2b188ac001bb70000
0000001976a91465ed94fa5782ef897878140a2890babbf000853688ac000000
00

And here is that same transaction cut into its individual fields:

01000000 ] Transaction format version number
01 ] Number of inputs
———INPUTS———-
51f36afbb502ff5dd7507845c79cb07e ┐ TXID (bytes reversed)
44edc86add4ffd068b3e7b4017bd290b ┘
00000000 ] TXID Output number
8a ] ScriptSig size (138 bytes)
4730440220591e3186aa579cd299eb27 ┐ ScriptSig
584a8f929eacd8b4f810ba402b80f33a |
153fa8f3c1022024e0c4bc4710294a56 |
ccf9e43c271488139e73ae1dbcd22bf4 |
e6c2194b489a2e014104bd117a74f353 |
dfc60809c1c8f7d57ddbb2bae869fb8b |
c3d863cb3e8ecab5af6816729494fb06 |
87b298e67be875bafb5da82966394805 |
611d0b1ef0f947025c1c ┘
ffffffff ] Sequence Number
——END OF INPUTS——
02 ] Number of outputs
———OUTPUTS———
—OUTPUT 1—
fe8ed48505000000 ] Amount in Satoshis ~237 bitcoins (bytes reversed)
19 ] ScriptPubKey size (25 bytes)
76a9147549ddbffcab3fbbb07c52adbe ┐ ScriptPubKey
9476351c42f2b188ac ┘
—OUTPUT 2—
001bb70000000000 ] Amount in Satoshis 12,000,000 (bytes reversed)
19 ] ScriptPubKey size (25 bytes)
76a91465ed94fa5782ef897878140a28 ┐ ScriptPubKey
90babbf000853688ac ┘
——END OF OUTPUTS—–
00000000 ] Lock time
  • TXID is the 32-byte ID of a prior transaction from which one of the outputs is going to be spent.
  • ScriptPubKey is the locking script. Each transaction output locks an arbitrary amount of Satoshis with such a script. Those Satoshis can then be used in a future transaction by unlocking them.
  • ScriptSig is the unlocking script. As such, it needs to provide the data and commands that satisfy the conditions of the locking script on the prior transaction output, so that its funds can be spent in the current transaction.
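
The field-by-field breakdown above can be reproduced with a short parser. The following Python sketch is not taken from FastBTCParser: the names `read_varint` and `parse_legacy_tx` are illustrative, and the code assumes the legacy (pre-SegWit) serialization used by the sample transaction.

```python
# Sample transaction from this article, joined into one hex string.
RAW_TX_HEX = (
    "010000000151f36afbb502ff5dd7507845c79cb07e44edc86add4ffd068b3e7b"
    "4017bd290b000000008a4730440220591e3186aa579cd299eb27584a8f929eac"
    "d8b4f810ba402b80f33a153fa8f3c1022024e0c4bc4710294a56ccf9e43c2714"
    "88139e73ae1dbcd22bf4e6c2194b489a2e014104bd117a74f353dfc60809c1c8"
    "f7d57ddbb2bae869fb8bc3d863cb3e8ecab5af6816729494fb0687b298e67be8"
    "75bafb5da82966394805611d0b1ef0f947025c1cffffffff02fe8ed485050000"
    "001976a9147549ddbffcab3fbbb07c52adbe9476351c42f2b188ac001bb70000"
    "0000001976a91465ed94fa5782ef897878140a2890babbf000853688ac000000"
    "00"
)


def read_varint(buf, pos):
    """Read a Bitcoin variable-length integer; return (value, new_pos)."""
    first = buf[pos]
    if first < 0xFD:
        return first, pos + 1
    width = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
    value = int.from_bytes(buf[pos + 1:pos + 1 + width], "little")
    return value, pos + 1 + width


def parse_legacy_tx(raw):
    """Split a legacy (non-SegWit) serialized transaction into its fields."""
    pos = 0

    def take(n):
        # Consume the next n bytes of the transaction.
        nonlocal pos
        chunk = raw[pos:pos + n]
        pos += n
        return chunk

    def take_int(n):
        # Integers are stored little-endian ("bytes reversed").
        return int.from_bytes(take(n), "little")

    version = take_int(4)
    n_inputs, pos = read_varint(raw, pos)
    inputs = []
    for _ in range(n_inputs):
        prev_txid = take(32)[::-1].hex()  # TXID is stored byte-reversed
        prev_index = take_int(4)
        script_len, pos = read_varint(raw, pos)
        script_sig = take(script_len)
        sequence = take_int(4)
        inputs.append({"prev_txid": prev_txid, "prev_index": prev_index,
                       "script_sig": script_sig, "sequence": sequence})

    n_outputs, pos = read_varint(raw, pos)
    outputs = []
    for _ in range(n_outputs):
        satoshis = take_int(8)
        script_len, pos = read_varint(raw, pos)
        outputs.append({"satoshis": satoshis,
                        "script_pubkey": take(script_len)})

    lock_time = take_int(4)
    return {"version": version, "inputs": inputs,
            "outputs": outputs, "lock_time": lock_time}


tx = parse_legacy_tx(bytes.fromhex(RAW_TX_HEX))
for out in tx["outputs"]:
    print(out["satoshis"] / 100_000_000, "BTC", out["script_pubkey"].hex())
```

SegWit transactions carry an extra marker, flag, and witness section, so they would need additional handling that this sketch leaves out.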

Bitcoin Scripting Language

In any Bitcoin transaction, the ScriptSig and ScriptPubKey are scripts written in a simple language with a limited number of commands. The scripting language is not Turing-complete and each command is stored in a single byte. The language provides the ability to store fixed or variable length data blocks inlined within the script, and uses a stack to process that data.

In effect, the ScriptSig from the current transaction and the ScriptPubKey from the input (prior) transaction are concatenated (the result is referred to as “the script”) and executed to unlock the funds. Execution succeeds, and the funds are unlocked and spent, if:

  • The script is valid
  • The script is entirely executed
  • A single non-zero value item remains on the stack

A complete description of the scripting language and commands can be found here: https://en.bitcoin.it/wiki/Script

The first output locking script or ScriptPubKey of the sample transaction above decodes to the following:

RAW Transaction ScriptPubKey:
76a9147549ddbffcab3fbbb07c52adbe9476351c42f2b188ac
Decoded op-codes:
OP_DUP OP_HASH160 OP_DATA_20 <7549ddbffcab3fbbb07c52adbe9476351c42f2b1> OP_EQUALVERIFY OP_CHECKSIG

This is a P2PKH type locking script and is described in the Pay To Public Key Hash section.
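
As an illustration of the decoding step above, here is a minimal Python sketch. The `OPCODES` table is deliberately partial (only op-codes that appear in this article), the function name `decode_script` is hypothetical, and the larger push op-codes (OP_PUSHDATA1/2/4) are not handled.

```python
# Partial op-code table: only the op-codes discussed in this article.
OPCODES = {
    0x00: "OP_0", 0x51: "OP_1", 0x52: "OP_2", 0x53: "OP_3",
    0x6A: "OP_RETURN", 0x76: "OP_DUP", 0x87: "OP_EQUAL",
    0x88: "OP_EQUALVERIFY", 0xA9: "OP_HASH160",
    0xAC: "OP_CHECKSIG", 0xAE: "OP_CHECKMULTISIG",
}


def decode_script(script):
    """Turn raw script bytes into a readable op-code string."""
    tokens = []
    pos = 0
    while pos < len(script):
        op = script[pos]
        pos += 1
        if 0x01 <= op <= 0x4B:
            # Bytes 0x01..0x4b mean "push the next <op> bytes of data".
            data = script[pos:pos + op]
            pos += op
            tokens.append(f"OP_DATA_{op}")
            tokens.append(f"<{data.hex()}>")
        else:
            tokens.append(OPCODES.get(op, f"OP_UNKNOWN_0x{op:02x}"))
    return " ".join(tokens)


raw = bytes.fromhex("76a9147549ddbffcab3fbbb07c52adbe9476351c42f2b188ac")
print(decode_script(raw))
```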

Locking scripts

Using the tool associated with this article (https://github.com/nccgroup/FastBTCParser), we can now obtain a list of all existing locking script fingerprints, along with their prevalence. The fingerprinting process ignores any part of the script that’s data and replaces it with a tag (but accounts for the data’s length if it is specified by the previous script op-code).

For brevity, only scripts with over 100 occurrences will be shown below. A complete unedited list can be found through the accompanying tool’s GitHub page. The complete list contains 156 unique script fingerprints.
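
The fingerprinting idea can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the tool’s actual code: `fingerprint` and `NAMES` are hypothetical names, the op-code table is partial, the data is simply dropped instead of being replaced by a tag, and the third script uses a placeholder hash of twenty zero bytes.

```python
from collections import Counter

# Partial op-code table, sufficient for the scripts tallied below.
NAMES = {
    0x00: "OP_0", 0x51: "OP_1", 0x6A: "OP_RETURN", 0x76: "OP_DUP",
    0x87: "OP_EQUAL", 0x88: "OP_EQUALVERIFY", 0xA9: "OP_HASH160",
    0xAC: "OP_CHECKSIG",
}


def fingerprint(script):
    """Keep op-codes and data lengths, discard the data itself."""
    parts = []
    pos = 0
    while pos < len(script):
        op = script[pos]
        pos += 1
        if 0x01 <= op <= 0x4B:
            parts.append(f"OP_DATA_{op}")
            pos += op  # skip over the pushed data
        else:
            parts.append(NAMES.get(op, f"OP_0x{op:02x}"))
    return " ".join(parts)


scripts = [
    # The two ScriptPubKeys of the sample transaction.
    "76a9147549ddbffcab3fbbb07c52adbe9476351c42f2b188ac",
    "76a91465ed94fa5782ef897878140a2890babbf000853688ac",
    # Placeholder script: OP_HASH160 <20 zero bytes> OP_EQUAL.
    "a914" + "00" * 20 + "87",
]
counts = Counter(fingerprint(bytes.fromhex(s)) for s in scripts)
for item, occurrences in sorted(counts.items(), key=lambda kv: kv[1]):
    print(occurrences, item)
```

Scripts that differ only in their embedded data collapse onto the same fingerprint, which is what makes a tally like the one below possible.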

#OCCURRENCES ITEM
155 OP_1 OP_DATA_65 OP_DATA_65 OP_2 OP_CHECKMULTISIG
182 OP_IFDUP OP_IF OP_2SWAP OP_VERIFY OP_2OVER OP_DEPTH
336 OP_DATA_36
753 OP_1 OP_DATA_65 OP_1 OP_CHECKMULTISIG
986 OP_DATA_32
1693 OP_1 OP_DATA_33 OP_1 OP_CHECKMULTISIG
1749 OP_2 OP_DATA_33 OP_DATA_33 OP_DATA_33 OP_3 OP_CHECKMULTISIG
4555 OP_2 OP_DATA_33 OP_DATA_33 OP_2 OP_CHECKMULTISIG
4907 OP_1 OP_DATA_65 OP_DATA_33 OP_DATA_33 OP_3 OP_CHECKMULTISIG
8813 OP_2 OP_3 OP_DATA_75
16844 OP_1 OP_DATA_65 OP_DATA_65 OP_DATA_65 OP_3 OP_CHECKMULTISIG
31401 OP_1 OP_DATA_65 OP_DATA_33 OP_2 OP_CHECKMULTISIG
70218 OP_1 OP_DATA_33 OP_DATA_33 OP_DATA_65 OP_3 OP_CHECKMULTISIG
212896 OP_1 OP_DATA_33 OP_DATA_33 OP_2 OP_CHECKMULTISIG
219174 #note this is an empty script
289111 OP_RETURN
562934 OP_1 OP_DATA_33 OP_DATA_33 OP_DATA_33 OP_3 OP_CHECKMULTISIG
888283 OP_DATA_65 OP_CHECKSIG
2691300 OP_DATA_33 OP_CHECKSIG
17663491 OP_1 OP_DATA_32
27521302 OP_0 OP_DATA_32
52119761 OP_RETURN
306074873 OP_0 OP_DATA_20
655601631 OP_HASH160 OP_DATA_20 OP_EQUAL
1292730245 OP_DUP OP_HASH160 OP_DATA_20 OP_EQUALVERIFY OP_CHECKSIG
found 156 types of item in 2356718149 tx outputs.

All data is accurate as of May 14th 2023.

From this list of scripts we can identify 5 script types which are the most commonly used as described below.

P2PK – Pay To Public Key

Originally this was the main way to send Bitcoins from one wallet to another. These scripts have two different fingerprints:

OP_DATA_65  OP_CHECKSIG
OP_DATA_33  OP_CHECKSIG

Historically, a 64-byte public key (+1 byte to identify the type, hence OP_DATA_65) was used in these types of transaction locking scripts. Eventually, this was replaced by a 32-byte compressed public key (again +1 byte to identify the type, hence OP_DATA_33) as a way to optimize transaction size and thus reduce the overall transaction fees spent when using this type of locking script.
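
To make the size difference concrete, the sketch below compresses the 65-byte public key that appears in the ScriptSig of the sample transaction. The function name `compress_pubkey` is illustrative; the code relies on the standard encoding in which an uncompressed key is 0x04 followed by the 32-byte X and Y coordinates, and a compressed key is 0x02 (even Y) or 0x03 (odd Y) followed by X only.

```python
# Uncompressed public key pushed by the sample transaction's ScriptSig.
UNCOMPRESSED = bytes.fromhex(
    "04"
    "bd117a74f353dfc60809c1c8f7d57ddbb2bae869fb8bc3d863cb3e8ecab5af68"
    "16729494fb0687b298e67be875bafb5da82966394805611d0b1ef0f947025c1c"
)


def compress_pubkey(pubkey):
    """Convert a 65-byte uncompressed key into its 33-byte form."""
    if len(pubkey) != 65 or pubkey[0] != 0x04:
        raise ValueError("expected a 65-byte uncompressed public key")
    x, y = pubkey[1:33], pubkey[33:65]
    # The prefix records the parity of Y so it can be recomputed from X.
    prefix = b"\x02" if y[-1] % 2 == 0 else b"\x03"
    return prefix + x


compressed = compress_pubkey(UNCOMPRESSED)
print(len(UNCOMPRESSED), "->", len(compressed), "bytes:", compressed.hex())
```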

It can be observed in the diagram above that once the shorter version was adopted, it almost entirely replaced the legacy one. The gap between the two versions of that script can most likely be explained by the rise in prevalence of P2PKH, another type of transaction that can achieve the same overall goal, with wallets favoring certain types of scripts over others to reduce transaction fees.

It is worth noting that the script security feature, the actual signature check, is the last command of the script. That OP_CHECKSIG command is what guarantees that the transaction’s outputs cannot be changed, and that funds are guaranteed to be sent where the transaction sender intended. Since blocks need to be verifiable by other miners to be considered as part of the main uninterrupted blockchain, a rogue miner attempting to change the transaction in any way would need to know the private key capable of appropriately signing the new forged transaction, so that the signature of the overall transaction would be valid and verifiable by a majority of miners.

P2PKH – Pay To Public Key Hash

P2PKH scripts achieve the same goal as P2PK locking scripts; however, their prevalence in the blockchain is 2 orders of magnitude higher than that of P2PK scripts. They are of the following form:

OP_DUP OP_HASH160 OP_DATA_20  OP_EQUALVERIFY OP_CHECKSIG

This script locks the funds behind a hash of the public key of the payee. To unlock the funds, the ScriptSig must contain a signature of the current transaction made with the payee’s private key, followed by the corresponding public key, whose hash160 (RIPEMD160(SHA256(publickey))) must match the one stored in the locking script. Here the security of that type of script is guaranteed by the OP_HASH160, OP_EQUALVERIFY, and OP_CHECKSIG commands of the locking script. The first two commands force a recalculation of the hash160 of the provided public key and compare it to the one stored in the locking script, and the last command checks the transaction signature, which must have been computed using the private key corresponding to the public key that was just verified. Once again, this effectively guarantees that a miner cannot change any part of the transaction after the sender submitted it without knowing the private key.

P2MS – Pay To Multi-Signature

Despite their misleading appellation, P2MS do not necessarily send funds to multiple addresses; rather, they are locking scripts which require multiple signatures to unlock the funds. The following are the most common valid P2MS script fingerprints:

OP_1 OP_DATA_33 OP_DATA_65 OP_2 OP_CHECKMULTISIG
OP_1 OP_DATA_65 OP_DATA_65 OP_DATA_33 OP_3 OP_CHECKMULTISIG
OP_3 OP_DATA_33 OP_DATA_33 OP_DATA_33 OP_3 OP_CHECKMULTISIG
OP_2 OP_DATA_65 OP_DATA_65 OP_2 OP_CHECKMULTISIG
OP_2 OP_DATA_65 OP_DATA_65 OP_DATA_65 OP_3 OP_CHECKMULTISIG
OP_1 OP_DATA_65 OP_DATA_65 OP_2 OP_CHECKMULTISIG
OP_1 OP_DATA_65 OP_1 OP_CHECKMULTISIG
OP_1 OP_DATA_33 OP_1 OP_CHECKMULTISIG
OP_2 OP_DATA_33 OP_DATA_33 OP_DATA_33 OP_3 OP_CHECKMULTISIG
OP_2 OP_DATA_33 OP_DATA_33 OP_2 OP_CHECKMULTISIG
OP_1 OP_DATA_65 OP_DATA_33 OP_DATA_33 OP_3 OP_CHECKMULTISIG
OP_1 OP_DATA_65 OP_DATA_65 OP_DATA_65 OP_3 OP_CHECKMULTISIG
OP_1 OP_DATA_65 OP_DATA_33 OP_2 OP_CHECKMULTISIG
OP_1 OP_DATA_33 OP_DATA_33 OP_DATA_65 OP_3 OP_CHECKMULTISIG
OP_1 OP_DATA_33 OP_DATA_33 OP_2 OP_CHECKMULTISIG
OP_1 OP_DATA_33 OP_DATA_33 OP_DATA_33 OP_3 OP_CHECKMULTISIG

Depending on the number of signatures required and the size of each public key, many different combinations are possible. These scripts are always of the form OP_N PUBKEY [PUBKEY...] OP_M OP_CHECKMULTISIG, where N signatures out of the M listed public keys are required to unlock the funds. Due to limitations in the original implementation of these types of scripts, and a desire to maintain backward compatibility, the unlocking script has to provide the needed signatures in the same order as the corresponding public keys appear in the locking script. In addition, an extra dummy element (OP_0 in the example below) is required at the start of the unlocking script to work around an off-by-one bug in OP_CHECKMULTISIG.

e.g.: OP_0 SIGNATURE_1 SIGNATURE_3, in the case of a multi-signature for a 2-of-3 locking script of the form OP_2 OP_DATA_33 OP_DATA_33 OP_DATA_33 OP_3 OP_CHECKMULTISIG.
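
The ordering constraint can be illustrated with a toy model. In the Python sketch below, real signature verification is replaced by a simple id comparison (a signature “verifies” against the key with the same id); the function name `multisig_accepts` and this abstraction are assumptions made for illustration, not the actual node implementation.

```python
def multisig_accepts(sig_key_ids, key_ids, required):
    """Toy model of how signatures are paired with public keys.

    sig_key_ids: for each signature, in unlocking-script order, the id
                 of the key that produced it.
    key_ids:     ids of the public keys, in locking-script order.
    required:    number of signatures demanded by the locking script (N).
    """
    if len(sig_key_ids) != required:
        return False
    key_pos = 0
    for sig in sig_key_ids:
        # Walk forward through the keys; skipped keys are never revisited.
        while key_pos < len(key_ids) and key_ids[key_pos] != sig:
            key_pos += 1
        if key_pos == len(key_ids):
            return False  # ran out of keys before matching this signature
        key_pos += 1
    return True


keys = [1, 2, 3]                           # a 2-of-3 locking script
print(multisig_accepts([1, 3], keys, 2))   # signatures in key order
print(multisig_accepts([3, 1], keys, 2))   # same signatures, wrong order
```

Because each public key is consumed as the walk advances, two otherwise valid signatures presented in the wrong order are rejected in this model, which mirrors the constraint described above.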

The last command, OP_CHECKMULTISIG, is what guarantees that the unlocking script for the funds must sign the entire transaction with a minimum of N signatures matching the M public keys recorded in the original locking script. Once again, this process ensures that a miner cannot change any part of the transaction without knowing the corresponding private keys.

Due to further restrictions on how OP_CHECKMULTISIG may be used directly in a ScriptPubKey, both M and N are limited to 3 for such a script to be relayed as a standard transaction (the op-code itself accepts up to 20 public keys). However, it is possible to use a different type of script (P2SH, see below) to achieve the same multi-signature feature without this limitation.

NULL Data Locking Script

While there is a standardized way to store arbitrary data in a locking script (OP_RETURN), multiple ways of storing arbitrary data on the blockchain have been used throughout Bitcoin history. Below is a non-exhaustive list of scripts that have been used for their data storage capabilities (in some cases within a tag, and sometimes by using op-codes for their corresponding ASCII values).

OP_RETURN #Standard NULL Data script
OP_0 OP_DATA_20 #Arbitrary 20 bytes of data storage
OP_DUP OP_HASH160 OP_DATA_20 OP_EQUALVERIFY OP_CHECKSIG #Script used for unintended purpose
OP_IFDUP OP_IF OP_2SWAP OP_VERIFY OP_2OVER OP_DEPTH #Use of OP-codes to spell "script" in ascii

Some, but not all, of these scripts are provably unspendable, and those outputs are effectively pruned from the set of Unspent Transaction Outputs (UTXO). Usually, they lock a null or low amount of Satoshis (lower than the transaction fees that would be needed to spend the locked funds), and many of them carry ASCII data, links, or scripts.

A different way of storing data on the blockchain has recently been proposed, and put to use in a vast number of transactions through ordinals (since the end of 2022).

P2SH – Pay To Script Hash

P2SH scripts are the second most prevalent type of locking script and also the most opaque as to the actual behavior of the script. They are of the following form:

OP_HASH160 OP_DATA_20  OP_EQUAL

The P2SH locking script stores a hash (RIPEMD160(SHA256(script))) of the actual locking script in its data segment, so the actual locking script is only revealed when the previous transaction’s proceeds are spent. When the funds need to be unlocked, the ScriptSig provides the actual locking script, whose hash must match, along with the data needed to satisfy it. This can, for example, be used to make an N-of-M multi-signature locking script without the 3-signature limitation of a direct P2MS locking script.

However, this type of script carries a risk with regard to miner advantage attacks (as do other types of custom locking scripts): if a locking script is revealed to not perform a signature check with a public key or hash embedded within the revealed locking script, the transaction could be hijacked to replace transaction outputs and sign with a different private key.

While this sort of attack needs to be timed between the moment the transaction is sent or propagated through the network and the moment it is actually mined, this potentially leaves a window of opportunity of several minutes up to multiple hours when the network is congested. This type of attack can also be automated, leading to fee bidding wars for the successful hijacking of improperly protected transactions.

Custom Locking Scripts

Among the 156 script fingerprints there are a few other custom scripts whose behavior can be identified through analysis of their op-codes. Some provide puzzles or challenges, such as the following:

OP_2DUP OP_ADD OP_8 OP_EQUALVERIFY OP_SUB OP_2 OP_EQUAL

which is equivalent to the following system of equations:

x+y = 8
x-y = 2

Note: the locking script above has long been redeemed (the unlocking script that solves it being OP_5 OP_3).
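
The puzzle can be checked with a toy stack machine. The Python sketch below implements only the handful of op-codes used by this script, represents small-number pushes such as OP_5 as plain integers, and applies the success rule stated earlier in this article (a single non-zero item left on the stack); `run_script` is an illustrative name, not part of any library.

```python
def run_script(ops):
    """Execute a list of op-codes; return True if the script succeeds."""
    stack = []
    try:
        for op in ops:
            if isinstance(op, int):        # small-number push, e.g. OP_5
                stack.append(op)
            elif op == "OP_2DUP":          # duplicate the top two items
                if len(stack) < 2:
                    return False
                stack.extend(stack[-2:])
            elif op == "OP_ADD":
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif op == "OP_SUB":           # second-from-top minus top
                b, a = stack.pop(), stack.pop()
                stack.append(a - b)
            elif op == "OP_EQUAL":
                b, a = stack.pop(), stack.pop()
                stack.append(1 if a == b else 0)
            elif op == "OP_EQUALVERIFY":   # abort unless the items match
                b, a = stack.pop(), stack.pop()
                if a != b:
                    return False
            else:
                raise ValueError(f"unsupported op-code: {op}")
    except IndexError:
        return False                       # popped from an empty stack
    return len(stack) == 1 and stack[0] != 0


LOCKING = ["OP_2DUP", "OP_ADD", 8, "OP_EQUALVERIFY", "OP_SUB", 2, "OP_EQUAL"]

# The unlocking script is executed first, followed by the locking script.
print(run_script([5, 3] + LOCKING))
print(run_script([3, 5] + LOCKING))
```

Nothing in this script involves a key or a signature: anyone who sees the two pushed numbers can reuse them, which is the weakness discussed next.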

Here, it should be noted that the person attempting to solve the puzzle and claim the reward locked by that script might see the transaction hijacked by a fairly simple miner advantage type of attack. Since most transactions are disseminated to the miner’s P2P network before being mined, an attacker could extract the solution to the puzzle from the transaction, and create another transaction with a different output (redirecting funds to their own wallet) and a higher transaction fee, to have their transaction processed before the original one.

This attack is possible because there are no requirements for the transaction to be signed using a private key whose corresponding public key would have been shared in the locking script. This makes such simple arithmetic puzzles and challenges difficult to secure against this type of attack.

To ensure redeeming the reward could not be hijacked, some challenge makers used to advise the challenge solver to mine the block themselves. However, this solution has now become impractical, since it would require large hashing power at the disposal of the challenge solver (mining pools are not an adequate solution either unless all miners in that pool can be trusted by the challenge redeemer).

While it appears that most of these challenges have migrated towards P2SH scripts, this does not change the security implications outlined above.

Ordinals

Ordinals represent a recent development of the Bitcoin blockchain and have been observed in an increasing number of transactions since the end of 2022. Ordinals can be approximated as Bitcoin’s implementation of NFTs; the main difference from other implementations of NFTs (such as on the Ethereum blockchain) is that the actual NFT item is stored directly on the blockchain, rather than only a web link to the file. Ordinals use an arbitrary data storage mechanism built on top of the blockchain’s Segregated Witness feature (SegWit), which is a method of providing unlocking scripts, designed with the intention of lowering transaction fees and mining power usage.

Ordinals can contain small files as part of the blockchain. These have been used to store a wide array of data ranging from gif images, webm and mp4 short videos, to ogg sound files, and various text file formats. While there is no known limitation of the type of files that can be stored, the size is limited by the SegWit format, the transaction size, and block size (4MiB).

Ordinals file types:
#OCCURRENCES ITEM
10 # Empty string used as a file type
1 <script>alert('xss in content type')</script>
1 application/epub+zip
1 application/javascript
24167 application/json
3 application/json;charset=utf-8
2 application/msword
14 application/octet-stream
221 application/pdf
3 application/pgp-signature
1 application/x-gzip
1 application/yaml
1 audio/flac
1 audio/midi
1 audio/mod
338 audio/mpeg
2 audio/ogg
4 audio/wav
1 dadabots/was+here
309 image/avif
6261 image/gif
43568 image/jpeg
7 image/jpeg;charset=utf-8
442976 image/png
35663 image/svg+xml
2 image/tiff
103823 image/webp
339 model/gltf-binary
3 model/stl
2 ordtext/plain;charset=utf-8
66 text/html
12 text/html; charset=utf-8
14467 text/html;charset=utf-8
2 text/javascript
1 text/markdown
6317 text/plain
11 text/plain; charset=utf-8
522 text/plain;charset=UTF-8
1 text/plain;charset=us-ascii
1837614 text/plain;charset=utf-8
1 text/plainn;charset=utf-
1 text/plainn;charset=utf-8
1378 video/mp4
415 video/webm
1 🟠 #UTF-16 character used as a file type
found 45 file types in 2518535 SegWit items containing ordinals.

At the top of the listing above, one can notice that a fairly innocuous, classic Cross-Site Scripting (XSS) payload was inserted in the file type field of an ordinal. This is likely to cause a popup in, and thereby expose, vulnerable web apps that scrape the blockchain for ordinals.

<script>alert('xss in content type')</script>

Another potentially mishandled file type can also be observed at the bottom of the list, in the form of a UTF-16 character.

🟠
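
A web application that displays ordinals could defend against both cases by treating the file type as untrusted input. The Python sketch below is one possible approach, not something taken from the tool: `safe_content_type` and `display_content_type` are hypothetical helpers, and the regular expression is a deliberately conservative approximation of the MIME type syntax.

```python
import html
import re

# Conservative pattern: type/subtype plus optional ";name=value" parameters.
_MIME_RE = re.compile(
    r"^[a-z0-9][a-z0-9!#$&^_.+-]*/[a-z0-9][a-z0-9!#$&^_.+-]*"
    r"(?:\s*;\s*[a-z0-9-]+=[a-z0-9._-]+)*$"
)


def safe_content_type(raw, fallback="application/octet-stream"):
    """Return the file type if it looks like a MIME type, else a fallback."""
    candidate = raw.strip().lower()
    return candidate if _MIME_RE.match(candidate) else fallback


def display_content_type(raw):
    """Escape the raw value before embedding it in an HTML page."""
    return html.escape(raw, quote=True)


for value in ["text/plain;charset=UTF-8",
              "<script>alert('xss in content type')</script>",
              "🟠",
              ""]:
    print(repr(value), "->", safe_content_type(value),
          "| shown as:", display_content_type(value))
```

Validation decides how the file is served, while escaping only protects the page that prints the raw value; the two address different parts of the problem.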

All ordinals can be extracted using the tool associated with this article. However, this feature is provided without warranty of any kind with regards to the safety or legality of the extracted files.

Somewhat fast Bitcoin Blockchain Parser Tool

The tool that made this article possible is called FastBTCParser. It enables a somewhat fast multithreaded parsing of the Bitcoin blockchain to fingerprint and extract statistics about locking scripts, as well as to check block Merkle root validity. It also allows ordinal file extraction. The tool is freely available under a free open source software license and can be found here https://github.com/nccgroup/FastBTCParser.

Special Thanks

  • Nicolas Guigo for his technical advice.
  • Tyler Colgan for his help reviewing this article and accompanying tool.

References