This article is an attempt at cataloging all the types of Bitcoin transaction locking scripts, their prevalence and their security implications. The data presented in this article was extracted directly from the Bitcoin blockchain, which required custom code to quickly iterate over the entire blockchain (over 450 GB at the time of writing). The tool is available on GitHub: https://github.com/nccgroup/FastBTCParser.
Note: in the rest of this article, Bitcoin and Satoshi will be used interchangeably to refer to an amount of currency in a transaction (1 Bitcoin = 100,000,000 Satoshis).
Anatomy of a Bitcoin Block
Bitcoin relies on the trustless dissemination of a ledger called the Bitcoin blockchain, which holds a record of all transactions since the inception of Bitcoin. Each block on the chain contains a list of transactions, each made of:
- One or more input transactions (prior transaction outputs): all Bitcoins “spent” in a transaction, except in coinbase transactions, come from one of the outputs of a prior transaction.
- Unlocking scripts (sometimes referred to as ScriptSig): each input needs a valid unlocking script to authorize the spending of the entire proceeds of that prior transaction output.
- A list of transaction outputs: each transaction output receives an arbitrary amount from the total proceeds of the inputs, and each of these outputs needs a locking script (sometimes referred to as ScriptPubKey).
Here is an example of simple transaction data taken from the blockchain:
And here is that same transaction cut into its individual fields:
TXID is the 32-byte ID of a prior transaction from which one of the outputs is going to be spent.
ScriptPubKey is the locking script. Each transaction output locks an arbitrary amount of Satoshis with such a script. Those Satoshis can then be used in a future transaction by unlocking them.
ScriptSig is the unlocking script. As such, it needs to provide the data and commands required to satisfy the conditions of the locking script of the output being spent, so that its funds can be used in the current transaction.
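To make the field layout above concrete, here is a minimal Python sketch that splits a raw legacy (non-SegWit) transaction into these fields. It assumes the standard serialization: little-endian integers, CompactSize length prefixes, and TXIDs stored in reverse byte order compared to how they are usually displayed. The class and function names are hypothetical, and this is not how FastBTCParser is implemented.

```python
import struct
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class TxIn:
    prev_txid: str     # TXID of the prior transaction (hex, usual display order)
    prev_index: int    # which output of that prior transaction is spent
    script_sig: bytes  # unlocking script
    sequence: int

@dataclass
class TxOut:
    value: int            # amount in Satoshis
    script_pubkey: bytes  # locking script

@dataclass
class Tx:
    version: int
    inputs: List[TxIn] = field(default_factory=list)
    outputs: List[TxOut] = field(default_factory=list)
    locktime: int = 0

def read_varint(raw: bytes, pos: int) -> Tuple[int, int]:
    """Decode a CompactSize integer at pos; return (value, position after it)."""
    first = raw[pos]
    if first < 0xFD:
        return first, pos + 1
    size = {0xFD: 2, 0xFE: 4, 0xFF: 8}[first]
    return int.from_bytes(raw[pos + 1:pos + 1 + size], "little"), pos + 1 + size

def parse_legacy_tx(raw: bytes) -> Tx:
    # SegWit transactions start with a 0x00 marker and 0x01 flag after the version
    if len(raw) > 5 and raw[4] == 0x00 and raw[5] == 0x01:
        raise ValueError("SegWit serialization is not handled by this sketch")
    tx = Tx(version=struct.unpack_from("<i", raw, 0)[0])
    n_in, pos = read_varint(raw, 4)
    for _ in range(n_in):
        prev_txid = raw[pos:pos + 32][::-1].hex()  # stored byte-reversed
        prev_index = struct.unpack_from("<I", raw, pos + 32)[0]
        length, pos = read_varint(raw, pos + 36)
        script_sig = raw[pos:pos + length]
        sequence = struct.unpack_from("<I", raw, pos + length)[0]
        pos += length + 4
        tx.inputs.append(TxIn(prev_txid, prev_index, script_sig, sequence))
    n_out, pos = read_varint(raw, pos)
    for _ in range(n_out):
        value = struct.unpack_from("<q", raw, pos)[0]
        length, pos = read_varint(raw, pos + 8)
        tx.outputs.append(TxOut(value, raw[pos:pos + length]))
        pos += length
    tx.locktime = struct.unpack_from("<I", raw, pos)[0]
    return tx
```

SegWit transactions add a marker, a flag and a witness section; the sketch rejects them rather than guessing at their layout.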
Bitcoin Scripting Language
In any Bitcoin transaction, the ScriptSig and ScriptPubKey are scripts written in a simple language with a limited set of commands. The scripting language is not Turing-complete and each command (op-code) is stored in a single byte. The language provides the ability to store fixed or variable length data blocks inlined within the script, and uses a stack to process that data.
In effect, the ScriptSig from the current transaction and the ScriptPubKey from the input (prior) transaction are concatenated (referred to as “the script”) and executed to unlock the funds. The execution succeeds, and the funds are unlocked and spent, if:
- The script is valid
- The script is entirely executed
- A non-zero value remains on top of the stack (standard relay policy additionally requires it to be the only item left)
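The execution rules above can be illustrated with a toy stack machine. This is a simplified sketch, not Bitcoin's actual interpreter: pushed data is modelled as Python integers rather than byte strings, only a handful of op-codes are supported, and the two scripts are simply concatenated as described above. The optional clean_stack flag (a hypothetical parameter) adds the "only one item left" requirement on top of the "top item is non-zero" rule.

```python
from typing import List, Sequence, Union

# Token format for this sketch: ints stand for small-number pushes (OP_1..OP_16),
# strings for op-code names.
Token = Union[int, str]

def run_script(script_sig: Sequence[Token], script_pubkey: Sequence[Token],
               clean_stack: bool = True) -> bool:
    stack: List[int] = []
    try:
        for token in list(script_sig) + list(script_pubkey):  # "the script"
            if isinstance(token, int):
                stack.append(token)
            elif token == "OP_DUP":
                stack.append(stack[-1])
            elif token in ("OP_ADD", "OP_SUB"):
                b, a = stack.pop(), stack.pop()
                stack.append(a + b if token == "OP_ADD" else a - b)
            elif token in ("OP_EQUAL", "OP_EQUALVERIFY"):
                equal = stack.pop() == stack.pop()
                if token == "OP_EQUALVERIFY":
                    if not equal:
                        return False  # ...VERIFY op-codes abort the script on failure
                else:
                    stack.append(1 if equal else 0)
            else:
                return False  # op-code not modelled here: treat the script as invalid
    except IndexError:
        return False  # stack underflow: the script is invalid
    if not stack or stack[-1] == 0:
        return False  # the top item must be non-zero ("true")
    return len(stack) == 1 or not clean_stack
```

For example, run_script([2, 3], ["OP_ADD", 5, "OP_EQUAL"]) succeeds, while the same locking script fails with a ScriptSig of [2, 2].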
A complete description of the scripting language and commands can be found here: https://en.bitcoin.it/wiki/Script
The first output locking script, or ScriptPubKey, of the sample transaction above decodes to the following:
This is a P2PKH type locking script and is described in the Pay To Public Key Hash section.
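A decoder for such output can be sketched in a few lines: every byte is either an op-code or the start of a data push whose length is given by the op-code itself (0x01 to 0x4B) or by the 1, 2 or 4 bytes following OP_PUSHDATA1/2/4. The op-code table below is a small, hypothetical subset; any op-code not listed is printed as its hex value.

```python
OPCODE_NAMES = {0x00: "OP_0", 0x6A: "OP_RETURN", 0x76: "OP_DUP", 0x87: "OP_EQUAL",
                0x88: "OP_EQUALVERIFY", 0xA9: "OP_HASH160", 0xAC: "OP_CHECKSIG",
                0xAE: "OP_CHECKMULTISIG"}
OPCODE_NAMES.update({0x50 + n: f"OP_{n}" for n in range(1, 17)})  # OP_1..OP_16
PUSHDATA_WIDTH = {0x4C: 1, 0x4D: 2, 0x4E: 4}  # OP_PUSHDATA1/2/4 length field sizes

def disassemble(script: bytes) -> str:
    out, i = [], 0
    while i < len(script):
        op = script[i]
        i += 1
        if 0x01 <= op <= 0x4B:          # the op-code is the push length
            size = op
        elif op in PUSHDATA_WIDTH:      # the length follows the op-code
            width = PUSHDATA_WIDTH[op]
            size = int.from_bytes(script[i:i + width], "little")
            i += width
        else:
            out.append(OPCODE_NAMES.get(op, f"0x{op:02X}"))
            continue
        data = script[i:i + size]
        if len(data) < size:
            raise ValueError("push runs past the end of the script")
        out.append(data.hex())          # show pushed data as hex
        i += size
    return " ".join(out)
```

Applied to a P2PKH output such as 76a914…88ac, this prints OP_DUP OP_HASH160, the 20-byte hash in hex, then OP_EQUALVERIFY OP_CHECKSIG.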
Locking scripts
Using the tool associated with this article (FastBTCParser), we can now obtain a list of all existing locking script fingerprints, along with their prevalence. The fingerprinting process ignores any part of the script that is data and replaces it with a tag (but accounts for the data’s length if it is specified by the preceding op-code).
For brevity, only scripts with over 100 occurrences will be shown below. A complete, unedited list can be found on the accompanying tool’s GitHub page. The complete list contains 156 unique script fingerprints.
All data is accurate as of May 14th 2023.
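The fingerprinting step described above might look like the following sketch. It replaces pushed data with an OP_DATA_N tag when the length is encoded in the op-code itself, keeps only the OP_PUSHDATA1/2/4 marker when the length is itself part of the data, and counts fingerprints above the 100-occurrence threshold. This is an illustrative reconstruction from the description, not FastBTCParser's code; all names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

NAMES = {0x00: "OP_0", 0x6A: "OP_RETURN", 0x76: "OP_DUP", 0x87: "OP_EQUAL",
         0x88: "OP_EQUALVERIFY", 0xA9: "OP_HASH160", 0xAC: "OP_CHECKSIG",
         0xAE: "OP_CHECKMULTISIG"}
NAMES.update({0x50 + n: f"OP_{n}" for n in range(1, 17)})  # OP_1..OP_16
PUSHDATA_WIDTH = {0x4C: 1, 0x4D: 2, 0x4E: 4}

def fingerprint(script: bytes) -> str:
    tokens, i = [], 0
    while i < len(script):
        op = script[i]
        i += 1
        if 0x01 <= op <= 0x4B:        # length is part of the op-code: keep it
            tokens.append(f"OP_DATA_{op}")
            i += op
        elif op in PUSHDATA_WIDTH:     # length is stored after the op-code: drop it
            width = PUSHDATA_WIDTH[op]
            size = int.from_bytes(script[i:i + width], "little")
            tokens.append(f"OP_PUSHDATA{width}")
            i += width + size
        else:
            tokens.append(NAMES.get(op, f"0x{op:02X}"))
    return " ".join(tokens)

def prevalence(scripts: Iterable[bytes], threshold: int = 100) -> List[Tuple[str, int]]:
    """Fingerprints seen more than `threshold` times, most common first."""
    counts = Counter(fingerprint(s) for s in scripts)
    return [(fp, n) for fp, n in counts.most_common() if n > threshold]
```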
From this list of scripts, we can identify the five most commonly used script types, described below.
P2PK – Pay To Public Key
Originally this was the main way to send Bitcoins from one wallet to another. These scripts have two different fingerprints:
OP_DATA_65 OP_CHECKSIG
OP_DATA_33 OP_CHECKSIG
Historically, uncompressed 65-byte public keys (a 0x04 prefix byte followed by the 32-byte X and Y coordinates) were used in this type of locking script. These were eventually replaced by 33-byte compressed public keys (a 0x02 or 0x03 prefix byte encoding the parity of Y, followed by the 32-byte X coordinate), as a way to reduce transaction size and thus the overall transaction fees spent when using this type of locking script.
As can be observed in the diagram above, once the shorter version was adopted, it almost entirely replaced the legacy one. The gap between the two versions of that script can most likely be explained by the rise in prevalence of P2PKH, another type of locking script that achieves the same overall goal, with wallets favoring certain types of scripts over others, for instance to reduce transaction fees.
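The relationship between the two P2PK public key formats can be shown with a short sketch. The compressed form keeps only X and the parity of Y, and Y can be recovered from the secp256k1 curve equation y^2 = x^3 + 7 (mod p). Because p ≡ 3 (mod 4), a square root of a can be computed as a^((p+1)/4) mod p. The function names are illustrative.

```python
# secp256k1 field prime; the curve is y^2 = x^3 + 7 (mod P)
P = 2**256 - 2**32 - 977

def compress(pubkey: bytes) -> bytes:
    """65-byte uncompressed (0x04 || X || Y) -> 33-byte compressed (0x02/0x03 || X)."""
    if len(pubkey) != 65 or pubkey[0] != 0x04:
        raise ValueError("expected an uncompressed public key")
    y = int.from_bytes(pubkey[33:], "big")
    return bytes([0x02 + (y & 1)]) + pubkey[1:33]  # prefix encodes the parity of Y

def decompress(pubkey: bytes) -> bytes:
    """33-byte compressed -> 65-byte uncompressed, recovering Y from the curve equation."""
    if len(pubkey) != 33 or pubkey[0] not in (0x02, 0x03):
        raise ValueError("expected a compressed public key")
    x = int.from_bytes(pubkey[1:], "big")
    rhs = (x ** 3 + 7) % P
    y = pow(rhs, (P + 1) // 4, P)  # a square root, valid because P % 4 == 3
    if y * y % P != rhs:
        raise ValueError("X is not on the curve")
    if (y & 1) != (pubkey[0] & 1):
        y = P - y  # pick the root with the requested parity
    return b"\x04" + pubkey[1:] + y.to_bytes(32, "big")
```

Saving 32 bytes per public key is what made the compressed form cheaper to use in locking scripts.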
It is worth noting that the script’s security feature, the actual signature check, is the last command of the script. That OP_CHECKSIG command is what guarantees that the transaction’s outputs cannot be changed, and that funds are sent where the transaction sender intended. Since blocks need to be verifiable by other miners to be considered part of the main uninterrupted blockchain, a rogue miner attempting to change the transaction in any way would need to know the private key capable of appropriately signing the new forged transaction, so that its signature would be valid and verifiable by a majority of miners.
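To illustrate why OP_CHECKSIG protects the outputs, the following sketch implements textbook ECDSA over secp256k1 in pure Python and shows that a signature made over one set of outputs does not verify for a modified set. The "transaction" here is a placeholder byte string: Bitcoin's real signature hash is computed over a specific serialization of the transaction, which is not reproduced. The code is not constant-time and is meant for illustration only.

```python
import hashlib
import secrets

# secp256k1 domain parameters
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(a, b):
    """Affine point addition on y^2 = x^3 + 7; None is the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0] and (a[1] + b[1]) % P == 0:
        return None
    if a == b:
        lam = 3 * a[0] * a[0] * pow(2 * a[1], -1, P) % P
    else:
        lam = (b[1] - a[1]) * pow((b[0] - a[0]) % P, -1, P) % P
    x = (lam * lam - a[0] - b[0]) % P
    return (x, (lam * (a[0] - x) - a[1]) % P)

def mul(k, point):
    """Double-and-add scalar multiplication (not constant time: demo only)."""
    result = None
    while k:
        if k & 1:
            result = add(result, point)
        point = add(point, point)
        k >>= 1
    return result

def digest(message: bytes) -> int:
    # Double SHA-256 of a stand-in message, not of a real sighash preimage
    return int.from_bytes(hashlib.sha256(hashlib.sha256(message).digest()).digest(), "big")

def sign(priv: int, message: bytes):
    z = digest(message)
    while True:
        k = secrets.randbelow(N - 1) + 1  # fresh random nonce per signature
        r = mul(k, G)[0] % N
        s = pow(k, -1, N) * (z + r * priv) % N
        if r and s:
            return r, s

def verify(pub, message: bytes, sig) -> bool:
    r, s = sig
    if not (0 < r < N and 0 < s < N):
        return False
    w = pow(s, -1, N)
    point = add(mul(digest(message) * w % N, G), mul(r * w % N, pub))
    return point is not None and point[0] % N == r
```

Signing b"output: 1 BTC to payee" and then verifying b"output: 1 BTC to miner" with the same signature fails, which is exactly what prevents a miner from rewriting the outputs without the private key.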
P2PKH – Pay To Public Key Hash
P2PKH scripts achieve the same goal as P2PK locking scripts; however, their prevalence in the blockchain is two orders of magnitude higher than that of P2PK scripts. They are of the following form:
OP_DUP OP_HASH160 OP_DATA_20 OP_EQUALVERIFY OP_CHECKSIG
This script locks the funds behind a hash of the payee’s public key. To unlock the funds, the ScriptSig must contain a signature of the current transaction made with a private key, followed by the corresponding public key, whose hash160 (RIPEMD160(SHA256(publickey))) must match the one stored in the locking script. Here, the security of this type of script is guaranteed by the OP_HASH160, OP_EQUALVERIFY and OP_CHECKSIG commands of the locking script. The first two commands recompute the hash160 of the provided public key and compare it to the one stored in the locking script, and the last command checks the transaction signature, which must be computed using the private key corresponding to the public key that was just verified. Once again, this effectively guarantees that the miner cannot change any part of the transaction after the sender submitted it without knowing the private key.
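The P2PKH evaluation can be traced op-code by op-code. In this sketch, the signature check is passed in as a callback (check_sig, a hypothetical stand-in for a real ECDSA check over the transaction's signature hash). The hash160 function is injectable as well, since hashlib only provides RIPEMD-160 when the underlying OpenSSL build exposes it.

```python
import hashlib
from typing import Callable, List

def hash160(data: bytes) -> bytes:
    """RIPEMD160(SHA256(data)); needs an OpenSSL build that exposes ripemd160."""
    return hashlib.new("ripemd160", hashlib.sha256(data).digest()).digest()

def run_p2pkh(script_sig: List[bytes], pubkey_hash: bytes,
              check_sig: Callable[[bytes, bytes], bool],
              h160: Callable[[bytes], bytes] = hash160) -> bool:
    """Evaluate <sig> <pubkey> | OP_DUP OP_HASH160 <pubkey_hash> OP_EQUALVERIFY OP_CHECKSIG."""
    stack = list(script_sig)                  # after the ScriptSig: [sig, pubkey]
    if len(stack) != 2:
        return False
    stack.append(stack[-1])                   # OP_DUP: copy the public key
    stack.append(h160(stack.pop()))           # OP_HASH160: hash the copy
    stack.append(pubkey_hash)                 # OP_DATA_20: hash stored in the locking script
    if stack.pop() != stack.pop():            # OP_EQUALVERIFY: abort if the hashes differ
        return False
    pubkey, sig = stack.pop(), stack.pop()    # OP_CHECKSIG consumes pubkey, then signature
    stack.append(b"\x01" if check_sig(sig, pubkey) else b"")
    return stack == [b"\x01"]                 # a single true item must remain
```

Both checks are needed: the hash comparison binds the spend to the payee's key, and the signature check binds it to the transaction content.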
P2MS – Pay To Multi-Signature
Despite their misleading name, P2MS scripts do not necessarily send funds to multiple addresses; rather, they are locking scripts which require multiple signatures to unlock the funds. The following are the most common valid P2MS script fingerprints:
Depending on the number of signatures and the size of the public keys, many different combinations are possible. These scripts are always of the form OP_N PUBLIC_KEY [PUBLIC_KEY...] OP_M OP_CHECKMULTISIG, where signatures matching N of the M listed public keys are required to unlock the funds. Due to limitations in the original implementation of these types of scripts, and a desire to maintain backward compatibility, the unlocking script has to provide the signatures in the same order as the corresponding public keys appear in the locking script. In addition, an extra dummy item (usually OP_0) must come first, because OP_CHECKMULTISIG pops one more item from the stack than it needs (an off-by-one bug in the original implementation).
e.g.: OP_0 SIGNATURE_1 SIGNATURE_3, in the case of a multi-signature for a 2-of-3 locking script of the form OP_2 OP_DATA_33 OP_DATA_33 OP_DATA_33 OP_3 OP_CHECKMULTISIG.
The last command, OP_CHECKMULTISIG, is what guarantees that the unlocking script must sign the entire transaction with at least N signatures made by the private keys of N of the M public keys recorded in the original locking script. Once again, this process ensures that the miner cannot change any part of the transaction without knowing at least N of the corresponding private keys.
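The ordering constraint and the extra dummy item can be seen in a simplified model of OP_CHECKMULTISIG. Stack items are modelled as Python ints (for OP_1 to OP_16) and bytes, and check_sig is a hypothetical callback standing in for real signature verification. Because public keys are only ever scanned forward, signatures supplied out of order fail, and omitting the leading dummy item makes the op-code run out of stack items.

```python
from typing import Callable, List, Union

Item = Union[int, bytes]  # ints stand for OP_1..OP_16, bytes for pushed data

def op_checkmultisig(stack: List[Item], check_sig: Callable[[bytes, bytes], bool]) -> bool:
    """Simplified OP_CHECKMULTISIG over a stack whose top is the last list element."""
    try:
        n_keys = stack.pop()
        keys = [stack.pop() for _ in range(n_keys)][::-1]  # back to script order
        n_sigs = stack.pop()
        sigs = [stack.pop() for _ in range(n_sigs)][::-1]
        stack.pop()  # extra item consumed by the off-by-one bug (the leading OP_0)
    except IndexError:
        return False  # not enough items: the script fails
    k = 0
    for sig in sigs:
        # keys are only scanned forward, so signatures must follow the keys' order
        while k < len(keys) and not check_sig(sig, keys[k]):
            k += 1
        if k == len(keys):
            return False  # no remaining key matches this signature
        k += 1
    return True
```

With the 2-of-3 example above, the stack [OP_0, SIGNATURE_1, SIGNATURE_3, 2, key1, key2, key3, 3] succeeds, while swapping the two signatures or dropping OP_0 fails.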
Due to a further limitation, this time in the standard relay policy for bare ScriptPubKey multi-signature scripts (consensus rules allow up to 20 public keys), M, and therefore N, is limited to 3. However, it is possible to use a different type of script (P2SH, see below) to achieve the same multi-signature feature without this 3-key limit.
NULL Data Locking Script
While there is a standardized way to store arbitrary data in a locking script (OP_RETURN), multiple ways of storing arbitrary data on the blockchain have been used throughout Bitcoin’s history. Below is a non-exhaustive list of scripts that have been used for their data storage capabilities (in some cases within a tag, and sometimes by using op-codes for their corresponding ASCII values).
Some, but not all, of these scripts are provably unspendable, and those are effectively pruned from the set of Unspent Transaction Outputs (UTXO). Usually, they lock a null or low amount of Satoshis (lower than the transaction fees that would be needed to actually spend the locked funds), and many of them carry ASCII data, links, or scripts.
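As a sketch of the standardized approach, the following builds an OP_RETURN locking script and applies a provable-unspendability test which, to our understanding, mirrors Bitcoin Core's CScript::IsUnspendable: a script starting with OP_RETURN, or one larger than the 10,000-byte maximum script size. The function names are hypothetical.

```python
OP_RETURN, OP_PUSHDATA1 = 0x6A, 0x4C
MAX_SCRIPT_SIZE = 10_000

def null_data_script(payload: bytes) -> bytes:
    """Build OP_RETURN <payload>, using the smallest push encoding (up to 255 bytes)."""
    if len(payload) <= 0x4B:
        return bytes([OP_RETURN, len(payload)]) + payload  # direct push
    if len(payload) <= 0xFF:
        return bytes([OP_RETURN, OP_PUSHDATA1, len(payload)]) + payload
    raise ValueError("payload too large for this sketch")

def is_provably_unspendable(script_pubkey: bytes) -> bool:
    """Outputs passing this test can never be spent, so they can be left out of the UTXO set."""
    starts_with_return = len(script_pubkey) > 0 and script_pubkey[0] == OP_RETURN
    return starts_with_return or len(script_pubkey) > MAX_SCRIPT_SIZE
```

Data hidden in other script types (for example, fake public keys) cannot be detected this way, which is why only some of these outputs can be pruned.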
A different way of storing data on the blockchain, ordinals, has recently been proposed and put to use in a vast number of transactions since the end of 2022.
P2SH – Pay To Script Hash
P2SH scripts are the second most prevalent type of locking script, and also the most opaque as to the script’s actual behavior. They are of the following form:
OP_HASH160 OP_DATA_20 OP_EQUAL
The P2SH locking script stores a hash (RIPEMD160(SHA256(script))) of the actual locking script, often called the redeem script, in its data segment, thus only revealing that script when the previous transaction’s proceeds are to be spent. When the funds need to be unlocked, the ScriptSig provides the actual locking script, whose hash must match, along with the data needed to satisfy it. This can, for example, be used to make an N-of-M multi-signature locking script without the 3-key limitation of a direct P2MS locking script.
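The following sketch builds a 2-of-3 multi-signature redeem script and its P2SH locking script, then performs the first check made when spending: the revealed script must hash to the stored value. After that check, the redeem script itself is executed with the remaining ScriptSig data (not modelled here). The function names are hypothetical, and hash160 is injectable because hashlib's RIPEMD-160 support depends on the OpenSSL build.

```python
import hashlib
from typing import Callable, List

OP_HASH160, OP_EQUAL, OP_CHECKMULTISIG = 0xA9, 0x87, 0xAE
MAX_ELEMENT_SIZE = 520  # a redeem script is pushed as one stack element

def hash160(data: bytes) -> bytes:
    """RIPEMD160(SHA256(data)); needs an OpenSSL build that exposes ripemd160."""
    return hashlib.new("ripemd160", hashlib.sha256(data).digest()).digest()

def multisig_redeem_script(n_required: int, pubkeys: List[bytes]) -> bytes:
    """OP_N <pubkey>... OP_M OP_CHECKMULTISIG, using small-number op-codes for N and M."""
    if not 1 <= n_required <= len(pubkeys) <= 16:
        raise ValueError("unsupported N-of-M combination")
    body = b"".join(bytes([len(pk)]) + pk for pk in pubkeys)  # direct pushes
    script = bytes([0x50 + n_required]) + body + bytes([0x50 + len(pubkeys), OP_CHECKMULTISIG])
    if len(script) > MAX_ELEMENT_SIZE:
        raise ValueError("redeem script larger than 520 bytes")
    return script

def p2sh_script_pubkey(redeem_script: bytes, h160: Callable[[bytes], bytes] = hash160) -> bytes:
    """OP_HASH160 OP_DATA_20 <hash160(redeem_script)> OP_EQUAL"""
    return bytes([OP_HASH160, 20]) + h160(redeem_script) + bytes([OP_EQUAL])

def redeem_script_matches(revealed: bytes, script_pubkey: bytes,
                          h160: Callable[[bytes], bytes] = hash160) -> bool:
    """First P2SH check: the revealed script must hash to the value in the locking script."""
    return script_pubkey == p2sh_script_pubkey(revealed, h160)
```

Note that the hash check alone says nothing about what the revealed script does, which is the root of the risk described next.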
However, like other custom locking scripts, this type of script carries a risk with regard to miner advantage attacks: if a revealed locking script turns out not to perform a signature check with a public key (or public key hash) embedded within it, the transaction could be hijacked to replace its outputs and sign it with a different private key.
While this sort of attack needs to be timed between the moment the transaction is sent or propagated through the network and the moment it is actually mined, this potentially leaves a window of opportunity ranging from several minutes to multiple hours when the network is congested. This type of attack can also be automated, leading to fee bidding wars for the successful hijacking of improperly protected transactions.
Custom Locking Scripts
Among the 156 script fingerprints, there are a few other custom scripts whose behavior can be identified through analysis of their op-codes. Some provide puzzles or challenges, such as the following:
OP_2DUP OP_ADD OP_8 OP_EQUALVERIFY OP_SUB OP_2 OP_EQUAL
which is equivalent to the following system of equations:
x+y = 8
x-y = 2
Note: the locking script above was redeemed long ago (the solution being OP_5 OP_3).
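The puzzle can be traced op-code by op-code to confirm the published solution. The sketch below also brute-forces every value that a single small-number op-code can push (OP_1NEGATE, OP_0 to OP_16), which underlines that nothing in the locking script ties the solution to a particular person or transaction.

```python
from itertools import product

def run_puzzle(x: int, y: int) -> bool:
    """Trace <x> <y> | OP_2DUP OP_ADD OP_8 OP_EQUALVERIFY OP_SUB OP_2 OP_EQUAL."""
    stack = [x, y]                       # ScriptSig pushes x, then y
    stack += stack[-2:]                  # OP_2DUP        -> [x, y, x, y]
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)                  # OP_ADD         -> [x, y, x + y]
    stack.append(8)                      # OP_8           -> [x, y, x + y, 8]
    if stack.pop() != stack.pop():       # OP_EQUALVERIFY: abort unless x + y == 8
        return False
    b, a = stack.pop(), stack.pop()
    stack.append(a - b)                  # OP_SUB         -> [x - y]
    stack.append(2)                      # OP_2           -> [x - y, 2]
    stack.append(1 if stack.pop() == stack.pop() else 0)  # OP_EQUAL
    return stack == [1]                  # a single true item must remain

# Every value a single small-number op-code can push: -1 and 0..16
solutions = [(x, y) for x, y in product(range(-1, 17), repeat=2) if run_puzzle(x, y)]
```

Here solutions is [(5, 3)], matching OP_5 OP_3.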
Here, it should be noted that the person attempting to solve the puzzle and claim the reward locked by that script might see the transaction hijacked by a fairly simple miner advantage attack. Since most transactions are disseminated to miners through the P2P network before being mined, an attacker could extract the solution to the puzzle from the transaction and create another transaction with a different output (redirecting funds to their own wallet) and a higher transaction fee, to have their transaction processed before the original one.
This attack is possible because there is no requirement for the transaction to be signed using a private key whose corresponding public key would have been shared in the locking script. This makes such simple arithmetic puzzles and challenges difficult to secure against this type of attack.
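The attack described above can be modelled in a few lines. This is a deliberately simplified, hypothetical model: a transaction is reduced to its unlocking data, its outputs and its fee, and a miner picks, among conflicting spends of the same output, the valid one paying the highest fee, ignoring fee rates and transaction replacement policy details.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class Spend:
    unlocking: Tuple[int, ...]  # ScriptSig items, e.g. (5, 3)
    outputs: Tuple[str, ...]    # hypothetical payee labels
    fee: int                    # in Satoshis

def puzzle_ok(unlocking: Tuple[int, ...]) -> bool:
    # The locking script only checks the numbers; nothing ties it to the outputs
    return (len(unlocking) == 2 and unlocking[0] + unlocking[1] == 8
            and unlocking[0] - unlocking[1] == 2)

def front_run(seen: Spend, attacker: str, fee_bump: int) -> Spend:
    """Copy the revealed solution, redirect the outputs, and outbid the fee."""
    return Spend(seen.unlocking, (attacker,), seen.fee + fee_bump)

def miner_pick(candidates: List[Spend], valid: Callable[[Tuple[int, ...]], bool]) -> Spend:
    """Among conflicting spends of the same output, keep the valid one with the highest fee."""
    return max((c for c in candidates if valid(c.unlocking)), key=lambda c: c.fee)
```

The copied spend is just as valid as the original, so the fee alone decides who receives the reward.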
To ensure redeeming the reward could not be hijacked, some challenge makers used to advise the challenge solver to mine the block themselves. However, this solution has now become impractical, since it would require large hashing power at the disposal of the challenge solver (mining pools are not an adequate solution either unless all miners in that pool can be trusted by the challenge redeemer).
While it appears that most of these challenges have migrated towards P2SH scripts, this does not change the security implications outlined above.
Ordinals
Ordinals are a recent development on the Bitcoin blockchain and have been observed in an increasing number of transactions since the end of 2022. Ordinals can be approximated as Bitcoin’s implementation of NFTs; the main difference from other NFT implementations (such as on the Ethereum blockchain) is that the actual NFT item is stored directly on the blockchain, rather than only a web link to the file. Ordinals use an arbitrary data storage mechanism built on top of the blockchain’s Segregated Witness feature (SegWit), which moves unlocking scripts into a separate witness structure and was designed, among other goals, to lower transaction fees.
Ordinals can contain small files as part of the blockchain. These have been used to store a wide array of data, ranging from gif images, webm and mp4 short videos, to ogg sound files and various text file formats. While there is no known limitation on the type of files that can be stored, their size is limited by the SegWit format, the transaction size, and the block weight limit (4 million weight units, i.e. about 4 MB).
At the top of the listing above, one can notice that a classic, fairly innocuous Cross-Site Scripting (XSS) payload was inserted in the file type field of an ordinal. It is likely to cause a popup, exposing vulnerable web applications that scrape the blockchain for ordinals.
<script>alert('xss in content type')</script>
Another potentially mishandled file type can also be observed at the bottom of the list, in the form of a single emoji character outside the Basic Multilingual Plane (encoded as a surrogate pair in UTF-16):
🟠
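Because the content type is attacker-controlled, any tool that displays ordinals should treat it as untrusted input. The sketch below extracts the content type and body from an inscription envelope (OP_FALSE OP_IF "ord" ... OP_ENDIF, with tag 1 introducing the content type and an empty push separating the body), following our understanding of the ord project's documented format, and HTML-escapes the content type before rendering. It is not FastBTCParser's extraction code, handles only the push op-codes shown, and all names are hypothetical.

```python
import html
from typing import List, Optional, Tuple, Union

OP_0, OP_PUSHDATA1, OP_PUSHDATA2, OP_IF, OP_ENDIF = 0x00, 0x4C, 0x4D, 0x63, 0x68
Token = Tuple[str, Union[bytes, int]]  # ("push", data) or ("op", opcode)

def tokenize(script: bytes) -> List[Token]:
    tokens: List[Token] = []
    i = 0
    while i < len(script):
        op = script[i]
        i += 1
        if op == OP_0:
            tokens.append(("push", b""))                 # OP_0 / OP_FALSE: empty item
        elif op <= 0x4B:
            tokens.append(("push", script[i:i + op]))    # direct push of `op` bytes
            i += op
        elif op in (OP_PUSHDATA1, OP_PUSHDATA2):
            width = 1 if op == OP_PUSHDATA1 else 2
            size = int.from_bytes(script[i:i + width], "little")
            tokens.append(("push", script[i + width:i + width + size]))
            i += width + size
        elif 0x51 <= op <= 0x60:
            tokens.append(("push", bytes([op - 0x50])))  # OP_1..OP_16 as one byte
        else:
            tokens.append(("op", op))
    return tokens

def extract_inscription(script: bytes) -> Optional[Tuple[Optional[bytes], bytes]]:
    """Return (content_type, body) from the first envelope found, or None."""
    toks = tokenize(script)
    for start in range(len(toks) - 2):
        if toks[start:start + 3] != [("push", b""), ("op", OP_IF), ("push", b"ord")]:
            continue
        content_type, body, in_body, j = None, b"", False, start + 3
        while j < len(toks) and toks[j] != ("op", OP_ENDIF):
            kind, data = toks[j]
            if kind != "push":
                return None                      # unexpected op-code inside the envelope
            if in_body:
                body += data                     # the body may span several pushes
            elif data == b"":
                in_body = True                   # tag 0: the body follows
            elif data == b"\x01" and j + 1 < len(toks):
                content_type = toks[j + 1][1]    # tag 1: content type
                j += 1
            else:
                j += 1                           # other tags: skip their value
            j += 1
        return content_type, body
    return None

def safe_label(content_type: Optional[bytes]) -> str:
    """Escape the untrusted content type before placing it in an HTML page."""
    return html.escape((content_type or b"").decode("utf-8", errors="replace"))
```

With the payload from the listing above, safe_label returns an escaped string such as &lt;script&gt;..., which a browser displays as text instead of executing.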
All ordinals can be extracted using the tool associated with this article. However, this feature is provided without warranty of any kind with regards to the safety or legality of the extracted files.
Somewhat fast Bitcoin Blockchain Parser Tool
The tool that made this article possible is called FastBTCParser. It enables somewhat fast, multithreaded parsing of the Bitcoin blockchain to fingerprint and extract statistics about locking scripts, as well as to check block Merkle root validity. It also allows ordinal file extraction. The tool is available under a free and open source software license and can be found here: https://github.com/nccgroup/FastBTCParser.
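For reference, a block Merkle root check can be sketched as follows. TXIDs are converted to internal byte order and hashed in pairs with double SHA-256, duplicating the last hash whenever a level has an odd count, until a single hash remains; that hash is compared with the root stored in the block header. This is an illustrative implementation with hypothetical names, not the tool's own code.

```python
import hashlib
from typing import List

def dsha256(data: bytes) -> bytes:
    """Bitcoin's double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(txids: List[str]) -> str:
    """Compute a Merkle root from TXIDs given as hex in the usual display order."""
    if not txids:
        raise ValueError("a block contains at least one (coinbase) transaction")
    level = [bytes.fromhex(t)[::-1] for t in txids]  # hashing uses internal byte order
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])                  # odd count: duplicate the last hash
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0][::-1].hex()                      # back to display order

def merkle_root_is_valid(header_root_hex: str, txids: List[str]) -> bool:
    return merkle_root(txids) == header_root_hex.lower()
```

A block with a single transaction has a Merkle root equal to that transaction's TXID.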
Special Thanks
- Nicolas Guigo for his technical advice.
- Tyler Colgan for his help reviewing this article and accompanying tool.