A generic ASN.1 DER/BER parser implemented in Circom for use in zero-knowledge proofs. These circuits can extract and verify key information from ASN.1 encoded data structures, enabling on-chain verification of certificates, signatures, and other ASN.1 encoded documents.
Note: This project is a work in progress and is not yet recommended for production use.
- Phase 1: ASN.1 Parser
  - Extract certificate information from a given PDF.
  - A Circom circuit that takes a DER structure as input and extracts its information (e.g., issuer, subject, validity period, public key, signature algorithm, signature).
- Phase 2: ZK Proof to verify data
  - Develop a circuit to prove specific aspects of the extracted data.
  - For example, whether the document is signed with a given issuer name.
Some ASN.1 DER types:
- Integers
- UTF8String
- Date and Time
- Object Identifier (OID), which is used to identify algorithms and attributes
  - 1.2.840.113549.1.1.11 identifies sha256WithRSAEncryption
  - 1.3.6.1.4.1.11129 identifies Google
  - 2.5.4.6 means countryName
  - 2.5.4.10 means organizationName
- Signature
For testing, I took two PDFs:
- a blank PDF
- a blank PDF with a digital signature (created with DocuSign)
Using online tools like the ASN.1 JavaScript decoder, I checked the information contained in them. Since blank.pdf doesn't have a digital signature, it doesn't contain any information to decode.
However, by checking the DocuSigned PDF and exporting the .cer certificate in the PKCS#7/CMS DER format, I found the information below. Check the asn1 link to see all of the decoded information.
Here is some of the important information we can extract:
- Issuer Name (blank in DocuSigned)
- Country Information
- Locality Information
- Signature Algorithm
- Signature
Since the DocuSign document provides only basic information, we can build a generic DER or BER parser that takes data from a .cer file and outputs all important information.
To create a new certificate signing request (CSR) using OpenSSL and generate a .cer file:
$ openssl req -new -key private.key -out new.cer
When you create a new CSR with OpenSSL and generate a .cer file, you need to provide various pieces of information. This information is then stored in the DER structure of the .cer file. The details include:
- Country Name
- State or Province Name
- Locality Name
- Organisation Name
- Organisational Unit Name
- Common Name
- Email Address
- Date
We can later extract this information in Circom circuits.
- Step 1: Take a base64 or hex string as input and return the decoded binary data as a `Uint8Array`.
- Step 2: Extract information from the `Uint8Array`.

- Since most `DER` and `BER` structures are encoded in base64 format, we need a base64 decoder which takes a base64 encoded string as input and gives `arrayOfBytes`.
- Most digital certificates are encoded with base64, and some also use hex encoding:
  - PKCS#7/CMS attached signature (DER) - Base64 encoded
  - PKCS#7/CMS attached BER - Base64 encoded
  - PKCS#8 RSA key - Base64 encoded
- Check whether the input matches the hex regex `/^\s*(?:[0-9A-Fa-f][0-9A-Fa-f]\s*)+$/`. This pattern accepts pairs of hex digits, so input that does not match is treated as base64.
- Here is a lookup table for the base64 standard mentioned in RFC3548.
- If we have encoded hex as "0x76696b6173", it can be parsed into `[ 118, 105, 107, 97, 115 ]`. When we look up these values in the ASCII table, they give the decoded character values.
- By using this approach, for a given hex encoded string, we can get the ASCII equivalent.
- For a given base64 encoded string, we can get the ASCII equivalent using the following circuits:
  - base64 Decoder Algorithm → base64-utils.js
  - zkemail base64 decoder
  - RFC6025
- Take any generic certificate which contains encoded data in `DER` and `BER` structures:
  - Parse information into bytes: `decodeText(entire_ber_or_der_certificate)`
  - Parse the content of the `.pem` file: `-----BEGIN PKCS7------{encoded_info}------END PKCS7------`
  - Check whether `encoded_info` is hex or base64.
  - If the encoded string is hex:
    - Function: `Hex.decode(hexString)` → `arrayOfBytes`
  - If the entire cert is encoded in base64:
    - Function: `Base64Decoder(base64string)` → `arrayOfBytes`
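The decoding steps above can be sketched in a few lines of TypeScript. This is only an illustrative sketch, not the project's implementation: the helper names `hexToBytes`, `base64ToBytes` and the body of `decodeText` are assumptions made for this example, and the base64 decoder is a plain lookup-table version that ignores padding and whitespace.

```typescript
// Hypothetical helpers for illustration; names are not taken from the project's source.
const HEX_RE = /^\s*(?:[0-9A-Fa-f][0-9A-Fa-f]\s*)+$/;
const B64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Convert pairs of hex digits into bytes.
function hexToBytes(hex: string): Uint8Array {
  const clean = hex.replace(/\s+/g, "");
  const out = new Uint8Array(clean.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(clean.slice(2 * i, 2 * i + 2), 16);
  }
  return out;
}

// Decode base64 with a lookup table: each character carries 6 bits.
function base64ToBytes(b64: string): Uint8Array {
  const clean = b64.replace(/[\s=]+/g, "");
  const out: number[] = [];
  let acc = 0; // bit accumulator
  let bits = 0; // number of valid bits currently in acc
  for (let i = 0; i < clean.length; i++) {
    const v = B64_ALPHABET.indexOf(clean.charAt(i));
    if (v < 0) throw new Error("invalid base64 character: " + clean.charAt(i));
    acc = ((acc << 6) | v) & 0xffff;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out.push((acc >> bits) & 0xff);
    }
  }
  return new Uint8Array(out);
}

// Strip PEM armor and an optional 0x prefix, then pick hex or base64.
function decodeText(text: string): Uint8Array {
  const body = text
    .replace(/-----(BEGIN|END)[^-]*-----/g, "")
    .trim()
    .replace(/^0x/i, "");
  return HEX_RE.test(body) ? hexToBytes(body) : base64ToBytes(body);
}

console.log(decodeText("0x76696b6173")); // bytes 118, 105, 107, 97, 115
console.log(decodeText("dmlrYXM="));     // the same bytes, from base64
```

Note that the hex check runs first, so a string made only of hex digit pairs is always read as hex even if it would also be valid base64.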
+----------+------------+-----------+
| Type (T) | Length (L) | Value (V) |
+----------+------------+-----------+
ASN.1 encoding follows the Type-Length-Value (TLV) format, where:
- Type (T): The tag that identifies the data type.
- Length (L): The length of the value field, encoded in a compact form.
- Value (V): The actual data value, encoded according to the specific data type and encoding rules.
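Every value is made of octets, where an octet is an eight-bit unsigned integer. Bit 8 of the octet is the most significant and bit 1 is the least significant (the table below numbers the same bits from 7 down to 0).
An ASN.1 tag with a tag number below 31 is a single octet. ASN.1 tag representation: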
| 7 6 | 5 | 4 3 2 1 0 |
|-----|---|-----------|
| Class | C | Number |
- Bits 7-6 (Class): Represent the tag class.
- Bit 5 (C): Indicates if the tag is constructed.
- Bits 4-0 (Number): Represent the tag number.
Here is a list of the universal class tags:
Tag Class | Tag Number | Tag Name |
---|---|---|
Universal | 0x00 | EOC |
Universal | 0x01 | BOOLEAN |
Universal | 0x02 | INTEGER |
Universal | 0x03 | BIT_STRING |
Universal | 0x04 | OCTET_STRING |
Universal | 0x05 | NULL |
Universal | 0x06 | OBJECT_IDENTIFIER |
Universal | 0x07 | ObjectDescriptor |
Universal | 0x08 | EXTERNAL |
Universal | 0x09 | REAL |
Universal | 0x0A | ENUMERATED |
Universal | 0x0B | EMBEDDED_PDV |
Universal | 0x0C | UTF8String |
Universal | 0x0D | RELATIVE_OID |
Universal | 0x10 | SEQUENCE |
Universal | 0x11 | SET |
Universal | 0x12 | NumericString |
Universal | 0x13 | PrintableString |
Universal | 0x14 | TeletexString |
Universal | 0x15 | VideotexString |
Universal | 0x16 | IA5String |
Universal | 0x17 | UTCTime |
Universal | 0x18 | GeneralizedTime |
Universal | 0x19 | GraphicString |
Universal | 0x1A | VisibleString |
Universal | 0x1B | GeneralString |
Universal | 0x1C | UniversalString |
Universal | 0x1E | BMPString |
Since we want to extract the ASN.1 tag from the byte array:
- Since the encoding follows T-L-V, the tag is the first byte of the ASN.1 structure.
- We need to determine the class, form (primitive or constructed), and number from that byte.
ASN.1 tag representation
// Given a byte, find the ASN.1 tag values
const buff = 42; // 0x2a => 00101010
// Bits 7-6: tag class
const tagClass = buff >> 6;
// tagClass is 00 -> universal
// 0x20 => 00100000 selects bit 5 (the constructed flag)
const tagConstructed = (buff & 0x20) !== 0;
// 0x1f => 00011111 selects bits 4-0 (the tag number)
const tagNumber = buff & 0x1f;
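The same bit operations can be wrapped in a small function that returns all three fields at once. This is a minimal sketch for single-octet tags; the `Asn1Tag` type and the `decodeTag` name are assumptions introduced for this example.

```typescript
type TagClass = "universal" | "application" | "context-specific" | "private";

// Hypothetical result shape for illustration.
interface Asn1Tag {
  tagClass: TagClass;
  constructed: boolean;
  number: number;
}

// Split a single tag octet into class (bits 7-6), constructed flag (bit 5)
// and tag number (bits 4-0). Multi-octet tags (number >= 31) are not handled.
function decodeTag(byte: number): Asn1Tag {
  const classes: TagClass[] = ["universal", "application", "context-specific", "private"];
  return {
    tagClass: classes[(byte >> 6) & 0x03],
    constructed: (byte & 0x20) !== 0,
    number: byte & 0x1f,
  };
}

console.log(decodeTag(0x30)); // universal, constructed, number 16 (SEQUENCE)
console.log(decodeTag(0xa0)); // context-specific, constructed, number 0
console.log(decodeTag(0x06)); // universal, primitive, number 6 (OBJECT_IDENTIFIER)
```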
- Read the Length Byte:
  - The second byte in ASN.1 indicates the length.
- Check the Most Significant Bit (MSB):
  - If the MSB is 0, the byte represents the length directly (short form).
  - If the MSB is 1, the byte indicates the number of subsequent bytes that encode the length (long form).
- Short Form Encoding:
  - If the MSB is 0, return the value of the byte as the length.
- Long Form Encoding:
  - If the MSB is 1, mask out the MSB to get the number of subsequent bytes.
  - Read the subsequent bytes and combine them to get the length.
// Given the first length byte, find the ASN.1 length value.
// nextByte() is assumed to return the next byte in the sequence.
function decodeLength(buff, nextByte) {
  // Check whether the most significant bit is set to zero.
  // If it is set to 1, the length is encoded in long form.
  const msb = buff & 0x80;
  if (msb === 0) {
    // Short form encoding: the byte is the length itself
    return buff;
  }
  // Long form encoding: the low 7 bits give the number of length bytes
  const numBytes = buff & 0x7f; // 0x7f => 01111111
  let length = 0;
  for (let i = 0; i < numBytes; i++) {
    // Read the next byte and combine it into the length
    length = (length << 8) | nextByte();
  }
  return length;
}
// Example: for buff = 0x82, the next 2 bytes hold the length.
Extraction of TLV (Type, Length, Value)
const simpleASN1 = [0x30, 0x82, 0x2a, 0x74 /* , ...more */];
1. Decode the Type
   - The first byte `0x30` represents the Tag value.
   - The Tag value `0x30` corresponds to the SEQUENCE type in the universal class. This is a constructed type, meaning it can contain nested TLV triplets.
2. Decode the Length
   - The second byte `0x82` has the most significant bit set to 1, indicating a long-form length encoding.
   - The remaining 7 bits (`0x02`) indicate that the Length value is encoded in the next 2 bytes.
   - The next 2 bytes are `0x2A, 0x74`, which represent the Length value 10,868 (0x2A74 in hexadecimal) when combined.
3. Decode the Value
   - Since SEQUENCE is a constructed type, its value is itself a series of nested TLV triplets. We can iterate through the following bytes, checking the type of each triplet and extracting its value.
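The three decoding steps above can be combined into one helper that reads a single TLV header at a given offset. This is a sketch under stated assumptions: the `Tlv` shape and the `readTlv` name are invented for this example, tags are assumed to be a single octet, and the BER indefinite-length form (`0x80`) is not handled.

```typescript
// Hypothetical result shape for illustration.
interface Tlv {
  tag: number;          // raw tag byte
  length: number;       // decoded length of the value field
  headerLength: number; // tag byte plus length byte(s)
  value: number[];      // value bytes (shorter than `length` if the input is truncated)
}

function readTlv(bytes: number[], offset: number = 0): Tlv {
  const tag = bytes[offset];
  const first = bytes[offset + 1];
  let length = 0;
  let headerLength = 2;
  if ((first & 0x80) === 0) {
    // Short form: the byte is the length
    length = first;
  } else {
    // Long form: low 7 bits give the number of following length bytes
    const numBytes = first & 0x7f;
    for (let i = 0; i < numBytes; i++) {
      length = length * 256 + bytes[offset + 2 + i];
    }
    headerLength += numBytes;
  }
  const start = offset + headerLength;
  return { tag, length, headerLength, value: bytes.slice(start, start + length) };
}

const stream = [0x30, 0x82, 0x2a, 0x74, 0x06, 0x09, 0x2a, 0x86, 0x48, 0x86, 0xf7, 0x0d, 0x01, 0x07, 0x02];
const parent = readTlv(stream);
console.log(parent.length, parent.headerLength); // 10868 4
const child = readTlv(stream, parent.headerLength);
console.log(child.tag, child.length); // 6 9
```

Returning `headerLength` alongside the value makes it easy to step into a constructed type: the first child starts at `offset + headerLength`.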
Let's analyze how to parse the next few bytes of the ASN.1 structure following the same approach:
- Get the first byte and find the tag type.
- Get the length of the bytes.
- Get the values.
[30,82,2A,74, 06 ,09 ,2A, 86, 48, 86, F7, 0D, 01, 07, 02, ...asn2];
|-parent asn-||-----------child asn1---------------------|--child2-|
From the previous example, we know that there are two ASN.1 structures in the stream. We can move the offset by +4 (past the parent's tag and length bytes) to reach the first child and calculate its TLV values:
const asn1 = [0x06, 0x09, 0x2a, 0x86, 0x48, 0x86, 0xf7, 0x0d, 0x01, 0x07, 0x02];
- Determine the Type (T):
  - The first byte `06` represents the Type (T) or the tag value.
  - This byte value `0x06` corresponds to the OBJECT_IDENTIFIER data type in the universal class.
- Determine the Length (L):
  - The second byte `09` represents the Length (L) of the Value field.
  - Since the most significant bit (0x80) is not set, this is a short-form length encoding.
  - The value `0x09` (decimal 9) indicates that the length of the Value field is 9 bytes.
- Determine the Value (V):
  - The remaining 9 bytes `2A 86 48 86 F7 0D 01 07 02` represent the Value (V) field for the OBJECT_IDENTIFIER data type.
  - OBJECT_IDENTIFIER values are encoded using a specific set of rules:
    - The value is represented as a sequence of variable-length numbers.
    - The first two numbers are encoded in the first byte, and subsequent numbers are encoded in subsequent bytes.
    - Each number is encoded in base 128, with the most significant bit indicating whether more bytes follow for that number.
- Decoding the Value `2A 86 48 86 F7 0D 01 07 02`:

// reference := https://luca.ntop.org/Teaching/Appunti/asn1.html
function bytesToOID(bytes) {
  let s = ""; // Initialize an empty string to store the OID
  let n = 0; // Initialize a variable to accumulate the current number
  const len = bytes.length; // Length of the input bytes array
  for (let i = 0; i < len; ++i) {
    let v = bytes[i]; // Current byte value
    n = (n << 7) | (v & 0x7f); // Append the lower 7 bits to n
    if (!(v & 0x80)) {
      // If highest bit is not set
      if (s === "") {
        // If s is empty, it's the first two numbers
        let first = Math.floor(n / 40); // Calculate the first number
        let second = n % 40; // Calculate the second number
        s = first + "." + second; // Add the first two numbers to s
      } else {
        s += "." + n; // Add the accumulated number to s
      }
      n = 0; // Reset n for the next number
    }
  }
  return s;
}
let bytes = [0x2a, 0x86, 0x48, 0x86, 0xf7, 0x0d, 0x01, 0x07, 0x02];
console.log(bytesToOID(bytes)); // Output: 1.2.840.113549.1.7.2
let bytes2 = [0x2a, 0x86, 0x48, 0xce, 0x3d, 0x04, 0x03, 0x02];
console.log(bytesToOID(bytes2)); // Output: 1.2.840.10045.4.3.2
console.log(bytesToOID([0x55, 0x1d, 0x0e])); // Output: 2.5.29.14
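One way to check the encoding rules above is to run them in the opposite direction and confirm that a dotted OID string produces the same bytes. The following is a sketch written for this purpose only; `encodeBase128` and `oidToBytes` are hypothetical names, and it assumes a well-formed OID with at least two arcs.

```typescript
// Encode one number in base 128; every byte except the last has its top bit set.
function encodeBase128(value: number): number[] {
  let n = value;
  const out = [n & 0x7f];
  n = Math.floor(n / 128);
  while (n > 0) {
    out.unshift((n & 0x7f) | 0x80);
    n = Math.floor(n / 128);
  }
  return out;
}

// The first two arcs are packed together as first * 40 + second;
// every remaining arc is encoded separately in base 128.
function oidToBytes(oid: string): number[] {
  const arcs = oid.split(".").map(Number);
  if (arcs.length < 2) throw new Error("an OID needs at least two arcs");
  const bytes = encodeBase128(arcs[0] * 40 + arcs[1]);
  for (const arc of arcs.slice(2)) {
    bytes.push(...encodeBase128(arc));
  }
  return bytes;
}

const encoded = oidToBytes("1.2.840.113549.1.7.2");
console.log(encoded.map((b) => b.toString(16)).join(" "));
// 2a 86 48 86 f7 d 1 7 2
```

For example, 840 is `6 * 128 + 72`, which gives the two bytes `0x86 0x48`: the first byte carries the continuation bit and the second does not.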
To handle ASN.1 data types in circuits, I can think of two approaches:
- Individual circuits for specific data types: write individual circuits for extracting specific data types.
- Extract important data types: extract the important data types in a single circuit. We need to explore ways to return these values efficiently in Circom.
  - Important ASN.1 data types to extract:
    - OBJECT_IDENTIFIER
      - versions
      - encryption algorithm used
    - OCTET_STRING
      - signature values
      - content
    - UTCTime
    - UTF8String
      - issuer, country, states
    - BIT_STRING
      - subjectPublicKey
Here's the TypeScript implementation of the ASN.1 parsing algorithm in `./src/parser.ts`:
function parse(data: number[]) {
let ASN_ARRAY = [];
let i = 0;
while (i < data.length - 1) {
const ASN_TAG = data[i];
const ASN_LENGTH = data[i + 1];
if (
ASN_TAG === ASN1_TAGS.SEQUENCE ||
ASN_TAG === ASN1_TAGS.SET ||
ASN_TAG == ASN1_TAGS.CONTEXT_SPECIFIC_0 ||
ASN_TAG == ASN1_TAGS.CONTEXT_SPECIFIC_1 ||
ASN_TAG == ASN1_TAGS.CONTEXT_SPECIFIC_3 ||
ASN_TAG == ASN1_TAGS.CONTEXT_SPECIFIC_4
) {
const isLongForm = (ASN_LENGTH & 0x80) === 0 ? false : true;
if (isLongForm) {
const offset = this.calculateOffSet(ASN_LENGTH);
const endIndex = i + offset + 2;
ASN_ARRAY.push(data.slice(i, endIndex));
i = endIndex;
} else {
ASN_ARRAY.push(data.slice(i, i + 2));
i += 2;
}
} else if (ASN_TAG == ASN1_TAGS.OCTET_STRING) {
const isLongForm = (ASN_LENGTH & 0x80) === 0 ? false : true;
let length = 0;
if (isLongForm) {
let numBytes = ASN_LENGTH & 0x7f;
let temp = numBytes;
let currentIndex = i + 2;
while (numBytes > 0) {
length = (length << 8) | data[currentIndex];
numBytes--;
++currentIndex;
}
const startIndex = i;
const endIndex = startIndex + length + temp + 2;
ASN_ARRAY.push(data.slice(i, endIndex));
i = endIndex;
} else {
const startIndex = i;
const endIndex = startIndex + ASN_LENGTH + 2;
ASN_ARRAY.push(data.slice(i, endIndex));
i = endIndex;
}
} else {
const startIndex = i;
const endIndex = startIndex + ASN_LENGTH + 2;
ASN_ARRAY.push(data.slice(i, endIndex));
i = endIndex;
}
}
return ASN_ARRAY;
}
const input = [
0x30, 0x82, 0x04, 0x9f, 0x06, 0x09, 0x2a, 0x86, 0x48, 0x86, 0xf7, 0x0d, 0x01, 0x07, 0x02, 0xa0, 0x82, 0x04, 0x90,
0x30, 0x82, 0x04, 0x8c, 0x02, 0x01, 0x01,
// ... (more bytes would follow in a complete certificate)
];
Now, let's walk through how the parsing algorithm would process the first 5 elements of this input:
1. `30 82 04 9F`
   - Tag: `30` (SEQUENCE)
   - Length: `82 04 9F` (long form, 1183 bytes)
   - Algorithm:
     - Recognizes `30` as SEQUENCE
     - Identifies long form length (0x82)
     - Calculates total length (0x049F = 1183)
     - Pushes `[30, 82, 04, 9F]` to ASN_ARRAY
   - Index moves to: 4
2. `06 09 2A 86 48 86 F7 0D 01 07 02`
   - Tag: `06` (OBJECT IDENTIFIER)
   - Length: `09` (9 bytes)
   - Value: `2A 86 48 86 F7 0D 01 07 02`
   - Algorithm:
     - Identifies `06` as OBJECT IDENTIFIER
     - Reads length `09`
     - Pushes the entire element `[06, 09, 2A, 86, 48, 86, F7, 0D, 01, 07, 02]` to ASN_ARRAY
   - Index moves to: 15
3. `A0 82 04 90`
   - Tag: `A0` (CONTEXT SPECIFIC)
   - Length: `82 04 90` (long form, 1168 bytes)
   - Algorithm:
     - Recognizes `A0` as CONTEXT SPECIFIC
     - Identifies long form length (0x82)
     - Calculates total length (0x0490 = 1168)
     - Pushes `[A0, 82, 04, 90]` to ASN_ARRAY
   - Index moves to: 19
4. `30 82 04 8C`
   - Tag: `30` (SEQUENCE)
   - Length: `82 04 8C` (long form, 1164 bytes)
   - Algorithm:
     - Recognizes `30` as SEQUENCE
     - Identifies long form length (0x82)
     - Calculates total length (0x048C = 1164)
     - Pushes `[30, 82, 04, 8C]` to ASN_ARRAY
   - Index moves to: 23
5. `02 01 01`
   - Tag: `02` (INTEGER)
   - Length: `01` (1 byte)
   - Value: `01`
   - Algorithm:
     - Identifies `02` as INTEGER
     - Reads length `01`
     - Pushes the entire element `[02, 01, 01]` to ASN_ARRAY
   - Index moves to: 26
After processing these 5 elements, the ASN_ARRAY would look like this:
[
[30, 82, 04, 9F],
[06, 09, 2A, 86, 48, 86, F7, 0D, 01, 07, 02],
[A0, 82, 04, 90],
[30, 82, 04, 8C],
[02, 01, 01]
]
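To reproduce this walkthrough without the rest of the project, here is a condensed, self-contained sketch of the same flattening idea. It is not the code from `./src/parser.ts`: the `CONTAINER_TAGS` list and the `flattenTlv` name are assumptions, the container tag bytes (`0x30`, `0x31`, `0xA0`, `0xA1`, `0xA3`, `0xA4`) are guessed from the tag names used in the parser, and unlike the original it applies long-form length decoding to every primitive type, not only OCTET_STRING.

```typescript
// Assumed tag bytes for SEQUENCE, SET and context-specific [0], [1], [3], [4].
const CONTAINER_TAGS: number[] = [0x30, 0x31, 0xa0, 0xa1, 0xa3, 0xa4];

// Flatten a DER byte stream: containers contribute only their header bytes,
// primitive types contribute header plus value.
function flattenTlv(data: number[]): number[][] {
  const out: number[][] = [];
  let i = 0;
  while (i < data.length - 1) {
    const tag = data[i];
    const lengthByte = data[i + 1];
    const isLongForm = (lengthByte & 0x80) !== 0;
    const numLengthBytes = isLongForm ? lengthByte & 0x7f : 0;
    const headerEnd = i + 2 + numLengthBytes;
    let end = headerEnd;
    if (CONTAINER_TAGS.indexOf(tag) === -1) {
      // Primitive type: decode the length and include the value bytes
      let length = isLongForm ? 0 : lengthByte;
      for (let k = i + 2; k < headerEnd; k++) {
        length = length * 256 + data[k];
      }
      end = headerEnd + length;
    }
    out.push(data.slice(i, end));
    i = end;
  }
  return out;
}

const sample = [
  0x30, 0x82, 0x04, 0x9f, 0x06, 0x09, 0x2a, 0x86, 0x48, 0x86, 0xf7, 0x0d, 0x01, 0x07, 0x02, 0xa0, 0x82, 0x04, 0x90,
  0x30, 0x82, 0x04, 0x8c, 0x02, 0x01, 0x01,
];
for (const entry of flattenTlv(sample)) {
  console.log(entry.map((b) => b.toString(16).padStart(2, "0")).join(" "));
}
```

Run on the 26 sample bytes, this yields five entries with lengths 4, 11, 4, 4 and 3, matching the array shown above.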
We can look at the first byte of each array to determine its tag, and then decode the value accordingly.
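As a rough illustration of that last step, the sketch below dispatches on the first byte of an entry. The `decodeEntry` and `decodeOid` names are hypothetical, and the sketch makes simplifying assumptions: primitive entries use a short-form length (so the value starts at index 2), INTEGER values are small and non-negative, and string types contain only ASCII characters.

```typescript
// Decode OBJECT_IDENTIFIER value bytes into dotted notation.
function decodeOid(value: number[]): string {
  const arcs: number[] = [];
  let n = 0;
  for (const v of value) {
    n = n * 128 + (v & 0x7f);
    if ((v & 0x80) === 0) {
      if (arcs.length === 0) {
        // The first encoded number packs the first two arcs
        const first = Math.min(Math.floor(n / 40), 2);
        arcs.push(first, n - first * 40);
      } else {
        arcs.push(n);
      }
      n = 0;
    }
  }
  return arcs.join(".");
}

// Pick a decoder based on the tag byte of a flattened entry.
function decodeEntry(entry: number[]): string {
  const tag = entry[0];
  const value = entry.slice(2); // assumes a one-byte (short-form) length
  switch (tag) {
    case 0x02: // INTEGER (small, non-negative values only)
      return "INTEGER " + value.reduce((acc, b) => acc * 256 + b, 0);
    case 0x06: // OBJECT_IDENTIFIER
      return "OBJECT_IDENTIFIER " + decodeOid(value);
    case 0x0c: // UTF8String
    case 0x13: // PrintableString
    case 0x16: // IA5String
      return "STRING " + String.fromCharCode(...value);
    default:
      return "TAG 0x" + tag.toString(16) + " (" + entry.length + " bytes)";
  }
}

console.log(decodeEntry([0x02, 0x01, 0x01]));
// INTEGER 1
console.log(decodeEntry([0x06, 0x09, 0x2a, 0x86, 0x48, 0x86, 0xf7, 0x0d, 0x01, 0x07, 0x02]));
// OBJECT_IDENTIFIER 1.2.840.113549.1.7.2
```

Container headers such as `[30, 82, 04, 9F]` fall through to the default branch, since they carry no value of their own in the flattened array.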
- Understanding PDF Parsing https://letsencrypt.org/docs/a-warm-welcome-to-asn1-and-der/#:~:text=The Encoding-,ASN.,to express a given structure.
- https://datatracker.ietf.org/doc/html/rfc5280#page-96
- https://lapo.it/asn1js/