HL7 v2.x Encoding Specification

Copy Markdown View Source

Reference document for implementing a conformant HL7 v2.x parser, encoder, MLLP transport, and ACK/NAK responder. This reference covers versions 2.3 through 2.8.2 for completeness; the library implementation targets v2.5.1 with partial tolerance for other versions. The truncation character (v2.7+) is supported. MSH-22..28 are parsed and preserved but not semantically enforced.

Sources: HL7 Standard Chapters 2 (Control) and 2C (Code Tables), versions 2.5.1 / 2.7.1 / 2.8.2; Transport Specification: MLLP Release 1 (Rene Spronk, HL7 v3 Standard, 2003); IHE ITI TF Volume 2 Appendix C.


1. Message Delimiters

HL7 v2 messages are variable-length, pipe-delimited text. Seven delimiter characters control structure:

DelimiterDefaultPositionNotes
Segment terminator<CR>Fixed (0x0D)CANNOT be redefined
Field separator|MSH-1 (byte 4)Single character; defines itself
Component separator^MSH-2, position 1Within composite data types
Repetition separator~MSH-2, position 2Between repeated field values
Escape character\MSH-2, position 3Begins/ends escape sequences
Sub-component separator&MSH-2, position 4Within components
Truncation character#MSH-2, position 5v2.7+ only; marks truncated data

1.1. Segment Terminator

Always ASCII CR (0x0D). This is hardcoded in the specification and cannot be overridden. Every segment, including the last segment in a message, ends with CR.

Implementations must normalize \r\n (CRLF) and bare \n (LF) to \r (CR) during parsing, since real-world systems often transmit CRLF despite the standard.

1.2. Field Separator (MSH-1)

A single character occupying the byte immediately after the three-character segment ID MSH. It is simultaneously the value AND the delimiter for the rest of the message.

The default is pipe (|, 0x7C). Any printable ASCII character may be used, but interoperability demands sticking with pipe.

1.3. Encoding Characters (MSH-2)

MSH-2 contains 4 characters (5 in v2.7+) in strict positional order:

Position 1: Component separator       (default ^)
Position 2: Repetition separator      (default ~)
Position 3: Escape character          (default \)
Position 4: Sub-component separator   (default &)
Position 5: Truncation character      (default #)  -- v2.7+ only

MSH-2 is parsed positionally, NOT by delimiter. The field separator does NOT precede MSH-2 in the byte stream -- MSH-2 immediately follows the MSH-1 character. See Section 2 for full MSH parsing rules.


2. MSH Segment Special Rules

The MSH segment is the only segment with non-standard parsing rules. Every other segment follows the regular SEGMENT_ID | field1 | field2 | ... pattern.

2.1. MSH Byte Layout

Byte offset:  0  1  2  3  4  5  6  7  8  9
Content:      M  S  H  |  ^  ~  \  &  |  ... (first real field, MSH-3)
              ^^^      ^  ^^^^^^^^  ^
              Seg ID   |  MSH-2     |
                       MSH-1        Next field separator

Critical parsing rules:

  1. MSH-1 is at byte offset 3. It is a SINGLE CHARACTER. It is NOT delimited -- it IS the delimiter.

  2. MSH-2 occupies bytes 4 through 7 (or 4-8 in v2.7+ with truncation char). These bytes are read positionally, not split by the field separator.

  3. The field separator at byte offset 8 (or 9 in v2.7+) begins normal field-delimited parsing from MSH-3 onward.

  4. When counting MSH fields for field numbering, MSH-1 is field 1, MSH-2 is field 2. The first pipe-delimited token after MSH-2 is MSH-3.

2.2. MSH Field Reference

FieldNameDTOptRepLength
MSH-1Field SeparatorSTSHALLN1
MSH-2Encoding CharactersSTSHALLN4..5
MSH-3Sending ApplicationHDMAYN--
MSH-4Sending FacilityHDMAYN--
MSH-5Receiving ApplicationHDMAYN--
MSH-6Receiving FacilityHDMAYY--
MSH-7Date/Time of MessageDTMSHALLN--
MSH-8SecuritySTMAYN0..40
MSH-9Message TypeMSGSHALLN--
MSH-10Message Control IDSTSHALLN1..199
MSH-11Processing IDPTSHALLN--
MSH-12Version IDVIDSHALLN--
MSH-13Sequence NumberNMMAYN--
MSH-14Continuation PointerSTMAYN0..180
MSH-15Accept Acknowledgment TypeIDMAYN2
MSH-16Application Acknowledgment TypeIDMAYN2
MSH-17Country CodeIDMAYN3
MSH-18Character SetIDMAYY5..15
MSH-19Principal LanguageCWEMAYN--
MSH-20Alt. Char Set Handling SchemeIDMAYN3..13
MSH-21Message Profile IdentifierEIMAYY--

Fields MSH-22 through MSH-28 exist in v2.7+ (responsible org, network address, security classification). See HL7 v2.8.2 Section 2.14.9 for full definitions.

2.3. MSH-9: Message Type (MSG Data Type)

Three components separated by the component separator:

<Message Code>^<Trigger Event>^<Message Structure>
ComponentTableExampleNotes
MSG-10076ADTMessage type category
MSG-20003A01Trigger event
MSG-30354ADT_A01Abstract message structure (v2.3.1+)

Examples:

  • ADT^A01^ADT_A01 -- admission
  • ORM^O01^ORM_O01 -- order
  • ORU^R01^ORU_R01 -- result
  • ACK^A01^ACK -- acknowledgment

MSG-3 may differ from MSG-1+MSG-2 when messages share a structure: ADT^A04^ADT_A01 means event A04 uses the ADT_A01 structure.

2.4. MSH-10: Message Control ID

Must uniquely identify the message. The receiver echoes it in MSA-2. Maximum 199 characters; truncation is NOT allowed. Must be unique within the scope of the sending application.

2.5. MSH-11: Processing ID (PT Data Type)

Two components:

ComponentValuesMeaning
PT-1P / D / TProduction / Debugging / Training
PT-2A / T / (empty)Archive / Not archive

Receivers must validate PT-1 matches their operational mode and reject with AR/CR if mismatched.

2.6. MSH-12: Version ID

First component is the version string: 2.3, 2.3.1, 2.4, 2.5, 2.5.1, 2.6, 2.7, 2.7.1, 2.8, 2.8.2. Receivers should reject messages with unsupported versions (AR/CR in acknowledgment).

2.7. MSH-15 and MSH-16: Acknowledgment Types

These fields control acknowledgment behavior. Both reference HL7 Table 0155:

CodeDescription
ALAlways send acknowledgment
NENever send acknowledgment
SUSend only on successful completion
ERSend only on error/reject

Rule: Either both MSH-15 and MSH-16 SHALL be populated, or both SHALL be empty. If both are empty, original mode acknowledgment applies.

2.8. MSH-18: Character Set

Repeating field specifying character encoding. If empty, 7-bit ASCII is assumed. The first repetition is the default character set; subsequent repetitions are alternates available via escape sequences.

See Section 8 for the complete Table 0211.


3. Field, Component, and Sub-Component Rules

3.1. Hierarchy

Message
  +-- Segment (terminated by CR)
        +-- Field (separated by field separator |)
              +-- Repetition (separated by repetition separator ~)
                    +-- Component (separated by component separator ^)
                          +-- Sub-component (separated by sub-component separator &)

Sub-components are the lowest level. They cannot contain further nesting.

3.2. Repetition Rules

Only fields explicitly marked as repeating (Rep=Y or Rep=N where N > 1 in the segment definition) may have multiple values separated by the repetition separator.

Components and sub-components NEVER repeat.

Example: Three phone numbers in a repeating field:
|555-1111~555-2222~555-3333|

3.3. Trailing Empty Elements

Empty trailing fields, components, and sub-components MAY be omitted. Receivers must treat them as not present (equivalent to ||).

Equivalent:
PID|1||12345^^^MRN||Doe^John
PID|1||12345^^^MRN||Doe^John||||||||||||||||||

3.4. Counting Rules for Non-MSH Segments

For all segments other than MSH, fields are numbered starting at 1 for the first field after the segment ID. The segment ID itself is not a numbered field.

PID|1||12345|...
    ^   ^
    |   PID-3
    PID-1

(PID-2 is empty between PID-1 and PID-3)

4. Null and Empty Field Handling

HL7 distinguishes three population states for every field, component, and sub-component:

4.1. Populated (Valued)

The element contains data between delimiters:

|Smith^John|

4.2. Not Populated (Empty)

No characters between consecutive delimiters. The sender has no value to communicate. The receiver makes no inference about the value in its database:

||

4.3. Explicit Null (Delete)

Two consecutive double-quote characters as the ONLY content of a field. This instructs the receiver to DELETE the previously stored value:

|""|

Critical constraint: Using "" for any purpose other than deletion is prohibited by the standard.

Delete indicator length: The deletion indicator "" is treated as having zero length and fits within any length specification.

4.4. Null at Component Level

Explicit null can apply to individual components within a composite field:

|""^John^""^Dr|

This deletes the family name and suffix while preserving given name and prefix.

4.5. Interaction with Updates

In update transactions (e.g., ADT^A08):

  • Empty field (||): no change to existing value
  • Populated field (|Smith|): replace existing value
  • Explicit null (|""|): delete existing value

5. Escape Sequences

5.1. Structure

An escape sequence begins and ends with the escape character (default \):

\<code><data>\

Where <code> is a single uppercase letter and <data> is zero or more characters.

Nesting is prohibited: No escape sequence may contain another escape sequence.

5.2. Delimiter Escape Sequences

These are valid in ALL text-bearing data types (ST, TX, FT, CF, CWE text components, etc.):

SequenceRepresentsPurpose
\F\Field separator (|)Literal pipe in data
\S\Component separator (^)Literal caret in data
\T\Sub-component separator (&)Literal ampersand in data
\R\Repetition separator (~)Literal tilde in data
\E\Escape character (\)Literal backslash in data
\P\Truncation character (#)Literal hash in data (v2.7+)

Case sensitivity: Escape codes are CASE-SENSITIVE. \F\ is valid; \f\ is not.

5.3. Formatting Escape Sequences

Valid ONLY in TX (text data), FT (formatted text), and CF (coded element with formatted text) data types:

SequenceEffect
\H\Start highlighting
\N\End highlighting (return to normal text)
\.br\Begin new output line
\.sp\End current line, skip 1 vertical space
\.sp<n>\End current line, skip n vertical spaces
\.fi\Begin word wrap / fill mode (default)
\.nf\Begin no-wrap mode
\.in<n>\Indent n spaces from left margin (+/- integer)
\.ti<n>\Temporarily indent n spaces (next line only)
\.sk<n>\Skip n spaces to the right
\.ce\End current line, center the next line

5.4. Hexadecimal Data Escape

\Xdddd...\

Where dddd... is one or more pairs of hexadecimal digits (0-9, A-F). Each pair represents one byte. Interpretation is application-specific.

Common uses:

  • \X0D\ -- carriage return byte
  • \X0A\ -- line feed byte
  • \X0D0A\ -- CRLF sequence

5.5. Character Set Escape Sequences

For fields supporting multiple character sets (PN, XPN, XCN, XON, XAD, as well as TX, FT, CF):

Single-byte character set:

\Cxxyy\

Where xx and yy are hexadecimal values identifying an ISO 2022 character set via ISO-IR registration numbers.

Multi-byte character set:

\Mxxyyzz\

Where xx, yy, and optionally zz are hexadecimal values.

Examples:

  • \C2842\ -- ISO-IR6 (ASCII)
  • \C2D41\ -- ISO-IR100 (ISO 8859-1 Latin-1)
  • \C2D46\ -- ISO-IR101 (ISO 8859-2 Latin-2)
  • \C2D48\ -- ISO-IR138 (ISO 8859-8 Hebrew)
  • \M2442\ -- ISO-IR87 (JIS X 0208 Japanese)

5.6. Locally Defined Escape Sequences

\Zdddd...\

The Z prefix is reserved for local/vendor-specific extensions. Content after Z must be valid TX characters. Interpretation requires prior agreement between sender and receiver.

5.7. Escape Sequence Rules Summary

  1. Escape sequences are case-sensitive.
  2. No nested escape sequences.
  3. All characters inside the escape (between the opening and closing \, excluding the \ delimiters themselves) count toward field length.
  4. Delimiter escape sequences (\F\, \S\, \T\, \R\, \E\, \P\) are valid in ST fields as well as TX/FT/CF.
  5. Formatting escapes (\H\, \N\, \.xx\) are valid ONLY in TX, FT, CF fields.
  6. Unrecognized escape sequences should be passed through unchanged (the standard provides no explicit guidance, but defensive implementations preserve them).

6. MLLP (Minimal Lower Layer Protocol)

Reference: Transport Specification: MLLP, Release 1 (HL7 v3 Standard, 2003). Based on HL7 Implementation Guide for HL7 v2.3.1, Appendix C "Lower Layer Protocols", Section C.4.3.

6.1. Purpose

MLLP is a minimalistic OSI session-layer framing protocol that wraps HL7 messages for transmission over TCP/IP. It relies on TCP for error detection, retransmission, and flow control.

6.2. Block Format

<SB> dddd <EB> <CR>
ElementValueASCII NameHexDescription
<SB>VTVertical Tab0x0BStart Block -- begins the frame
dddd(data)----HL7 message content (variable length)
<EB>FSFile Separator0x1CEnd Block -- marks end of message data
<CR>CRCarriage Return0x0DRequired trailing CR after End Block

Do NOT confuse:

  • SB (0x0B) with SOH (0x01) or STX (0x02)
  • EB (0x1C) with ETX (0x03) or EOT (0x04)

6.3. Data Content Rules

The data portion (dddd) can contain:

  • Any single-byte values greater than 0x1F (printable characters)
  • The ASCII carriage return character (0x0D) -- used as segment terminator within the HL7 message itself

The data portion MUST NOT contain:

  • 0x0B (Start Block) -- would be misinterpreted as a new frame start
  • 0x1C (End Block) -- would be misinterpreted as frame end
  • Byte values 0x00 through 0x1E (except 0x0D)

6.4. Character Encoding Restrictions

MLLP is framed by single-byte values. The characters within the block must be encoded using a character encoding whose byte values do not conflict with the framing bytes (0x0B, 0x1C).

Supported encodings:

  • All single-byte encodings: ASCII, ISO 8859-x, cp1252
  • UTF-8 (byte values for multi-byte sequences are always >= 0x80, so no conflict)
  • Shift_JIS

NOT supported:

  • UTF-16 (byte values may equal 0x0B or 0x1C within multi-byte sequences)
  • UTF-32 (same problem)

6.5. Connection Lifecycle

  1. Sender opens a TCP connection to the receiver's MLLP port.
  2. Sender transmits <SB> + message + <EB> + <CR>.
  3. Receiver reads until it encounters <EB><CR>, extracts the message.
  4. Receiver processes the message and generates an ACK response.
  5. Receiver transmits the ACK as <SB> + ack + <EB> + <CR>.
  6. Sender reads the ACK.
  7. Connection may remain open for additional messages (persistent) or be closed.

6.6. Multiple Messages Per Connection

The specification does not mandate one-message-per-connection. In practice:

  • Persistent connections: The sender may transmit multiple messages sequentially on the same TCP connection, waiting for each ACK before sending the next message.
  • One-shot connections: Some implementations open a new TCP connection per message.
  • Pipeline concern: The standard does not support pipelining (sending message N+1 before receiving ACK for message N). Each message-ACK pair must complete before the next begins.

6.7. Timeout Considerations

The specification does not define timeouts. Implementations should:

  • Set a TCP read timeout (commonly 30-60 seconds) when waiting for data.
  • Set a connection timeout for the initial TCP handshake.
  • Handle <SB> without a corresponding <EB><CR> within a reasonable time as a protocol error.

6.8. Error Handling

MLLP itself provides no error detection beyond frame boundary recognition. Error detection and correction are delegated to:

  • TCP (packet-level reliability)
  • HL7 ACK/NAK (application-level confirmation)

If a receiver cannot parse a valid MLLP frame, the implementation should:

  1. Log the error.
  2. Close the TCP connection (the sender will detect the closed connection).

7. ACK/NAK Acknowledgment

7.1. Mode Determination

HL7 defines two acknowledgment modes. The mode is determined by MSH-15 and MSH-16 of the INBOUND message:

MSH-15MSH-16Mode Selected
emptyemptyOriginal mode
valuedvaluedEnhanced mode
valuedemptyEnhanced mode (accept ACK only)
emptyvaluedEnhanced mode (application ACK only)

Original mode is equivalent to: MSH-15=NE, MSH-16=AL.

7.2. Acknowledgment Codes (Table 0008)

CodeModeTypeMeaning
AAOriginalApplication AcceptMessage successfully processed
AEOriginalApplication ErrorError during application processing
AROriginalApplication RejectRejected (bad type/version/processing ID, or system failure)
CAEnhancedCommit AcceptMessage safely stored for processing
CEEnhancedCommit ErrorCannot accept message (validation failure, etc.)
CREnhancedCommit RejectRejected (bad type/version/processing ID)

7.3. Original Mode Processing

The receiver performs two validation steps:

Step 1 -- Protocol validation:

  1. Is the message type (MSH-9) one the receiver accepts?
  2. Is the version (MSH-12) supported?
  3. Is the processing ID (MSH-11) appropriate?

If any check fails: return ACK with MSA-1 = AR.

Step 2 -- Application processing: The validated message is passed to the receiving application:

  • Success: return MSA-1 = AA
  • Application-level error: return MSA-1 = AE with error details in ERR segment
  • Application-level rejection: return MSA-1 = AR

7.4. Enhanced Mode Processing

Enhanced mode separates commit acknowledgment from application acknowledgment:

Phase 1 -- Commit (Accept) Acknowledgment:

Upon receiving a message, the receiver:

  1. Checks interface status and safe storage availability.
  2. Optionally validates syntax, message type, version, processing ID.
  3. If MSH-15 indicates an accept ACK is needed (AL, SU, ER):
    • CA: Message stored safely for later processing.
    • CR: Message type, version, or processing ID is invalid.
    • CE: Cannot accept for any other reason (storage failure, etc.).

Phase 2 -- Application Acknowledgment:

After processing completes (may be asynchronous), if MSH-16 indicates an application ACK is needed:

  • AA: Processing succeeded.
  • AE: Application-level error.
  • AR: Application-level rejection.

The application acknowledgment message itself CANNOT request a further application acknowledgment (its MSH-16 is always empty).

7.5. ACK Message Structure

An ACK message contains these segments:

MSH|...|ACK^<trigger_event>^ACK|<unique_control_id>|...|
MSA|<ack_code>|<original_message_control_id>|<text_message>|
[ERR|...]
[{SFT|...}]

MSH construction for ACK:

  • MSH-3/MSH-4: Swap with original MSH-5/MSH-6 (sender becomes receiver)
  • MSH-5/MSH-6: Swap with original MSH-3/MSH-4
  • MSH-9: ACK^<original_trigger_event>^ACK
  • MSH-10: A NEW unique message control ID (not the original's)
  • MSH-11: Same processing ID as original
  • MSH-12: Same version as original

7.6. MSA Segment

FieldNameDTOptDescription
MSA-1Acknowledgment CodeIDSHALLAA/AE/AR/CA/CE/CR (Table 0008)
MSA-2Message Control IDSTSHALLEchoes MSH-10 from the ORIGINAL message
MSA-3Text MessageSTMAYHuman-readable error/status text
MSA-4Expected Seq NumberNMMAYFor sequence number protocol
MSA-6Error Condition----Deprecated; use ERR segment instead

7.7. ERR Segment

Used in AE/AR/CE/CR responses to provide structured error details:

FieldNameDTOptRepDescription
ERR-1Error Code and Location------Withdrawn in v2.7; do not use
ERR-2Error LocationERLMAY0..*Segment/field/component location
ERR-3HL7 Error CodeCWESHALL1..1Table 0357 error code (see 7.8)
ERR-4SeverityIDSHALL1..1E=Error, W=Warning, I=Information
ERR-5Application Error CodeCWEMAY0..1Application-specific error
ERR-6Application Error ParamSTMAY0..10Parameter substitution values (max 80 chars)
ERR-7Diagnostic InformationTXMAY0..1Technical details (max 2048 chars)
ERR-8User MessageTXMAY0..1User-facing text (max 250 chars)
ERR-9Inform Person IndicatorCWEMAY0..*Who should be notified
ERR-10Override TypeCWEMAY0..1Whether override is permitted
ERR-11Override Reason CodeCWEMAY0..*Justification for override
ERR-12Help Desk ContactXTNMAY0..*Support contact information

7.8. Table 0357 -- Message Error Condition Codes

CodeDescriptionCategory
0Message acceptedSuccess
100Segment sequence errorSyntax error
101Required field missingSyntax error
102Data type errorSyntax error
103Table value not foundSemantic error
104Value too longLength error
200Unsupported message typeRejection
201Unsupported event codeRejection
202Unsupported processing IDRejection
203Unsupported version IDRejection
204Unknown key identifierBusiness error
205Duplicate key identifierBusiness error
206Application record lockedSystem error
207Application internal errorSystem error

7.9. When to Use Each Acknowledgment Code

AA (Application Accept):

  • Message parsed, validated, and processed successfully.
  • Receiver has committed the data changes.

AE (Application Error):

  • Message parsed and understood, but business logic failed.
  • The sender may correct the data and resend.
  • Include ERR segment with specific error details.
  • ERR-4 = "E" (Error) or "W" (Warning).

AR (Application Reject):

  • Message cannot be processed for systemic reasons.
  • Bad message type/version/processing ID, or system is down.
  • May be transient (system overload) -- sender may retry unchanged.
  • For enhanced mode, use CR instead when type/version/ID is the cause.

CA (Commit Accept):

  • Enhanced mode only. Message received and safely stored.
  • Does NOT mean the message has been processed.
  • Application processing will happen asynchronously.

CE (Commit Error):

  • Enhanced mode only. Cannot accept the message.
  • Storage failure, validation error, or other infrastructure issue.

CR (Commit Reject):

  • Enhanced mode only. Message type, version, or processing ID is unacceptable.
  • Sender should not retry without fixing the cause.

8. Character Sets and Encoding

8.1. Default Character Set

If MSH-18 is empty, the message uses 7-bit ASCII (decimal 0-127, hex 0x00-0x7F).

8.2. Table 0211 -- Alternate Character Sets

CodeCharacter SetType
ASCII7-bit ASCIISingle-byte
8859/1ISO 8859-1 (Latin-1)Single-byte
8859/2ISO 8859-2 (Latin-2)Single-byte
8859/3ISO 8859-3 (Latin-3)Single-byte
8859/4ISO 8859-4 (Latin-4)Single-byte
8859/5ISO 8859-5 (Cyrillic)Single-byte
8859/6ISO 8859-6 (Arabic)Single-byte
8859/7ISO 8859-7 (Greek)Single-byte
8859/8ISO 8859-8 (Hebrew)Single-byte
8859/9ISO 8859-9 (Latin-5/Turkish)Single-byte
8859/15ISO 8859-15 (Latin-9)Single-byte
ISO IR6ISO 646 (ASCII graphic subset)Single-byte
ISO IR14JIS X 0201-1976Single-byte
ISO IR87JIS X 0208-1990 (Kanji)Multi-byte
ISO IR159JIS X 0212-1990 (Supplementary)Multi-byte
KS X 1001Korean (EUC-KR)Multi-byte
CNS 11643-1992TaiwaneseMulti-byte
BIG-5Taiwanese (Big5)Multi-byte
GB 18030-2000Chinese (GB18030)Multi-byte
UNICODEISO 10646 UCS-2 (deprecated)Multi-byte
UNICODE UTF-8UTF-8Variable
UNICODE UTF-16UTF-16 (removed from standard)Multi-byte
UNICODE UTF-32UTF-32 (removed from standard)Multi-byte

8.3. MSH-18 Semantics

  • If MSH-18 is blank: ASCII is the character set for the entire message.
  • If MSH-18 has one value: that is the character set for the entire message.
  • If MSH-18 has multiple repetitions: the first is the default; subsequent repetitions are alternates available via \Cxxyy\ and \Mxxyyzz\ escape sequences.

8.4. MSH-20: Alternate Character Set Handling Scheme

Controls how character set switching works when MSH-18 has alternates:

CodeDescription
ISO 2022-1994ISO 2022 escape sequences select character sets
2.3HL7 v2.3-compatible handling (deprecated)
(empty)Implementation-defined

8.5. UTF-8 in Practice

Most modern implementations use UNICODE UTF-8 in MSH-18 exclusively. UTF-8 properties relevant to HL7:

  1. ASCII-compatible: bytes 0x00-0x7F are identical to ASCII. All HL7 delimiters (pipe, caret, tilde, backslash, ampersand, CR) fall in this range. Therefore, a parser can safely split on delimiters at the byte level without worrying about multi-byte character boundaries.

  2. MLLP-safe: UTF-8 multi-byte sequences use bytes 0x80-0xFF, which do not conflict with MLLP framing bytes (0x0B, 0x1C, 0x0D).

  3. A message encoded in 7-bit ASCII can be submitted to a UTF-8 destination with no modification.


9. Message Construction Rules

9.1. Encoding Procedure (Pseudocode)

function encode_message(segments) -> binary:
    for each segment in segment_order:
        emit segment_id                              # e.g., "MSH", "PID"

        if segment_id == "MSH":
            emit field_separator                     # MSH-1 (e.g., |)
            emit encoding_characters                 # MSH-2 (e.g., ^~\&)
            start_field_index = 3                    # Skip to MSH-3
        else:
            start_field_index = 1

        for each field from start_field_index:
            emit field_separator

            for each repetition in field:
                if not first_repetition:
                    emit repetition_separator

                for each component in repetition:
                    if not first_component:
                        emit component_separator

                    for each sub_component in component:
                        if not first_sub_component:
                            emit sub_component_separator

                        emit escape(sub_component_value)

            # Stop emitting fields after the last populated field
            # (trailing empty fields may be omitted)

        emit CR                                      # Segment terminator

9.2. Escape Procedure

When encoding field values, replace delimiter characters with escape sequences:

function escape(value) -> string:
    replace field_separator       with \F\
    replace component_separator   with \S\
    replace sub_component_sep     with \T\
    replace repetition_separator  with \R\
    replace escape_character      with \E\
    replace truncation_character  with \P\   # v2.7+ only
    return value

Order matters: The escape character itself MUST be escaped FIRST, before replacing other delimiters. Otherwise, the backslashes introduced by other replacements would themselves get escaped.

9.3. Decoding Procedure

When parsing received messages:

  1. Split on CR to get segments.
  2. For the first segment (MSH), extract MSH-1 (byte 4) and MSH-2 (bytes 5-8 or 5-9).
  3. For remaining fields in MSH and all fields in other segments, split by field separator.
  4. For each field, split by repetition separator.
  5. For each repetition, split by component separator.
  6. For each component, split by sub-component separator.
  7. Apply unescape to each terminal value.
function unescape(value) -> string:
    replace \F\ with field_separator
    replace \S\ with component_separator
    replace \T\ with sub_component_separator
    replace \R\ with repetition_separator
    replace \E\ with escape_character
    replace \P\ with truncation_character    # v2.7+ only
    # Process \Xdd\ hex escapes
    # Process \Cxxyy\ and \Mxxyyzz\ character set escapes
    # Preserve unrecognized \Z...\ and other unknown escapes
    return value

9.4. Recipient Processing Rules

Receivers SHALL:

  1. Ignore unexpected elements: Segments, fields, components, sub-components, and extra repetitions not defined in the expected message structure must be silently ignored. This enables forward compatibility.

  2. Treat absent optional segments as consisting entirely of empty fields.

  3. Treat absent trailing fields and components as not present (empty).

This means a parser must NOT fail when encountering unknown segments or extra fields beyond what it expects.


10. Truncation (v2.7+)

10.1. Truncation Indicator

When a field value exceeds the conformance length and the field is marked as truncatable (# in the conformance length column):

  1. Truncate at position N-1 (where N is the maximum length).
  2. Place the truncation character (default #) as the last character.
Example: Maximum length 6, value "abcdefgh"
Result:  "abcde#"

10.2. Edge Case: Data Ends With Truncation Character

If the original data's last character is the truncation character itself, escape it:

Example: Maximum length 6, value "abcde#"
Result:  "abcde\P\"

10.3. Fields Without Truncation Support

Fields marked with = in the conformance length column do NOT support truncation. Values exceeding the limit must be rejected (the sender must fix the data).


11. Segment Groups and Message Structure

11.1. Segment Groups

Segments may be organized into logical groups (enclosed in brackets [...] for optional, braces {...} for repeating in the abstract message definition):

ADT_A01:
    MSH
    [{SFT}]
    [UAC]
    EVN
    PID
    [PD1]
    [{ROL}]
    [{NK1}]
    PV1
    [PV2]
    [{ROL}]
    [{DB1}]
    [{OBX}]
    ...

11.2. Segment Ordering

Segments must appear in the order specified by the abstract message definition. Out-of- order segments should be treated as unexpected and ignored per Section 9.4.

11.3. Ambiguity Prevention

If a segment name appears in multiple group positions (at least one optional or repeating), the appearances must be separated by at least one required segment of a different name. This prevents parsing ambiguity.


12. Complete Example

12.1. ADT^A08 (Patient Update) with MLLP Framing

<0x0B>MSH|^~\&|HIS|HOSPITAL|PHAOS|ARCHIVE|20260322143000||ADT^A08^ADT_A01|MSG00001|P|2.5.1|||AL|NE<CR>EVN|A08|20260322143000<CR>PID|||12345^^^HOSP^MR||Smith^John^M||19800115|M|||123 Main St^^Springfield^IL^62701||555-1234<CR>PV1||I|ICU^301^A|<CR><0x1C><0x0D>

12.2. ACK Response

<0x0B>MSH|^~\&|PHAOS|ARCHIVE|HIS|HOSPITAL|20260322143001||ACK^A08^ACK|ACK_MSG00001|P|2.5.1<CR>MSA|AA|MSG00001<CR><0x1C><0x0D>

12.3. NAK Response with ERR Segment

<0x0B>MSH|^~\&|PHAOS|ARCHIVE|HIS|HOSPITAL|20260322143001||ACK^A08^ACK|ACK_MSG00001|P|2.5.1<CR>MSA|AE|MSG00001|Patient not found<CR>ERR||PID^1^3|204^Unknown key identifier^HL70357|E|||Patient ID 12345 not found in registry<CR><0x1C><0x0D>

12.4. Escape Sequence Examples

# Field containing a literal pipe character:
OBX|1|ST|1234||Blood pressure: 120\F\80 mmHg||

# Field containing a literal caret:
OBX|2|ST|5678||Grade: A\S\B (combined)||

# Field containing a literal backslash:
OBX|3|ST|9012||Path: C:\E\Users\E\Data||

# Formatted text with line breaks:
OBX|4|FT|3456||Line 1\.br\Line 2\.br\Line 3||

# Hex-encoded binary data:
OBX|5|ST|7890||\X48454C4C4F\||  (encodes "HELLO")

13. Implementation Checklist

13.1. Parser Must Handle

  • [ ] MSH-1 and MSH-2 non-standard positional parsing
  • [ ] Custom delimiters (even if always |^~\& in practice)
  • [ ] Segment splitting on CR (normalize CRLF and LF first)
  • [ ] Field splitting by field separator
  • [ ] Repetition splitting by repetition separator
  • [ ] Component splitting by component separator
  • [ ] Sub-component splitting by sub-component separator
  • [ ] Trailing empty fields/components (treat as absent)
  • [ ] Explicit null ("") recognition
  • [ ] All delimiter escape sequences (\F\, \S\, \T\, \R\, \E\, \P\)
  • [ ] Hex escape sequences (\Xdd...\)
  • [ ] Formatting escape sequences for FT fields (\.br\, \H\, \N\, etc.)
  • [ ] Character set escapes (\Cxxyy\, \Mxxyyzz\)
  • [ ] Unknown escape sequences (preserve, do not error)
  • [ ] Forward compatibility (ignore unknown segments and extra fields)

13.2. Encoder Must Handle

  • [ ] MSH-1/MSH-2 special construction
  • [ ] Escape delimiters in field values (escape \ first)
  • [ ] Omit trailing empty fields
  • [ ] Segment terminator (CR only, not CRLF)
  • [ ] Unique MSH-10 generation
  • [ ] Timestamp formatting (DTM: YYYYMMDDHHMMSS[.S[S[S[S]]]][+/-ZZZZ])

13.3. MLLP Must Handle

  • [ ] Frame wrapping: <0x0B> + message + <0x1C> + <0x0D>
  • [ ] Frame extraction: buffer until <0x1C><0x0D> sequence found
  • [ ] Multiple messages per connection (sequential, not pipelined)
  • [ ] Read timeout handling
  • [ ] Graceful connection closure
  • [ ] Reject data with embedded 0x0B or 0x1C within message body

13.4. ACK Builder Must Handle

  • [ ] Mode determination from MSH-15/MSH-16
  • [ ] Correct sender/receiver field swapping
  • [ ] MSA-2 echoing original MSH-10
  • [ ] Unique MSH-10 for the ACK itself
  • [ ] ERR segment construction with Table 0357 codes
  • [ ] ERR-4 severity (E/W/I)
  • [ ] AA/AE/AR for original mode
  • [ ] CA/CE/CR for enhanced mode (if supported)