SecuritySpy

Multi-camera video surveillance software for the Mac

SecuritySpy Movie File Metadata

This document describes the custom metadata atoms embedded in QuickTime/MP4 movie files.

1. Overview

SecuritySpy stores custom metadata inside the standard QuickTime/MP4 user data container. The location within the file hierarchy is:

File (ftyp, mdat, ...)
 └── moov
      ├── mvhd
      ├── trak (video)
      ├── trak (audio, if present)
      └── udta
           ├── Sver  (always present)
           ├── Mtyp  (always present)
           ├── Mtma  (optional)
           └── Evnt  (optional)

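Each item is a standard QuickTime atom (also called a box in ISO Base Media File Format terminology). An atom begins with a 4-byte big-endian size followed by a 4-byte ASCII type code. The size counts the whole atom, including this 8-byte header, which is why the fixed-size atoms below report 16 and 12 rather than 8 and 4.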
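As a rough illustration of the header layout just described, the sketch below walks a run of sibling atoms held in memory. The name iter_atoms is hypothetical. Its handling of size values 1 (a 64-bit size follows the type code) and 0 (the atom runs to the end of its container) follows the general QuickTime/ISO convention and is not something this document describes for the SecuritySpy atoms.

```python
import struct

def iter_atoms(buf, start=0, end=None):
    """Yield (type, payload_offset, payload_size) for each sibling atom in buf[start:end]."""
    end = len(buf) if end is None else end
    pos = start
    while pos + 8 <= end:
        # 4-byte big-endian size, then 4-byte ASCII type code
        size, atype = struct.unpack_from('>I4s', buf, pos)
        header = 8
        if size == 1:
            # General QuickTime convention: a 64-bit size follows the type code
            if pos + 16 > end:
                break
            size = struct.unpack_from('>Q', buf, pos + 8)[0]
            header = 16
        elif size == 0:
            # General QuickTime convention: atom extends to the end of its container
            size = end - pos
        if size < header or pos + size > end:
            break  # malformed atom; stop rather than read past the container
        yield atype, pos + header, size - header
        pos += size

# Two hand-built atoms matching the Sver and Mtyp layouts in this document
sver = struct.pack('>I4s8s', 16, b'Sver', b'6.0.1')
mtyp = struct.pack('>I4sB3x', 12, b'Mtyp', 1)
atoms = list(iter_atoms(sver + mtyp))
for atype, offset, size in atoms:
    print(atype, offset, size)
```

The same function can be pointed at the payload of moov, and then at the payload of udta, to descend the hierarchy shown above.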

General rules

2. Sver — Software Version

Records the version of SecuritySpy that created the movie file. Always present.

Layout (16 bytes)

Offset  Size  Field    Type     Byte Order  Description
0       4     size     uint32   Big         Always 0x00000010 (16)
4       4     type     ASCII    -           'Sver' (0x53766572)
8       8     version  char[8]  -           Null-terminated ASCII version string, e.g. "6.0.1"

The version string occupies exactly 8 bytes. Shorter strings are null-terminated; any remaining bytes after the null terminator should be ignored.

3. Mtyp — Movie Type

Indicates how the movie was recorded. Always present.

Layout (12 bytes)

Offset  Size  Field      Type    Byte Order  Description
0       4     size       uint32  Big         Always 0x0000000C (12)
4       4     type       ASCII   -           'Mtyp' (0x4D747970)
8       1     movieType  uint8   -           See values below
9       3     (unused)   -       -           Reserved, currently zero

Movie type values

Value  Meaning
0      Motion capture: recording was triggered by detected motion or other events
1      Continuous: recording is part of a continuous recording schedule

4. Mtma — Time Mapping

Maps positions within the movie timeline to real wall-clock times. This is essential because a single movie file may span gaps in recording (e.g. if the camera went offline briefly or if motion-capture segments are concatenated). Each entry marks the start of a contiguous segment and records the wall-clock time at which that segment began.

This atom is optional — it is present only when time-mapping information was recorded.

Atom header (8 bytes)

Offset  Size  Field  Type    Byte Order  Description
0       4     size   uint32  Big         8 + (N × 42), where N is the number of entries
4       4     type   ASCII   -           'Mtma' (0x4D746D61)

Immediately following the 8-byte header is an array of N time-mapping entries.

Each entry (42 bytes)

Offset  Size  Field           Type   Byte Order  Description
0       2     year            int16  Big         Calendar year (e.g. 2026)
2       2     month           int16  Big         1–12
4       2     day             int16  Big         1–31
6       2     hour            int16  Big         0–23 (local time)
8       2     minute          int16  Big         0–59
10      2     second          int16  Big         0–59
12      2     subSec600       int16  Big         Sub-second precision in 1/600ths of a second (0–599)
14      2     (reserved)      int16  Big         Reserved for future use
16      8     startTimeValue  int64  Big         Start position of this segment in the movie, in timescale units (timescale 600)
24      8     endTimeValue    int64  Big         End position of this segment in the movie, in timescale units (timescale 600)
32      2     (reserved)      int16  Big         Reserved for future use
34      2     (reserved)      int16  Big         Reserved (in files created before v5.2.5 this field holds the timescale value, 600)

Note: The entry size is fixed at 42 bytes and will not be extended in future versions.

Interpreting time-mapping data

To convert a movie timeline position to a wall-clock time:

  1. Find the Mtma entry whose startTimeValue ≤ position < endTimeValue.
  2. Compute the offset: offset = position − startTimeValue.
  3. Convert the offset to seconds: seconds = offset / 600.0.
  4. Add that number of seconds to the wall-clock time given by the entry's date/time fields (year, month, day, hour, minute, second, subSec600), where subSec600 itself contributes subSec600 / 600.0 seconds.

If the position falls in a gap between segments (i.e. between one entry's endTimeValue and the next entry's startTimeValue), there is no valid wall-clock mapping for that position.
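The conversion steps above can be sketched as follows. This is a minimal illustration, not SecuritySpy's own code: the function name movie_pos_to_wall_clock and the dictionary keys are hypothetical and simply mirror the field names in the entry table. The result is a naive datetime, because the entry stores local time with no time-zone information.

```python
from datetime import datetime, timedelta

TIMESCALE = 600  # timescale units per second, as stated in the entry table

def movie_pos_to_wall_clock(entries, position):
    """Return the wall-clock datetime for a movie position, or None if it falls in a gap."""
    for e in entries:
        # Step 1: find the segment that contains the position (end is exclusive)
        if e['startTimeValue'] <= position < e['endTimeValue']:
            base = datetime(e['year'], e['month'], e['day'],
                            e['hour'], e['minute'], e['second'])
            # Steps 2-4: offset within the segment plus the sub-second part,
            # both expressed in 1/600ths of a second
            ticks = (position - e['startTimeValue']) + e['subSec600']
            return base + timedelta(seconds=ticks / TIMESCALE)
    return None  # no entry covers this position

# Two segments with a recording gap between their wall-clock times
entries = [
    {'year': 2026, 'month': 1, 'day': 15, 'hour': 10, 'minute': 30, 'second': 0,
     'subSec600': 300, 'startTimeValue': 0, 'endTimeValue': 6000},
    {'year': 2026, 'month': 1, 'day': 15, 'hour': 10, 'minute': 45, 'second': 0,
     'subSec600': 0, 'startTimeValue': 6000, 'endTimeValue': 12000},
]
print(movie_pos_to_wall_clock(entries, 900))
print(movie_pos_to_wall_clock(entries, 6600))
```

Because the result is built with timedelta, it is rounded to the nearest microsecond; one timescale unit is about 1.67 ms, so nothing meaningful is lost.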

5. Evnt — Event Data

Contains a per-frame record of motion detection, AI classification, and presence detection events. Only frames with notable activity are recorded here — not every frame in the movie has an event entry.

This atom is optional — it is present only when event data was recorded.

Atom header (12 bytes)

Offset  Size  Field       Type    Byte Order  Description
0       4     size        uint32  Big         12 + (N × recordSize)
4       4     type        ASCII   -           'Evnt' (0x45766E74)
8       1     recordSize  uint8   -           Size of each entry in bytes (currently 32)
9       1     version     uint8   -           Format version (currently 1)
10      1     (reserved)  uint8   -           0
11      1     (reserved)  uint8   -           0

Forward compatibility: The recordSize field in the header tells you the size of each entry. Future versions may extend the entry beyond 32 bytes (up to 252). When reading, always use the recordSize from the header to step through entries, and ignore any trailing bytes you don't recognise. If recordSize is smaller than expected (from an older file), treat missing fields as zero.
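One way to apply the forward-compatibility rule is sketched below. It steps through the entries by the stride declared in the header, keeps only the bytes the reader understands, and zero-pads entries that are shorter than expected. The function name iter_evnt_records and the known_size parameter are hypothetical.

```python
def iter_evnt_records(payload, known_size=32):
    """Yield (version, record) pairs from an Evnt payload.

    payload is everything after the 8-byte size/type header, so it starts
    with recordSize, version and two reserved bytes. Each yielded record is
    exactly known_size bytes long.
    """
    if len(payload) < 4:
        return
    record_size, version = payload[0], payload[1]
    if record_size == 0:
        return  # malformed header; a zero stride would never advance
    body = payload[4:]
    # Step by the stride from the header, not by the size this reader knows about
    for off in range(0, len(body) - record_size + 1, record_size):
        rec = body[off:off + record_size]
        # Drop unrecognised trailing bytes; treat missing fields as zero
        yield version, rec[:known_size].ljust(known_size, b'\x00')

# A hypothetical future file with 40-byte entries
future = bytes([40, 1, 0, 0]) + bytes(range(40)) + bytes(range(40, 80))
for version, rec in iter_evnt_records(future):
    print(version, len(rec))
```

Padding short entries keeps the field offsets valid, but it does not make an older layout compatible: a record whose fields sit at different offsets still needs its own decoder.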

Each entry — version 1 (32 bytes)

Offset  Size  Field            Type    Byte Order  Description
0       4     movieTime        uint32  Big         Position in the movie timeline, in timescale units (timescale 600)
4       1     triggerFlag      bool    -           1 if this frame caused a motion trigger (i.e. motion duration exceeded the trigger threshold), 0 otherwise
5       3     (reserved)       -       -           Reserved, currently zero
8       1     classifiedFlag   bool    -           1 if this frame was classified by the AI model, 0 if not
9       1     probH            uint8   -           Probability of human presence (0–100%). Only valid when classifiedFlag is 1
10      1     probV            uint8   -           Probability of vehicle presence (0–100%). Only valid when classifiedFlag is 1
11      1     probA            uint8   -           Probability of animal presence (0–100%). Only valid when classifiedFlag is 1
12      2     mdRect.x         uint16  Big         Bounding rectangle of detected motion (x, y, w, h), normalised to a 65535×65535 coordinate space. Valid only when triggerFlag is 1. To convert to pixel coordinates: pixelX = mdRect.x × imageWidth / 65535
14      2     mdRect.y         uint16  Big         (part of mdRect, see mdRect.x)
16      2     mdRect.w         uint16  Big         (part of mdRect, see mdRect.x)
18      2     mdRect.h         uint16  Big         (part of mdRect, see mdRect.x)
20      4     seqId            uint32  Big         Sequence identifier: a random number shared by all frames belonging to the same continuous motion event. A new sequence starts when there is a gap of more than 4 seconds since the previous motion frame
24      2     arrivedObjects   uint16  Little      Bitfield indicating which object types have arrived in the scene on this frame (see object type bits below)
26      2     departedObjects  uint16  Little      Bitfield indicating which object types have departed from the scene on this frame
28      1     presenceRect.x   uint8   -           Bounding rectangle of the presence detection zone (x, y, w, h), normalised to a 32×32 coordinate space. To convert to pixel coordinates: pixelX = presenceRect.x × imageWidth / 32
29      1     presenceRect.y   uint8   -           (part of presenceRect, see presenceRect.x)
30      1     presenceRect.w   uint8   -           (part of presenceRect, see presenceRect.x)
31      1     presenceRect.h   uint8   -           (part of presenceRect, see presenceRect.x)
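The two normalised rectangles can be scaled to pixels as sketched below. The table gives the formula only for the x coordinate; scaling w by the image width, and y and h by the image height, is an assumption made by analogy. The function names and the choice of integer (floor) division are likewise illustrative.

```python
MD_SPACE = 65535       # mdRect is normalised to 65535 x 65535
PRESENCE_SPACE = 32    # presenceRect is normalised to 32 x 32

def rect_to_pixels(rect, image_width, image_height, space):
    """Scale a normalised (x, y, w, h) rectangle to integer pixel coordinates."""
    x, y, w, h = rect
    # x and w scale with the width; y and h are assumed to scale with the height
    return (x * image_width // space, y * image_height // space,
            w * image_width // space, h * image_height // space)

def md_rect_to_pixels(rect, image_width, image_height):
    return rect_to_pixels(rect, image_width, image_height, MD_SPACE)

def presence_rect_to_pixels(rect, image_width, image_height):
    return rect_to_pixels(rect, image_width, image_height, PRESENCE_SPACE)

# A full-frame motion rectangle and a presence zone on a 1920 x 1080 image
print(md_rect_to_pixels((0, 0, 65535, 65535), 1920, 1080))
print(presence_rect_to_pixels((16, 8, 8, 4), 1920, 1080))
```

Remember that mdRect is only meaningful when triggerFlag is 1, so a caller would normally check that flag before drawing the rectangle.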

Object type bitfield values

Used in the arrivedObjects and departedObjects fields:

Bit  Mask    Object Type
0    0x0001  Human
1    0x0002  Vehicle
2    0x0004  Animal

Bits 3–15 are reserved for future object types.
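Decoding the bitfield is a matter of testing each documented mask, as in this short sketch. The names OBJECT_TYPES and decode_objects are hypothetical; reserved bits are simply ignored. The example also shows the little-endian read that the entry table specifies for these two fields.

```python
import struct

# Masks taken from the object type table above
OBJECT_TYPES = {0x0001: 'human', 0x0002: 'vehicle', 0x0004: 'animal'}

def decode_objects(bits):
    """Return the names of the object types whose bits are set."""
    return [name for mask, name in OBJECT_TYPES.items() if bits & mask]

# arrivedObjects and departedObjects are little-endian, unlike the other multi-byte fields
raw = b'\x05\x00\x02\x00'
arrived, departed = struct.unpack('<HH', raw)
print(decode_objects(arrived), decode_objects(departed))
```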

Understanding event fields

6. Additional Notes

Older file formats

Files created by SecuritySpy versions prior to the introduction of the version-1 Evnt format may contain a version-0 event record with a smaller recordSize (20 bytes). The version-0 layout is:

Offset  Size  Field        Type        Byte Order  Description
0       4     movieTime    uint32      Big         Position in movie timeline
4       1     probH        int8        -           Human probability (0–100) or −1 if not classified
5       1     probV        int8        -           Vehicle probability (0–100) or −1 if not classified
6       1     (reserved)   uint8       -           0
7       1     motionValue  uint8       -           Motion intensity (0–180)
8       8     mdRect       IntRectU16  Big         Motion detection location (normalised to 65535×65535)

Always check the recordSize field in the Evnt atom header to determine which version you are reading.
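For older files, a single version-0 record could be decoded along these lines. This sketch reads only the fields at offsets 0 to 15 listed in the table and ignores anything after them. The function name is hypothetical, and the order of the four IntRectU16 values (x, y, w, h) is an assumption carried over from the version-1 mdRect field; the table does not spell it out.

```python
import struct

def parse_evnt_v0_record(rec):
    """Decode the documented fields of one version-0 event record."""
    # uint32 movieTime, int8 probH, int8 probV, uint8 reserved, uint8 motionValue
    movie_time, prob_h, prob_v, _reserved, motion = struct.unpack_from('>IbbBB', rec, 0)
    # Four big-endian uint16 values; the x, y, w, h order is assumed
    md_rect = struct.unpack_from('>4H', rec, 8)
    return {
        'movieTime': movie_time,
        # -1 marks a frame that was not classified
        'probHuman': None if prob_h < 0 else prob_h,
        'probVehicle': None if prob_v < 0 else prob_v,
        'motionValue': motion,
        'mdRect': md_rect,
    }

# A hand-built record: human probability 87, vehicle not classified
sample = struct.pack('>IbbBB4H', 1200, 87, -1, 0, 42, 100, 200, 300, 400)
record = parse_evnt_v0_record(sample)
print(record)
```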

Reading the udta atom

To locate the udta atom, parse the moov atom's children by walking through atoms sequentially (read size, read type, skip to next). The udta atom will have type code 0x75647461. Then walk its children the same way to find each metadata atom by its type code.

Complete udta structure summary

udta (variable size)
 ├── Sver  16 bytes (fixed)       — always present
 ├── Mtyp  12 bytes (fixed)       — always present
 ├── Mtma   8 + N×42 bytes       — optional
 └── Evnt  12 + N×recordSize     — optional

Example: parsing in Python

import struct

def read_atom(f):
    """Read an atom header; returns (type, size, data_offset)."""
    hdr = f.read(8)
    if len(hdr) < 8:
        return None, 0, 0
    size, atype = struct.unpack('>I4s', hdr)
    return atype, size, f.tell()

def find_atom(f, parent_offset, parent_size, target_type):
    """Find a child atom of the given type within a parent atom."""
    f.seek(parent_offset)
    end = parent_offset + parent_size
    while f.tell() < end:
        atype, size, data_off = read_atom(f)
        if atype is None or size < 8:
            break
        if atype == target_type:
            return data_off, size - 8
        f.seek(data_off + size - 8)
    return None, 0

def parse_sver(data):
    """Parse an Sver atom's payload (8 bytes after header)."""
    return data[:8].split(b'\x00')[0].decode('ascii')

def parse_mtyp(data):
    """Parse an Mtyp atom's payload (4 bytes after header)."""
    return 'continuous' if data[0] == 1 else 'motion-capture'

def parse_mtma(data):
    """Parse Mtma entries from the payload after the 8-byte atom header."""
    entries = []
    entry_size = 42
    count = len(data) // entry_size
    for i in range(count):
        e = data[i*entry_size:(i+1)*entry_size]
        year, month, day, hour, minute, second, subsec = struct.unpack('>7h', e[0:14])
        start, end = struct.unpack('>qq', e[16:32])
        entries.append({
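            # subSec600 counts 1/600ths of a second, so convert it to milliseconds before printing
            'time': f'{year:04d}-{month:02d}-{day:02d} {hour:02d}:{minute:02d}:{second:02d}.{subsec * 1000 // 600:03d}',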
            'start': start, 'end': end
        })
    return entries

def parse_evnt(data):
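    """Parse version-1 Evnt entries from the atom payload (everything after the 8-byte size/type header)."""
    record_size = data[0]  # bytes per entry, as declared in the header
    version = data[1]      # format version; the offsets below are the version-1 layout
    entries = []
    payload = data[4:]  # skip recordSize, version and the two reserved bytes
    if record_size == 0:
        return entries  # malformed header; avoids dividing by zero below
    count = len(payload) // record_size
    for i in range(count):
        # Entries shorter than 32 bytes are zero-padded so missing fields read as zero
        e = payload[i*record_size:(i+1)*record_size].ljust(32, b'\x00')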
        movie_time = struct.unpack('>I', e[0:4])[0]
        trigger = bool(e[4])
        classified = bool(e[8])
        prob_h, prob_v, prob_a = e[9], e[10], e[11]
        md_x, md_y, md_w, md_h = struct.unpack('>4H', e[12:20])
        seq_id = struct.unpack('>I', e[20:24])[0]
        arrived, departed = struct.unpack('<HH', e[24:28])
        pr_x, pr_y, pr_w, pr_h = e[28], e[29], e[30], e[31]
        entries.append({
            'movieTime': movie_time,
            'trigger': trigger,
            'classified': classified,
            'probHuman': prob_h if classified else None,
            'probVehicle': prob_v if classified else None,
            'probAnimal': prob_a if classified else None,
            'mdRect': (md_x, md_y, md_w, md_h),
            'seqId': seq_id,
            'arrivedObjects': arrived,
            'departedObjects': departed,
            'presenceRect': (pr_x, pr_y, pr_w, pr_h),
        })
    return entries