SecuritySpy

Multi-camera video surveillance software for the Mac

SecuritySpy Movie File Metadata

This document describes the custom metadata atoms embedded in QuickTime/MP4 movie files.

1. Overview

SecuritySpy stores custom metadata inside the standard QuickTime/MP4 user data container. The location within the file hierarchy is:

File (ftyp, mdat, ...)
 └── moov
      ├── mvhd
      ├── trak (video)
      ├── trak (audio, if present)
      └── udta
           ├── Sver  (always present)
           ├── Mtyp  (always present)
           ├── Mtma  (optional)
           └── Evnt  (optional)

Each item is a standard QuickTime atom (also called a box in ISO Base Media File Format terminology). An atom begins with a 4-byte big-endian size followed by a 4-byte ASCII type code.

General rules

All multi-byte integer fields are stored in big-endian (network) byte order unless explicitly noted otherwise.
Structures are tightly packed with no padding between fields.
The atom size field always includes the 8 bytes of the atom header itself.
The movie timescale is 600 (i.e. one timescale unit = 1/600th of a second).

2. Sver — Software Version

Records the version of SecuritySpy that created the movie file. Always present.

Layout (16 bytes)

Offset	Size	Field	Type	Byte Order	Description
0	4	size	uint32	Big	Always `0x00000010` (16)
4	4	type	ASCII	—	`'Sver'` (0x53766572)
8	8	version	char[8]	—	Null-terminated ASCII version string, e.g. `"6.0.1"`

The version string occupies exactly 8 bytes. Shorter strings are null-terminated; any remaining bytes after the null terminator should be ignored.

3. Mtyp — Movie Type

Indicates how the movie was recorded. Always present.

Layout (12 bytes)

Offset	Size	Field	Type	Byte Order	Description
0	4	size	uint32	Big	Always `0x0000000C` (12)
4	4	type	ASCII	—	`'Mtyp'` (0x4D747970)
8	1	movieType	uint8	—	See values below
9	3	(unused)	—	—	Reserved, currently zero

Movie type values

Value	Meaning
`0`	Motion capture — recording was triggered by detected motion or other events
`1`	Continuous — recording is part of a continuous recording schedule

4. Mtma — Time Mapping

Maps positions within the movie timeline to real wall-clock times. This is essential because a single movie file may span gaps in recording (e.g. if the camera went offline briefly or if motion-capture segments are concatenated). Each entry marks the start of a contiguous segment and records the wall-clock time at which that segment began.

This atom is optional — it is present only when time-mapping information was recorded.

Atom header (8 bytes)

Offset	Size	Field	Type	Byte Order	Description
0	4	size	uint32	Big	`8 + (N × 42)` where N is the number of entries
4	4	type	ASCII	—	`'Mtma'` (0x4D746D61)

Immediately following the 8-byte header is an array of N time-mapping entries.

Each entry (42 bytes)

Offset	Size	Field	Type	Byte Order	Description
0	2	year	int16	Big	Calendar year (e.g. 2026)
2	2	month	int16	Big	1–12
4	2	day	int16	Big	1–31
6	2	hour	int16	Big	0–23 (local time)
8	2	minute	int16	Big	0–59
10	2	second	int16	Big	0–59
12	2	subSec600	int16	Big	Sub-second precision in 1/600ths of a second (0–599)
14	2	(reserved)	int16	Big	Reserved for future use
16	8	startTimeValue	int64	Big	Start position of this segment in the movie, in timescale units (timescale 600)
24	8	endTimeValue	int64	Big	End position of this segment in the movie, in timescale units (timescale 600)
32	2	(reserved)	int16	Big	Reserved for future use
34	2	(reserved)	int16	Big	Reserved (in files created before v5.2.5 this field holds the timescale value, 600)

Note: The entry size is fixed at 42 bytes and will not be extended in future versions.

Interpreting time-mapping data

To convert a movie timeline position to a wall-clock time:

Find the Mtma entry whose startTimeValue ≤ position < endTimeValue.
Compute the offset: offset = position − startTimeValue.
Convert the offset to seconds: seconds = offset / 600.0.
Add that number of seconds to the wall-clock time given by the entry's date/time fields (year, month, day, hour, minute, second, subSec600, where subSec600 contributes subSec600 / 600.0 seconds).

If the position falls in a gap between segments (i.e. between one entry's endTimeValue and the next entry's startTimeValue), there is no valid wall-clock mapping for that position.
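
The steps above can be sketched in Python as follows. This is a minimal illustration rather than SecuritySpy's own code: it assumes each Mtma entry has already been decoded into a dictionary keyed by the field names from the table, and the function names `entry_start_time` and `movie_time_to_wall_clock` are hypothetical. The result is a naive local time, because the documented fields carry no time zone.

```python
from datetime import datetime, timedelta

TIMESCALE = 600  # movie timescale stated in the general rules

def entry_start_time(entry):
    """Wall-clock start of a segment, built from the entry's date/time fields."""
    base = datetime(entry['year'], entry['month'], entry['day'],
                    entry['hour'], entry['minute'], entry['second'])
    # subSec600 counts 1/600ths of a second
    return base + timedelta(seconds=entry['subSec600'] / TIMESCALE)

def movie_time_to_wall_clock(entries, position):
    """Return the local wall-clock time for a timeline position, or None in a gap."""
    for entry in entries:
        if entry['startTimeValue'] <= position < entry['endTimeValue']:
            offset = position - entry['startTimeValue']
            return entry_start_time(entry) + timedelta(seconds=offset / TIMESCALE)
    return None  # position lies outside every segment
```

Returning `None` for a gap keeps the "no valid mapping" case explicit, so callers cannot mistake it for a real timestamp.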

5. Evnt — Event Data

Contains a per-frame record of motion detection, AI classification, and presence detection events. Only frames with notable activity are recorded here — not every frame in the movie has an event entry.

This atom is optional — it is present only when event data was recorded.

Atom header (12 bytes)

Offset	Size	Field	Type	Byte Order	Description
0	4	size	uint32	Big	`12 + (N × recordSize)`
4	4	type	ASCII	—	`'Evnt'` (0x4576746E)
8	1	recordSize	uint8	—	Size of each entry in bytes (currently 32)
9	1	version	uint8	—	Format version (currently 1)
10	1	(reserved)	uint8	—	0
11	1	(reserved)	uint8	—	0

Forward compatibility: The recordSize field in the header tells you the size of each entry. Future versions may extend the entry beyond 32 bytes (up to 252). When reading, always use the recordSize from the header to step through entries, and ignore any trailing bytes you don't recognise. If recordSize is smaller than expected (from an older file), treat missing fields as zero.
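
One way to follow this rule is sketched below. The function name `iter_evnt_records` and the constant `KNOWN_RECORD_SIZE` are hypothetical, and the input is assumed to be the Evnt atom's bytes after the 8-byte size/type header. Zero-padding a shorter record only makes sense if it is a prefix of the layout the reader expects, which is an assumption here.

```python
KNOWN_RECORD_SIZE = 32  # size of the version-1 entry described in this document

def iter_evnt_records(payload):
    """Yield fixed-length records from an Evnt payload (bytes after the 8-byte atom header)."""
    if len(payload) < 4:
        return
    record_size = payload[0]
    if record_size == 0:
        raise ValueError('recordSize of 0 is not valid')
    body = payload[4:]  # skip recordSize, version and the two reserved bytes
    # Step by the size from the header, never by a hard-coded constant.
    for off in range(0, len(body) - record_size + 1, record_size):
        rec = body[off:off + record_size]
        # Longer records: keep only the known prefix.
        # Shorter records: pad the missing fields with zeros.
        yield rec[:KNOWN_RECORD_SIZE].ljust(KNOWN_RECORD_SIZE, b'\x00')
```

Every yielded record is exactly 32 bytes long, so the same field decoder can be applied regardless of the size stored in the file.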

Each entry — version 1 (32 bytes)

Offset	Size	Field	Type	Byte Order	Description
0	4	movieTime	uint32	Big	Position in the movie timeline, in timescale units (timescale 600)
4	1	triggerFlag	bool	—	1 if this frame caused a motion trigger (i.e. motion duration exceeded the trigger threshold), 0 otherwise
5	3	(reserved)	—	—	Reserved, currently zero
8	1	classifiedFlag	bool	—	1 if this frame was classified by the AI model, 0 if not
9	1	probH	uint8	—	Probability of human presence (0–100%). Only valid when `classifiedFlag` is 1
10	1	probV	uint8	—	Probability of vehicle presence (0–100%). Only valid when `classifiedFlag` is 1
11	1	probA	uint8	—	Probability of animal presence (0–100%). Only valid when `classifiedFlag` is 1
12	2	mdRect.x	uint16	Big	Bounding rectangle of detected motion (fields `mdRect.x` to `mdRect.h`), normalised to a 65535×65535 coordinate space. Valid only when `triggerFlag` is 1. To convert to pixel coordinates: `pixelX = mdRect.x × imageWidth / 65535`
14	2	mdRect.y	uint16	Big	See `mdRect.x`
16	2	mdRect.w	uint16	Big	See `mdRect.x`
18	2	mdRect.h	uint16	Big	See `mdRect.x`
20	4	seqId	uint32	Big	Sequence identifier — a random number shared by all frames belonging to the same continuous motion event. A new sequence starts when there is a gap of more than 4 seconds since the previous motion frame
24	2	arrivedObjects	uint16	Little	Bitfield indicating which object types have arrived in the scene on this frame (see object type bits below)
26	2	departedObjects	uint16	Little	Bitfield indicating which object types have departed from the scene on this frame
28	1	presenceRect.x	uint8	—	Bounding rectangle of the presence detection zone (fields `presenceRect.x` to `presenceRect.h`), normalised to a 32×32 coordinate space. To convert to pixel coordinates: `pixelX = presenceRect.x × imageWidth / 32`
29	1	presenceRect.y	uint8	—	See `presenceRect.x`
30	1	presenceRect.w	uint8	—	See `presenceRect.x`
31	1	presenceRect.h	uint8	—	See `presenceRect.x`
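
The two coordinate conversions in the table can be sketched as below. The table only gives the formula for the x value; scaling y and h by the image height, and w by the image width, is an assumption made by analogy. The function names are hypothetical, and integer division (rounding down) is a choice made for this sketch.

```python
def md_rect_to_pixels(rect, image_width, image_height):
    """Scale an mdRect (x, y, w, h) from the 65535x65535 space to pixels."""
    x, y, w, h = rect
    return (x * image_width // 65535, y * image_height // 65535,
            w * image_width // 65535, h * image_height // 65535)

def presence_rect_to_pixels(rect, image_width, image_height):
    """Scale a presenceRect (x, y, w, h) from the 32x32 space to pixels."""
    x, y, w, h = rect
    return (x * image_width // 32, y * image_height // 32,
            w * image_width // 32, h * image_height // 32)
```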

Object type bitfield values

Used in the arrivedObjects and departedObjects fields:

Bit	Mask	Object Type
0	`0x0001`	Human
1	`0x0002`	Vehicle
2	`0x0004`	Animal

Bits 3–15 are reserved for future object types.
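
As a small illustration, the bitfield can be decoded into names like this. The helper `decode_objects` and the lowercase labels are hypothetical; unknown bits are ignored so that future object types do not break the reader. Note that these two fields are stored little-endian, unlike the rest of the entry.

```python
import struct

# Masks from the table above
OBJECT_TYPES = {0x0001: 'human', 0x0002: 'vehicle', 0x0004: 'animal'}

def decode_objects(bits):
    """Return the names of the object types whose bits are set."""
    return [name for mask, name in OBJECT_TYPES.items() if bits & mask]

# arrivedObjects and departedObjects are little-endian uint16 values
arrived_raw = b'\x05\x00'
arrived = struct.unpack('<H', arrived_raw)[0]
arrived_names = decode_objects(arrived)  # human and animal bits are set
```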

Understanding event fields

movieTime — use this value with the Mtma time mapping to determine the real wall-clock time of the event.
triggerFlag — indicates the frame at which SecuritySpy determined that motion had persisted long enough to constitute a genuine trigger event. Frames with motion that hasn't yet exceeded the trigger duration will have triggerFlag = 0.
classifiedFlag / probH / probV / probA — when the AI classification model has been run on a frame, classifiedFlag is set to 1 and the three probability fields indicate the confidence (0–100%) that the frame contains a human, vehicle, or animal respectively. When classifiedFlag is 0, the probability fields should be ignored.
seqId — groups related frames into motion sequences. All event entries that are part of the same continuous motion event share the same seqId value. A new random seqId is generated when motion resumes after a gap of more than 4 seconds.
arrivedObjects / departedObjects — indicate transitions in presence detection. When an object type first appears in the scene, the corresponding bit is set in arrivedObjects. When it leaves, the bit is set in departedObjects. On most frames these fields will be zero.
presenceRect — the region of the frame in which presence detection is operating, normalised to a 32×32 grid.
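
To show how these fields combine in practice, the sketch below groups decoded entries into motion sequences by seqId. It assumes each entry is a dictionary with the hypothetical keys `movieTime`, `seqId`, `trigger`, `classified` and `probHuman`; the function name and the shape of the summary are likewise illustrative only.

```python
from collections import defaultdict

def summarise_sequences(events):
    """Group event entries by seqId and summarise each motion sequence."""
    groups = defaultdict(list)
    for ev in events:
        groups[ev['seqId']].append(ev)
    summary = {}
    for seq_id, frames in groups.items():
        # Probabilities are only meaningful on classified frames
        classified = [f for f in frames if f['classified']]
        summary[seq_id] = {
            'first': min(f['movieTime'] for f in frames),
            'last': max(f['movieTime'] for f in frames),
            'triggered': any(f['trigger'] for f in frames),
            'maxHuman': max((f['probHuman'] for f in classified), default=None),
        }
    return summary
```

The `first` and `last` values are movie timeline positions, so they still need the Mtma mapping to become wall-clock times.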

6. Additional Notes

Older file formats

Files created by SecuritySpy versions prior to the introduction of the version-1 Evnt format may contain a version-0 event record with a smaller recordSize (20 bytes). The version-0 layout is:

Offset	Size	Field	Type	Byte Order	Description
0	4	movieTime	uint32	Big	Position in movie timeline
4	1	probH	int8	—	Human probability (0–100) or −1 if not classified
5	1	probV	int8	—	Vehicle probability (0–100) or −1 if not classified
6	1	(reserved)	uint8	—	0
7	1	motionValue	uint8	—	Motion intensity (0–180)
8	8	mdRect	IntRectU16	Big	Motion detection location (normalised to 65535×65535)

Always check the recordSize field in the Evnt atom header to determine which version you are reading.
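
A minimal sketch of decoding one version-0 record follows. It reads only the 16 bytes listed in the table and leaves any further bytes of the record untouched, so the caller should still step through the atom using recordSize from the header. The function name is hypothetical, and the order of the four mdRect values (x, y, w, h) is assumed to match the version-1 layout.

```python
import struct

def parse_evnt_v0_record(rec):
    """Decode the documented fields of a version-0 Evnt record."""
    movie_time, prob_h, prob_v, _reserved, motion = struct.unpack('>IbbBB', rec[0:8])
    md_rect = struct.unpack('>4H', rec[8:16])  # assumed order: x, y, w, h
    return {
        'movieTime': movie_time,
        # A negative probability means the frame was not classified
        'probHuman': None if prob_h < 0 else prob_h,
        'probVehicle': None if prob_v < 0 else prob_v,
        'motionValue': motion,
        'mdRect': md_rect,
    }
```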

Reading the udta atom

To locate the udta atom, parse the moov atom's children by walking through atoms sequentially (read size, read type, skip to next). The udta atom will have type code 0x75647461. Then walk its children the same way to find each metadata atom by its type code.
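
The walk described above might look like the following sketch, which works on a bytes-like object already held in memory. The names `iter_atoms` and `find_path` are hypothetical. The handling of size values 1 (a 64-bit size follows the type code) and 0 (the atom runs to the end of the file) follows the general QuickTime/ISO atom header rules rather than anything specific to SecuritySpy.

```python
import struct

def iter_atoms(buf, start, end):
    """Yield (type, payload_start, payload_end) for each atom in buf[start:end]."""
    pos = start
    while pos + 8 <= end:
        size, atype = struct.unpack_from('>I4s', buf, pos)
        header = 8
        if size == 1:    # 64-bit extended size follows the type code
            if pos + 16 > end:
                break
            size = struct.unpack_from('>Q', buf, pos + 8)[0]
            header = 16
        elif size == 0:  # atom extends to the end of the enclosing range
            size = end - pos
        if size < header or pos + size > end:
            break        # malformed atom or terminator: stop rather than loop forever
        yield atype, pos + header, pos + size
        pos += size

def find_path(buf, path):
    """Follow a list of type codes, e.g. [b'moov', b'udta', b'Sver'].

    Returns (payload_start, payload_end) of the last atom, or None if any step is missing.
    """
    start, end = 0, len(buf)
    for wanted in path:
        for atype, p_start, p_end in iter_atoms(buf, start, end):
            if atype == wanted:
                start, end = p_start, p_end
                break
        else:
            return None
    return start, end
```

Movie files can be very large, so reading the whole file into memory is only suitable for illustration; a file-based walk that seeks past each atom avoids loading the media data.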

Complete udta structure summary

udta (variable size)
 ├── Sver  16 bytes (fixed)       — always present
 ├── Mtyp  12 bytes (fixed)       — always present
 ├── Mtma   8 + N×42 bytes       — optional
 └── Evnt  12 + N×recordSize     — optional

Example: parsing in Python

import struct

def read_atom(f):
    """Read an atom header; returns (type, size, data_offset)."""
    hdr = f.read(8)
    if len(hdr) < 8:
        return None, 0, 0
    size, atype = struct.unpack('>I4s', hdr)
    return atype, size, f.tell()

def find_atom(f, parent_offset, parent_size, target_type):
    """Find a child atom of the given type within a parent atom."""
    f.seek(parent_offset)
    end = parent_offset + parent_size
    while f.tell() < end:
        atype, size, data_off = read_atom(f)
        if atype is None or size < 8:
            break
        if atype == target_type:
            return data_off, size - 8
        f.seek(data_off + size - 8)
    return None, 0

def parse_sver(data):
    """Parse an Sver atom's payload (8 bytes after header)."""
    return data[:8].split(b'\x00')[0].decode('ascii')

def parse_mtyp(data):
    """Parse an Mtyp atom's payload (4 bytes after header)."""
    return 'continuous' if data[0] == 1 else 'motion-capture'

def parse_mtma(data):
    """Parse Mtma entries from the payload after the 8-byte atom header."""
    entries = []
    entry_size = 42
    count = len(data) // entry_size
    for i in range(count):
        e = data[i*entry_size:(i+1)*entry_size]
        year, month, day, hour, minute, second, subsec = struct.unpack('>7h', e[0:14])
        start, end = struct.unpack('>qq', e[16:32])
        entries.append({
            # subSec600 counts 1/600ths of a second; show it as milliseconds
            'time': f'{year:04d}-{month:02d}-{day:02d} {hour:02d}:{minute:02d}:{second:02d}.{subsec * 1000 // 600:03d}',
            'start': start, 'end': end
        })
    return entries

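def parse_evnt(data):
    """Parse version-1 Evnt entries from the atom payload (after the 8-byte atom header)."""
    record_size = data[0]
    version = data[1]
    # The field offsets below need at least the 32-byte version-1 record;
    # longer records are fine because trailing bytes are ignored.
    if record_size < 32:
        raise ValueError(f'unsupported Evnt format: version {version}, recordSize {record_size}')
    entries = []
    payload = data[4:]  # skip recordSize, version and the two reserved bytes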
    count = len(payload) // record_size
    for i in range(count):
        e = payload[i*record_size:(i+1)*record_size]
        movie_time = struct.unpack('>I', e[0:4])[0]
        trigger = bool(e[4])
        classified = bool(e[8])
        prob_h, prob_v, prob_a = e[9], e[10], e[11]
        md_x, md_y, md_w, md_h = struct.unpack('>4H', e[12:20])
        seq_id = struct.unpack('>I', e[20:24])[0]
        arrived, departed = struct.unpack('<HH', e[24:28])
        pr_x, pr_y, pr_w, pr_h = e[28], e[29], e[30], e[31]
        entries.append({
            'movieTime': movie_time,
            'trigger': trigger,
            'classified': classified,
            'probHuman': prob_h if classified else None,
            'probVehicle': prob_v if classified else None,
            'probAnimal': prob_a if classified else None,
            'mdRect': (md_x, md_y, md_w, md_h),
            'seqId': seq_id,
            'arrivedObjects': arrived,
            'departedObjects': departed,
            'presenceRect': (pr_x, pr_y, pr_w, pr_h),
        })
    return entries