Completed

HomeSense: Fall Detection with Computer Vision

Detecting falls in real time using an OAK-D depth camera, YOLO pose estimation, and a custom state machine — my contribution to the HomeSense home-automation system.

Tags: Python, DepthAI, OpenCV, OAK-D, YOLO, Raspberry Pi, Socket.IO, Computer Vision, Embedded Systems, IoT, Fall Detection

The problem

Falls are the leading cause of injury-related death in adults over 65. Yet most home-monitoring solutions either require the person to actively press a button, or rely on wearable sensors that are forgotten, uncharged, or simply not worn. I wanted a system that works passively — no wristband, no button, just a camera watching a room.

This post covers the fall-detection node I built for HomeSense, a collaborative home-automation platform my team developed in SYSC 3010 (Systems Project). My teammates owned the sensor grid, the controller Pi, and the web dashboard. I owned detection.


System overview

HomeSense is a distributed system: a network of Raspberry Pi nodes communicates over sockets through a central controller. My node’s only job is to consume a video stream, decide whether someone has fallen, and emit a fall_detected event when it’s sure.

[Figure: HomeSense fall detection pipeline. On the OAK-D (DepthAI v3.3): camera (RGB + stereo depth) → YOLO pose (17-keypoint skeleton, on-device inference). On the host Pi (Python): EMA smoother (α = 0.4 per keypoint, noise reduction) → posture analyser (torso angle + label, stereo depth) → fall state machine (NORMAL → CANDIDATE on rapid_drop → CONFIRMED on down_persist + depth; cooldown timer prevents repeated alerts) → Socket.IO alert ("fall_detected" event + timestamp) → controller Pi → dashboard.]
Fall detection pipeline — YOLO pose runs on the OAK-D, everything else on the host Raspberry Pi.

The key design choice was putting YOLO pose inference directly on the camera (an OAK-D Lite). The OAK-D has an onboard Myriad X VPU that can run small neural nets at 30 fps without touching the Pi’s CPU. That freed the host to focus on the stateful logic — EMA smoothing, angle computation, and the state machine.


Skeleton → posture label

YOLO outputs 17 COCO keypoints per detected person. I only care about four of them:

Keypoint         Index   Body part
Left shoulder    5       top of torso
Right shoulder   6       top of torso
Left hip         11      bottom of torso
Right hip        12      bottom of torso

The torso vector runs from the midpoint of the hips to the midpoint of the shoulders. Its angle from vertical tells me orientation:

import math


def compute_torso_and_label(
    kps: list[list[float]],
    upright_thresh: float = 40.0,
    horiz_thresh: float = 55.0,
) -> tuple[float, str]:
    """
    Return (torso_angle_deg, label) where label is one of
    UPRIGHT / TRANSITION / HORIZONTAL.
    """
    ls, rs = kps[5], kps[6]   # left/right shoulder
    lh, rh = kps[11], kps[12] # left/right hip

    mid_shoulder = ((ls[0] + rs[0]) / 2, (ls[1] + rs[1]) / 2)
    mid_hip      = ((lh[0] + rh[0]) / 2, (lh[1] + rh[1]) / 2)

    dx = mid_shoulder[0] - mid_hip[0]
    dy = mid_shoulder[1] - mid_hip[1]          # image y grows downward, so dy < 0 when upright
    angle = abs(math.degrees(math.atan2(dx, -dy)))  # 0° = upright

    if angle < upright_thresh:
        label = "UPRIGHT"
    elif angle > horiz_thresh:
        label = "HORIZONTAL"
    else:
        label = "TRANSITION"

    return angle, label
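A quick way to sanity-check the angle convention is to feed the formula synthetic skeletons. The sketch below repeats the same midpoint and atan2 arithmetic in a compact helper, then checks a standing and a lying pose. `torso_angle`, `make_kps`, and the pixel coordinates are hypothetical names and values chosen for this example.

```python
import math


def torso_angle(kps: list[list[float]]) -> float:
    """Angle of the hip-to-shoulder vector from vertical, in degrees."""
    (lsx, lsy), (rsx, rsy) = kps[5][:2], kps[6][:2]
    (lhx, lhy), (rhx, rhy) = kps[11][:2], kps[12][:2]
    dx = (lsx + rsx) / 2 - (lhx + rhx) / 2
    dy = (lsy + rsy) / 2 - (lhy + rhy) / 2   # image y grows downward
    return abs(math.degrees(math.atan2(dx, -dy)))


def make_kps(shoulder: tuple[float, float], hip: tuple[float, float]) -> list[list[float]]:
    """Build a 17-keypoint list with only the four torso points filled in."""
    kps = [[0.0, 0.0] for _ in range(17)]
    kps[5] = kps[6] = list(shoulder)
    kps[11] = kps[12] = list(hip)
    return kps


standing = make_kps(shoulder=(320, 200), hip=(320, 400))  # shoulders above hips
lying = make_kps(shoulder=(200, 400), hip=(400, 400))     # shoulders level with hips

print(round(torso_angle(standing)))  # 0
print(round(torso_angle(lying)))     # 90
```

With the default thresholds of 40° and 55°, a torso leaning at 45° falls between the two and would be labelled TRANSITION.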

EMA smoothing

Raw keypoint coordinates from YOLO are jittery frame-to-frame. Before computing the torso angle I pass each keypoint through an Exponential Moving Average filter:

ALPHA = 0.4   # higher = more responsive, lower = smoother

def ema_smooth(
    new_kps: list[list[float]],
    prev_kps: list[list[float]] | None,
) -> list[list[float]]:
    if prev_kps is None:
        return new_kps
    return [
        [ALPHA * n[i] + (1 - ALPHA) * p[i] for i in range(len(n))]
        for n, p in zip(new_kps, prev_kps)
    ]

α = 0.4 was chosen empirically — low enough to suppress single-frame noise, high enough that a genuine fall (which happens in ~0.5 s) still registers within two or three frames.
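To get a feel for that trade-off, the sketch below counts how many frames an EMA needs to cover most of a sudden jump in a keypoint coordinate. The 75 % target and the helper name `frames_to_reach` are assumptions made for illustration; the update rule is the same one used in ema_smooth.

```python
def frames_to_reach(alpha: float, fraction: float = 0.75) -> int:
    """Frames an EMA needs to cover `fraction` of a sudden unit step."""
    if not 0 < alpha <= 1 or not 0 < fraction < 1:
        raise ValueError("alpha must be in (0, 1] and fraction in (0, 1)")
    value, frames = 0.0, 0
    while value < fraction:
        value = alpha * 1.0 + (1 - alpha) * value  # same update as ema_smooth
        frames += 1
    return frames


for alpha in (0.2, 0.4, 0.8):
    print(alpha, frames_to_reach(alpha))
# 0.2 7
# 0.4 3
# 0.8 1
```

Under this measure α = 0.4 tracks a step in three frames, about 0.1 s at 30 fps, while α = 0.2 needs seven and α = 0.8 passes a single noisy frame almost unfiltered.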


Detecting the fall event

A torso angle alone isn’t enough. Someone lying down to read a book has a horizontal torso too. I need to detect the transition: a rapid drop, which the code below measures as a sharp rise in torso angle over a few frames.

from collections import deque


def compute_rapid_drop_and_down_persist(
    history: deque[tuple[float, str]],   # (angle, label) ring buffer
    drop_window: int = 3,
    drop_threshold: float = 40.0,
    persist_frames: int = 25,
) -> tuple[bool, bool]:
    """
    rapid_drop   → torso angle rose by more than drop_threshold degrees
                   across the last drop_window frames
    down_persist → the last persist_frames frames were all HORIZONTAL
    """
    angles = [h[0] for h in history]
    labels = [h[1] for h in history]

    # rapid_drop: large angle increase over a short window
    if len(angles) >= drop_window:
        rapid_drop = (angles[-1] - angles[-drop_window]) > drop_threshold
    else:
        rapid_drop = False

    # down_persist: trailing frames are all HORIZONTAL
    trailing = labels[-persist_frames:]
    down_persist = len(trailing) == persist_frames and all(
        label == "HORIZONTAL" for label in trailing
    )

    return rapid_drop, down_persist
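The history buffer itself is not shown in the post, so here is one way it might be maintained: a deque with maxlen, appended to once per frame. The sketch uses a compact copy of the two checks (`fall_flags` is a hypothetical name) and a made-up sequence of frames: standing, an abrupt fall, then lying still.

```python
from collections import deque


def fall_flags(history, drop_window=3, drop_threshold=40.0, persist_frames=25):
    """Compact version of the rapid_drop / down_persist checks above."""
    angles = [a for a, _ in history]
    labels = [lbl for _, lbl in history]
    rapid_drop = (
        len(angles) >= drop_window
        and angles[-1] - angles[-drop_window] > drop_threshold
    )
    trailing = labels[-persist_frames:]
    down_persist = len(trailing) == persist_frames and all(
        lbl == "HORIZONTAL" for lbl in trailing
    )
    return rapid_drop, down_persist


# maxlen bounds memory: old frames fall off the left as new ones arrive.
history = deque(maxlen=25)

# Synthetic sequence (made-up values): standing, abrupt fall, then lying still.
frames = [(5.0, "UPRIGHT")] * 10 + [(50.0, "TRANSITION")] + [(85.0, "HORIZONTAL")] * 25

first_drop = first_persist = None
for i, frame in enumerate(frames):
    history.append(frame)
    rapid_drop, down_persist = fall_flags(history)
    if rapid_drop and first_drop is None:
        first_drop = i
    if down_persist and first_persist is None:
        first_persist = i

print(first_drop, first_persist)  # 10 35
```

In this run rapid_drop is true only on frames 10 and 11, while down_persist first becomes true 25 frames later, so whatever consumes the two flags has to remember that a drop happened.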

The state machine

Three pieces of evidence combine in a simple state machine:

  1. rapid_drop: torso angle rose sharply within a few frames → enter CANDIDATE
  2. down_persist: stayed horizontal for 25+ frames → enter CONFIRMED
  3. Depth: estimated floor distance < 0.7 m → required alongside down_persist before CONFIRMED

A 30-second cooldown after each alert prevents a single fall from spamming the dashboard.

The transitions map directly to code:

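if state == "NORMAL" and rapid_drop:
    state = "CANDIDATE"

elif state == "CANDIDATE":
    if down_persist and depth_ok:
        state = "CONFIRMED"
        emit_fall_alert()          # Socket.IO → controller Pi
        cooldown_until = time.time() + 30
    elif label == "UPRIGHT":       # person caught themselves
        # rapid_drop only holds for a frame or two, so CANDIDATE waits for
        # down_persist and resets only once the current posture label
        # (`label`, assumed in scope) is UPRIGHT again.
        state = "NORMAL"

elif state == "CONFIRMED":
    if time.time() > cooldown_until:
        state = "NORMAL"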
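One way to make these transitions testable without a camera is to wrap them in a small class that takes the clock as a parameter. This is a sketch rather than the project's actual code: the class name `FallStateMachine`, the `on_alert` callback, and the rule that CANDIDATE is abandoned when the posture label returns to UPRIGHT are all assumptions.

```python
import time
from typing import Callable


class FallStateMachine:
    """NORMAL -> CANDIDATE -> CONFIRMED -> (cooldown) -> NORMAL."""

    def __init__(
        self,
        on_alert: Callable[[], None],
        cooldown_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.state = "NORMAL"
        self._on_alert = on_alert
        self._cooldown_s = cooldown_s
        self._clock = clock
        self._cooldown_until = 0.0

    def step(self, label: str, rapid_drop: bool, down_persist: bool, depth_ok: bool) -> str:
        """Advance by one frame and return the new state."""
        if self.state == "NORMAL":
            if rapid_drop:
                self.state = "CANDIDATE"
        elif self.state == "CANDIDATE":
            if down_persist and depth_ok:
                self.state = "CONFIRMED"
                self._cooldown_until = self._clock() + self._cooldown_s
                self._on_alert()
            elif label == "UPRIGHT":  # assumed recovery rule
                self.state = "NORMAL"
        elif self.state == "CONFIRMED":
            if self._clock() > self._cooldown_until:
                self.state = "NORMAL"
        return self.state


# Drive it with a fake clock so the 30 s cooldown can be stepped instantly.
now = [0.0]
alerts: list[float] = []
fsm = FallStateMachine(on_alert=lambda: alerts.append(now[0]), clock=lambda: now[0])

print(fsm.step("TRANSITION", True, False, False))   # CANDIDATE
print(fsm.step("HORIZONTAL", False, True, True))    # CONFIRMED
now[0] = 31.0
print(fsm.step("HORIZONTAL", False, True, True))    # NORMAL
print(len(alerts))                                  # 1
```

Injecting the clock keeps the cooldown logic deterministic in tests, and time.monotonic is used as the default so the timer is unaffected by system clock adjustments.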

DepthAI pipeline setup

The OAK-D pipeline is configured once at startup. The camera feeds frames directly into the neural-net node, with no round-trip to the Pi host, and the detections come back over an XLink output queue.

from pathlib import Path

import depthai as dai


def build_pipeline():
    """Return (pipeline, detections_queue, depth_queue), DepthAI v3 style."""
    pipeline = dai.Pipeline()

    # RGB camera (CAM_A); in v3 the frame rate is requested per output
    cam = pipeline.create(dai.node.Camera).build(dai.CameraBoardSocket.CAM_A)
    cam_out = cam.requestOutput((640, 640), dai.ImgFrame.Type.BGR888p, fps=30)

    # YOLO pose model (compiled for Myriad X)
    nn = pipeline.create(dai.node.NeuralNetwork)
    nn.setBlobPath(Path("models/yolo_pose.blob"))
    cam_out.link(nn.input)

    # DepthAI v3 has no XLinkOut node: host queues come straight from outputs
    detections_q = nn.out.createOutputQueue()

    # Stereo depth
    left  = pipeline.create(dai.node.Camera).build(dai.CameraBoardSocket.CAM_B)
    right = pipeline.create(dai.node.Camera).build(dai.CameraBoardSocket.CAM_C)

    stereo = pipeline.create(dai.node.StereoDepth)
    stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
    left.requestOutput((1280, 720)).link(stereo.left)
    right.requestOutput((1280, 720)).link(stereo.right)

    depth_q = stereo.depth.createOutputQueue()

    return pipeline, detections_q, depth_q

Results

Testing in the lab with a crash mat, the detector achieved:

  • True positive rate: 94 % across 50 staged falls (various directions, speeds)
  • False positive rate: < 2 false alerts per hour of normal activity (sitting, bending, reaching)
  • Latency: median 1.1 s from fall to Socket.IO event (dominated by the 25-frame down_persist window at 30 fps ≈ 0.83 s + network)
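The latency figure can be decomposed with a line of arithmetic. The sketch below assumes the host really processes 30 frames per second; `persist_latency_s` is a hypothetical helper name.

```python
def persist_latency_s(persist_frames: int = 25, fps: float = 30.0) -> float:
    """Minimum time the down_persist window adds before an alert can fire."""
    return persist_frames / fps


window = persist_latency_s()
print(f"{window:.2f}")        # 0.83  seconds spent waiting for down_persist
print(f"{1.1 - window:.2f}")  # 0.27  seconds left for everything else
```

Under that assumption, roughly a quarter of a second of the 1.1 s median covers inference, smoothing, and the network hop; shrinking persist_frames is the main lever for lower latency, at the cost of more false alerts.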

The biggest failure mode was slow, controlled descents — someone carefully lowering themselves to the floor doesn’t trigger rapid_drop. That’s intentional: a slow, deliberate motion is not a fall. The tradeoff is that a person who faints slowly while holding a wall could be missed.


What I learned

DepthAI’s on-device inference is genuinely powerful. Running YOLO at 30 fps without touching the Pi CPU meant the host Pi stayed cool and responsive even when my roommate was running the dashboard on the same device.

State machines beat thresholds. My first prototype used a single angle threshold and generated constant false positives. The two-stage rapid_drop → down_persist design dropped the false positive rate by an order of magnitude.

EMA α matters more than you’d think. With α = 0.8, a stumble could look like a fall. With α = 0.2, an actual fall took too many frames to register. Spending time on this parameter paid off.