Introducing Pozify for the Build Small Hackathon

The Problem

Training at home is convenient, but form feedback is hard to get alone.

For beginners and intermediate gym users, the questions are practical: which exercise did I do, how many clean reps were there, which rep needs attention, and is the difference I see a real issue or just a valid variation?

Most people do not need a full-time personal trainer for every session. They need a fast second set of eyes that is affordable, private, and specific enough to act on.

Meet Pozify

Pozify is a small-model workout form reviewer built for the Hugging Face Build Small Hackathon.

Upload a short workout video and Pozify turns it into a structured coaching report: exercise detection, rep counting, per-rep issue markers, annotated video output, confidence notes, and a grounded coach summary.

Pozify product interface for workout form review

🎥

Video as evidence

Pozify starts with a real workout clip, then extracts pose and movement structure before writing any coaching text.

🏋️

Rep-level feedback

The report separates exercise detection, rep counting, valid variations, and issue markers instead of returning generic advice.

🧠

Small-model pipeline

The app combines MediaPipe pose extraction, a tiny trained router, exercise rules, knowledge cards, and a small summary model.

🛡️

Grounded summaries

A verifier checks that the coach summary stays tied to the structured evidence and avoids unsupported safety claims.

Why Build Small

The hackathon constraint was the point: build something useful without hiding behind one giant opaque model.

Pozify fits that philosophy by keeping the system modular and inspectable. Pose extraction handles body landmarks. A custom PyTorch BiLSTM routes clips into squat, push_up, shoulder_press, or unknown. Exercise-specific logic counts reps and marks issues. A coach-summary model turns the structured artifacts into readable feedback.

That order matters. Structured evidence comes first; language comes second.

The Pipeline

The core flow is deliberately plain:

1. Check the video quality.
2. Extract pose landmarks with MediaPipe.
3. Clean the pose signal.
4. Classify the exercise with a trained router.
5. Count reps and compute per-rep metrics.
6. Detect valid variations and issue markers.
7. Render an annotated video.
8. Generate a grounded coach summary.
9. Verify the final report before showing it to the user.
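The steps above can be read as a chain of stages that each add one artifact to a shared report. The sketch below is only an illustration of that ordering, not Pozify's actual code: the `Report` class, the stage names, and every stub value are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Report:
    """Hypothetical container that accumulates artifacts stage by stage."""
    video_path: str
    artifacts: Dict[str, object] = field(default_factory=dict)
    completed: List[str] = field(default_factory=list)


Stage = Callable[[Report], None]


def make_stub(name: str, value: object) -> Stage:
    """Build a placeholder stage that only records a canned artifact."""
    def stage(report: Report) -> None:
        report.artifacts[name] = value
    stage.__name__ = name
    return stage


# Stage order mirrors the list above; the stub values are made up.
PIPELINE: List[Stage] = [
    make_stub("quality_check", {"ok": True}),
    make_stub("pose_landmarks", []),
    make_stub("clean_pose", []),
    make_stub("exercise", "squat"),
    make_stub("reps", []),
    make_stub("variations_and_issues", []),
    make_stub("annotated_video", "annotated.mp4"),
    make_stub("coach_summary", ""),
    make_stub("verification", {"passed": True}),
]


def run_pipeline(video_path: str) -> Report:
    """Run every stage in order and remember which ones finished."""
    report = Report(video_path=video_path)
    for stage in PIPELINE:
        stage(report)
        report.completed.append(stage.__name__)
    return report
```

Because the language stage sits near the end of the list, it can only ever see artifacts that earlier stages have already written.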

The trained router is intentionally tiny: 182,796 trainable parameters over 30-frame pose windows. It is small enough to fit the Build Small constraint, but still specific enough to make the product feel different from a generic video chatbot.
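To make the 30-frame window concrete, here is one way a longer pose sequence could be cut into fixed-length windows before routing. Only the window length comes from the description above. The 15-frame stride and the majority vote across windows are assumptions made for illustration, and `sliding_windows` and `vote` are hypothetical helper names.

```python
from collections import Counter
from typing import List, Sequence

WINDOW = 30  # frames per window, as stated in the post
STRIDE = 15  # assumed overlap; the post does not state a stride


def sliding_windows(frames: Sequence, window: int = WINDOW,
                    stride: int = STRIDE) -> List[Sequence]:
    """Return every full window of `window` frames, stepping by `stride`."""
    last_start = len(frames) - window
    return [frames[i:i + window] for i in range(0, last_start + 1, stride)]


def vote(labels: List[str]) -> str:
    """Assumed aggregation: most common window label, 'unknown' if none."""
    if not labels:
        return "unknown"
    return Counter(labels).most_common(1)[0][0]
```

A clip shorter than one window produces no windows at all, so under this sketch it would fall through to `unknown` rather than being forced into a class.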

Coach Intelligence

The coaching layer is where Pozify turns movement evidence into something a user can act on.

Instead of asking a language model to watch a video directly, Pozify gives the model structured artifacts: the detected exercise, rep metrics, variation labels, issue markers, quality notes, and retrieved exercise knowledge cards. The model’s job is narrower and more useful: explain the evidence clearly, keep the advice bounded, and produce a coach summary that can be verified.
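As a rough sketch of what "structured artifacts" might look like in practice, the function below packs the evidence into a single JSON string that could be placed in a prompt. The function name and all field names are hypothetical; the actual Pozify schema is not described here.

```python
import json
from typing import Dict, List


def build_coach_payload(exercise: str, reps: List[Dict], issues: List[Dict],
                        variations: List[str], quality_notes: List[str],
                        knowledge_cards: List[str]) -> str:
    """Serialise the evidence into JSON (field names are hypothetical)."""
    payload = {
        "exercise": exercise,
        "rep_count": len(reps),
        "reps": reps,
        "variations": variations,
        "issues": issues,
        "quality_notes": quality_notes,
        "knowledge_cards": knowledge_cards,
    }
    # sort_keys keeps the serialised text stable for identical evidence.
    return json.dumps(payload, sort_keys=True, indent=2)
```

Keeping the payload deterministic makes it easier to compare the model's summary against exactly what it was given.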

Pozify coach intelligence report showing grounded summary, fix-first guidance, next-session plan, and confidence notes

This is the bet behind Pozify: small models can be excellent when the task is specific, the inputs are structured, and the training data matches the product contract.

Large general models are powerful, but they are often doing too much at once. For Pozify, we want specialized models that know exactly what the app needs: route an exercise, count reps, respect valid variations, write grounded coaching JSON, and refuse unsupported claims. With focused fine-tuning, task-specific evaluation, and deterministic verification, a smaller model can match or outperform a larger general model on the narrow job it was trained to do.

The Models We Used

Pozify combines several small, purpose-built components:

Pose extraction: MediaPipe Pose Landmarker Lite extracts body landmarks from short workout clips.
Exercise routing: a custom PyTorch BiLSTM routes 30-frame pose windows into squat, push_up, shoulder_press, or unknown.
Router baseline: scikit-learn HistGradientBoostingClassifier stays available as a reference and fallback artifact.
Rep and issue analysis: exercise-specific state machines and transparent rules count reps, detect valid variations, and mark issues.
Coach summary: build-small-hackathon/pozify-coach-summary1, a Pozify-specific summary model fine-tuned from nvidia/NVIDIA-Nemotron-3-Nano-4B-BF16.
Local/off-grid path: the merged Nemotron-based checkpoint can run locally from POZIFY_COACH_SUMMARY_LOCAL_MODEL_DIR for coach-summary generation without a hosted model call.
Verifier: deterministic checks keep the final report grounded, schema-valid, and away from medical or unsupported safety claims.
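To illustrate the kind of state machine the rep and issue analysis item refers to, here is a minimal two-state counter for a squat driven by a knee-angle signal. It is a sketch under assumptions: the thresholds, the choice of knee angle as the signal, and the name `count_reps` are all hypothetical and not taken from Pozify.

```python
from typing import Iterable

DOWN_BELOW = 100.0  # assumed: knee angle (degrees) that counts as the bottom
UP_ABOVE = 160.0    # assumed: knee angle that counts as standing again


def count_reps(knee_angles: Iterable[float]) -> int:
    """Count top -> bottom -> top cycles using two thresholds (hysteresis)."""
    state = "up"
    reps = 0
    for angle in knee_angles:
        if state == "up" and angle < DOWN_BELOW:
            state = "down"
        elif state == "down" and angle > UP_ABOVE:
            # A rep is only counted once the lifter returns to the top.
            state = "up"
            reps += 1
    return reps
```

Using two separate thresholds means small jitter around a single cut-off does not produce phantom reps, and a shallow dip that never reaches the bottom threshold is not counted.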

The important part is not that every component is tiny in isolation. It is that each component has a clear job, a clear boundary, and a way to fail safely.

Built For Trust

Pozify is careful about what it claims. It is not a medical device, and it does not replace a qualified trainer, clinician, or physical therapist.

The product is designed for everyday form review: timestamped movement evidence, supported exercise labels, issue markers, and coaching text that has to stay grounded in the artifacts. If the model output drifts too far from that evidence, Pozify can fall back to a safer deterministic summary.
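A verify-then-fall-back step of this kind could look roughly like the sketch below. Everything in it is an assumption for illustration: the required keys, the banned-phrase list, the rule that any "rep N" mentioned in the text must exist in the evidence, and all function names are hypothetical rather than Pozify's real checks.

```python
import re
from typing import Dict, List

REQUIRED_KEYS = {"summary", "fix_first"}             # assumed schema
BANNED_PHRASES = ("injury", "diagnos", "guarantee")  # assumed claim filter


def verify_summary(summary: Dict, evidence: Dict) -> List[str]:
    """Return a list of problems; an empty list means the summary passed."""
    problems = []
    missing = REQUIRED_KEYS - set(summary)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    text = " ".join(str(v) for v in summary.values()).lower()
    for phrase in BANNED_PHRASES:
        if phrase in text:
            problems.append(f"unsupported claim: {phrase}")
    # Every "rep N" mentioned in the text must exist in the evidence.
    for rep in re.findall(r"\brep (\d+)\b", text):
        if not 1 <= int(rep) <= evidence["rep_count"]:
            problems.append(f"rep {rep} not in evidence")
    return problems


def deterministic_summary(evidence: Dict) -> Dict:
    """Template-based fallback built only from the structured evidence."""
    issues = evidence.get("issues", [])
    return {
        "summary": f"Detected {evidence['rep_count']} {evidence['exercise']} reps.",
        "fix_first": issues[0] if issues else "No issues flagged.",
    }


def final_summary(model_output: Dict, evidence: Dict) -> Dict:
    """Use the model's summary only if every deterministic check passes."""
    if verify_summary(model_output, evidence):
        return deterministic_summary(evidence)
    return model_output
```

The checks are plain string and schema rules, so the same input always gets the same verdict, and a failed check degrades to a template instead of showing unverified text.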

That makes the app feel less like a black box and more like a transparent coaching tool users can inspect.

Explore Pozify

Pozify is open source and available as a Hugging Face Space. Try the app, inspect the pipeline, and follow the small-model build.


What Comes Next

The current version supports squats, push-ups, and shoulder presses, and rejects clips outside those three as unknown. The next step is broader evaluation: more independent videos, more exercise types, and stronger checks around lighting, camera angle, and movement quality.

The direction is clear: make at-home training feedback more accessible without turning the coach into a vague chatbot.

Your coach, without the crowd.