Project writeup

Fixing TreeQSM Outputs with Skeletonization

Recently I've worked a lot with TreeQSM, a tool for extracting structural traits from 3D point clouds of trees. It's great at extracting QSM traits from forest-style trees, but it breaks down on open-vase architectures: topology, branch IDs, branch order, and parent/child links don’t reflect reality. I built a post-processing tool that reconstructs a clean directed skeleton from TreeQSM cylinders and rewrites the cylinder topology (parent/branch/extension/position) using classic graph algorithms, so downstream trait extraction stays meaningful.

Role

Built the full topology-fixing layer: cylinder-to-DAG skeleton reconstruction, hierarchical branch-order extraction, and a TreeQSM-compatible export that rewrites parent/branch/extension/position_in_branch at scale (plus visual QA).

Stack

Python, TreeQSM, NetworkX, NumPy, Pandas, Plotly

Context

A topology-repair pipeline for TreeQSM cylinder outputs: reconstructs a connected directed skeleton, extracts trunk + branch orders hierarchically, then rewrites TreeQSM-style topology columns so branches and extensions align with real orchard tree structure.

What I built

  • DAG-based skeletonization from cylinder centerlines with cycle detection/removal safeguards.
  • Trunk detection as the common prefix across primary branches (labels trunk as a distinct order).
  • Radius-aware primary branch selection: defaults to longest physical path, only switches if a competing branch is ≥2× thicker (configurable).
  • TreeQSM-compatible topology rewrite: parent (1-based), branch IDs, extension ranking (by child radius), and position-in-branch sequencing.
  • Batch runner that processes directories of TreeQSM CSVs and outputs per-tree artifacts (cleaned CSVs + interactive HTML QA views).
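
To make the radius-aware selection rule concrete, here is a minimal sketch. It is illustrative only, not the tool's actual code: the `Candidate` class, the `pick_primary` function, and the `thickness_ratio` parameter are hypothetical names, and choosing the thickest rival when several pass the threshold is my own assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical summary of one competing branch path."""
    name: str
    path_length: float  # physical length along the skeleton
    radius: float       # radius near the base of the path


def pick_primary(candidates, thickness_ratio=2.0):
    """Prefer the longest path; switch only to a clearly thicker rival."""
    if not candidates:
        raise ValueError("need at least one candidate")
    longest = max(candidates, key=lambda c: c.path_length)
    # A rival qualifies only if it is at least `thickness_ratio` times thicker.
    rivals = [
        c for c in candidates
        if c is not longest and c.radius >= thickness_ratio * longest.radius
    ]
    if rivals:
        # Assumption: with several qualifying rivals, take the thickest one.
        return max(rivals, key=lambda c: c.radius)
    return longest


long_thin = Candidate("long_thin", path_length=5.0, radius=0.02)
short_mid = Candidate("short_mid", path_length=3.0, radius=0.03)
short_thick = Candidate("short_thick", path_length=2.0, radius=0.05)

default_choice = pick_primary([long_thin, short_mid])
switched_choice = pick_primary([long_thin, short_mid, short_thick])
```

With the default ratio of 2, `short_mid` is not thick enough to displace the longest path, while `short_thick` is. Keeping the threshold high is what makes the rule conservative: length wins unless the radius evidence is strong.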

Results

  • Recovered consistent parent/child topology and branch grouping on open-vase trees (≈90% agreement with expected structure in internal validation).
  • Made TreeQSM cylinder outputs usable across multiple research workflows by exporting a cleaned CSV with corrected topology fields + visualizations for QA.

Problem

TreeQSM’s output assumes forest-like branching and often mislabels parent/branch/extension relationships on open-vase architectures. That breaks downstream analyses: branch-order features drift, branch membership gets scrambled, and manual cleanup becomes the bottleneck.

Approach

I rebuild the tree as a directed acyclic graph (DAG) directly from cylinder centerlines. If TreeQSM parent fields exist, I use them to connect cylinder endpoints; otherwise I fall back to proximity-based stitching with direction constraints. Then I extract primary branches using a conservative rule (follow the longest path unless a sibling is significantly thicker by radius), detect the shared trunk as the common prefix, and iteratively discover higher-order branches from branching points. Finally, I export a corrected cylinder table that updates parent/branch/extension/position_in_branch deterministically (including stable child ranking by radius).
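
The parent-field path and the trunk detection can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the tool's implementation: it treats each cylinder as a node, takes 0-based parent indices with -1 for the root, and uses root-to-tip paths as a stand-in for the primary branches. The function names are hypothetical; the NetworkX calls (`DiGraph`, `has_path`, `shortest_path`) are standard.

```python
import networkx as nx


def build_skeleton(parents):
    """Build a directed graph parent -> child, skipping cycle-forming edges.

    parents[i] is the 0-based parent index of cylinder i, or -1 for the root.
    """
    g = nx.DiGraph()
    g.add_nodes_from(range(len(parents)))
    for child, parent in enumerate(parents):
        if parent < 0 or parent == child:
            continue  # root, or a self-reference
        # If the child already reaches the parent, this edge would close a cycle.
        if nx.has_path(g, child, parent):
            continue
        g.add_edge(parent, child)
    return g


def root_to_tip_paths(g, root):
    """Return one path from the root to every reachable leaf."""
    tips = [n for n in g.nodes if g.out_degree(n) == 0 and n != root]
    return [nx.shortest_path(g, root, t) for t in tips if nx.has_path(g, root, t)]


def common_prefix(paths):
    """Return the nodes shared at the start of every path (the trunk)."""
    prefix = []
    for nodes in zip(*paths):
        if len(set(nodes)) != 1:
            break
        prefix.append(nodes[0])
    return prefix


# Toy tree: 0 -> 1 -> 2, then 2 forks into 3 -> 6 and 4 -> 5.
parents = [-1, 0, 1, 2, 2, 4, 3]
skeleton = build_skeleton(parents)
paths = root_to_tip_paths(skeleton, root=0)
trunk = common_prefix(paths)
```

In the toy tree the two tip paths share cylinders 0, 1, and 2 before diverging, so those form the trunk. The proximity-based fallback is not shown here.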

What I learned

The tool produces a “cleaned_fixed.csv” per tree with corrected topology fields, plus interactive Plotly 3D visualizations showing branch orders and trunk segmentation. It also includes batch processing, optional memory tracking, and CSV exports of node/edge branch assignments to support debugging and reproducibility.
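
As a rough illustration of how the corrected topology fields might be written back, here is a small pandas sketch. It is not the tool's actual export code: the `rewrite_topology` function is hypothetical, the column names follow the field names used above, and it assumes that rows are already ordered base-to-tip within each branch and that a parent value of 0 marks the root in the 1-based scheme. The extension field is left out.

```python
import pandas as pd


def rewrite_topology(cylinders, parent0, branch_id):
    """Return a copy of the cylinder table with rewritten topology columns.

    parent0   : 0-based parent index per row, -1 for the root
    branch_id : branch label per row
    """
    out = cylinders.copy()
    # Shift to 1-based parents; the root's -1 becomes 0 (assumed "no parent").
    out["parent"] = [p + 1 for p in parent0]
    out["branch"] = list(branch_id)
    # Assumes rows within a branch are already ordered from base to tip.
    out["position_in_branch"] = out.groupby("branch").cumcount() + 1
    return out


cylinders = pd.DataFrame({"radius": [0.10, 0.09, 0.05, 0.04]})
fixed = rewrite_topology(
    cylinders,
    parent0=[-1, 0, 1, 1],
    branch_id=[1, 1, 1, 2],
)
```

The result could then be saved with `fixed.to_csv(...)`. Because the function works on a copy and uses no randomness, the same inputs always give the same output table.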

Links