Context
A topology-repair pipeline for TreeQSM cylinder outputs: reconstructs a connected directed skeleton, extracts trunk + branch orders hierarchically, then rewrites TreeQSM-style topology columns so branches and extensions align with real orchard tree structure.
What I built
- DAG-based skeletonization from cylinder centerlines with cycle detection/removal safeguards.
- Trunk detection as the common prefix across primary branches (labels trunk as a distinct order).
- Radius-aware primary branch selection: defaults to longest physical path, only switches if a competing branch is ≥2× thicker (configurable).
- TreeQSM-compatible topology rewrite: parent (1-based), branch IDs, extension ranking (by child radius), and position-in-branch sequencing.
- Batch runner that processes directories of TreeQSM CSVs and outputs per-tree artifacts (cleaned CSVs + interactive HTML QA views).
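The selection and trunk rules in the list above can be illustrated with a minimal sketch. It is not the project's code: the `Candidate` structure, the function names, and the choice to compare radii at the base of each path are assumptions made for illustration. Only the "longest path unless a rival is ≥2× thicker" rule and the common-prefix idea come from the description.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    """A candidate path from the root to a tip (hypothetical structure)."""
    nodes: List[int]   # node ids along the path, root first
    length: float      # summed physical length of the path
    radius: float      # assumed: radius of the path's first own cylinder


def pick_primary(candidates: Sequence[Candidate],
                 thickness_ratio: float = 2.0) -> Candidate:
    """Default to the longest path; switch only to a sufficiently thicker rival."""
    longest = max(candidates, key=lambda c: c.length)
    rivals = [c for c in candidates
              if c is not longest
              and c.radius >= thickness_ratio * longest.radius]
    # Assumption: if several rivals qualify, take the thickest one.
    return max(rivals, key=lambda c: c.radius) if rivals else longest


def common_prefix(paths: Sequence[Sequence[int]]) -> List[int]:
    """Trunk = nodes shared, in order, by every primary path from the root."""
    prefix: List[int] = []
    for column in zip(*paths):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix


a = Candidate(nodes=[0, 1, 2, 3], length=4.0, radius=0.05)
b = Candidate(nodes=[0, 1, 4], length=2.5, radius=0.08)   # thicker, but < 2x
c = Candidate(nodes=[0, 1, 5], length=2.0, radius=0.11)   # >= 2x thicker

primary_ab = pick_primary([a, b])   # stays with the longest path
primary_ac = pick_primary([a, c])   # switches to the much thicker rival
trunk = common_prefix([a.nodes, b.nodes, c.nodes])
```

Keeping the threshold as a parameter (`thickness_ratio`) mirrors the "configurable" note in the list; the conservative default means a slightly thicker sibling never displaces the longest path.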
Results
- Recovered consistent parent/child topology and branch grouping on open-vase trees (≈90% agreement with expected structure in internal validation).
- Made TreeQSM cylinder outputs usable across multiple research workflows by exporting a cleaned CSV with corrected topology fields + visualizations for QA.
Problem
TreeQSM’s outputs assume forest-like branching and often mislabel parent/branch/extension relationships on open-vase architectures. That breaks downstream analyses: branch-order features drift, branch membership gets scrambled, and manual cleanup becomes the bottleneck.
Approach
I rebuild the tree as a directed acyclic graph (DAG) directly from cylinder centerlines. If TreeQSM parent fields exist, I use them to connect cylinder endpoints; otherwise I fall back to proximity-based stitching with direction constraints. Then I extract primary branches using a conservative rule (follow the longest path unless a sibling is significantly thicker by radius), detect the shared trunk as the common prefix, and iteratively discover higher-order branches from branching points. Finally, I export a corrected cylinder table that updates parent/branch/extension/position_in_branch deterministically (including stable child ranking by radius).
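As a rough illustration of two steps in this paragraph, the sketch below builds a child map from 1-based parent ids with a cycle guard, then picks each cylinder's extension as its thickest child. The function names are hypothetical, and three conventions are assumptions rather than confirmed details of the pipeline: a parent id of 0 means "no parent", a detected cycle is cut at the edge that closes it, and radius ties are broken by the lower cylinder id.

```python
from typing import Dict, List


def build_children(parent: List[int]) -> Dict[int, List[int]]:
    """parent[i] is the 1-based parent id of cylinder i+1; 0 is assumed to mean root."""
    parent = list(parent)              # work on a copy; the input stays untouched
    n = len(parent)
    state = [0] * (n + 1)              # 0 = unvisited, 1 = on current walk, 2 = done
    for start in range(1, n + 1):
        chain: List[int] = []
        node = start
        while node != 0 and state[node] == 0:
            state[node] = 1
            chain.append(node)
            node = parent[node - 1]
        if node != 0 and state[node] == 1:
            # The walk re-entered itself: cut the edge that closes the cycle.
            parent[chain[-1] - 1] = 0
        for visited in chain:
            state[visited] = 2
    children: Dict[int, List[int]] = {i: [] for i in range(1, n + 1)}
    for child, p in enumerate(parent, start=1):
        if p:
            children[p].append(child)
    return children


def rank_extension(children: Dict[int, List[int]],
                   radius: List[float]) -> Dict[int, int]:
    """Extension of each cylinder = its thickest child (0 if it has none)."""
    extension: Dict[int, int] = {}
    for p, kids in children.items():
        # Sort by descending radius, then ascending id, so ties are deterministic.
        ranked = sorted(kids, key=lambda k: (-radius[k - 1], k))
        extension[p] = ranked[0] if ranked else 0
    return extension


# Cylinders 2 and 3 point at each other (a cycle); 4 and 5 hang off cylinder 1.
parent = [0, 3, 2, 1, 1]
radius = [0.10, 0.03, 0.04, 0.05, 0.07]
children = build_children(parent)
extension = rank_extension(children, radius)
```

In this toy input the 2↔3 loop is broken by detaching cylinder 3, and cylinder 1's extension becomes cylinder 5 because it is the thicker of its two children. The proximity-based stitching fallback is not shown here.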
What I learned
The pipeline produces a “cleaned_fixed.csv” per tree with corrected topology fields, plus interactive Plotly 3D visualizations showing branch orders and trunk segmentation. It also includes batch processing, optional memory tracking, and CSV exports of node/edge branch assignments to support debugging and reproducibility.
Links