Mosaic

tile-based vision pipeline
global → local → global
Tokyo street at night, drone view
Stage 1 of 3
Reading the full scene
The model first sees the entire image and writes an orientation summary. This becomes the spatial context for every detailed tile inspection that follows.
Full-image pass
Tile grid
Focal tile
Paired neighbor
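A minimal sketch of how the full-image pass could feed the tile inspections that follow. It assumes a uniform grid and pairs each focal tile with its right-hand neighbor; the page confirms neither choice. `describe` is a hypothetical stand-in for the vision model call, and `Tile`, `tile_grid`, `right_neighbor`, `orientation_summary`, and `tile_prompt` are illustrative names, not part of any published API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Tile:
    row: int
    col: int
    box: Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def tile_grid(width: int, height: int, rows: int, cols: int) -> List[Tile]:
    """Split a width x height image into rows x cols tiles, row by row."""
    tiles = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            tiles.append(Tile(r, c, (left, top, right, bottom)))
    return tiles


def right_neighbor(focal: Tile, tiles: List[Tile]) -> Optional[Tile]:
    """Return the tile directly to the right of focal, or None at the edge.

    Assumption: the pairing rule here is a guess made for illustration.
    """
    for t in tiles:
        if t.row == focal.row and t.col == focal.col + 1:
            return t
    return None


def orientation_summary(describe: Callable[[str], str]) -> str:
    """Stage 1: ask the (hypothetical) model for a whole-scene summary."""
    return describe("Describe the whole image: layout, main regions, lighting.")


def tile_prompt(summary: str, focal: Tile, neighbor: Optional[Tile]) -> str:
    """Build a tile-inspection prompt that carries the stage 1 summary."""
    lines = [
        f"Scene context: {summary}",
        f"Focal tile: row {focal.row}, col {focal.col}, box {focal.box}",
    ]
    if neighbor is not None:
        lines.append(f"Paired neighbor: row {neighbor.row}, col {neighbor.col}")
    lines.append("Describe the focal tile in detail, using the scene context.")
    return "\n".join(lines)
```

Passing the summary into every tile prompt is what lets a detailed tile description stay anchored to the overall scene, since a single tile viewed alone carries no information about where it sits in the image.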