
Joint Optimization for 4D Human-Scene Reconstruction in the Wild

Zhizheng Liu, Joe Lin, Wayne Wu, Bolei Zhou
University of California, Los Angeles

Installation

Set up the repo:

git clone --recursive git@github.com:genforce/JOSH.git
cd JOSH
conda create -n josh python=3.10 -y # must use python 3.10 for chumpy compatibility
conda activate josh

Install the dependencies (tested with Ubuntu 22.04 + CUDA 12.8 + 24 GB VRAM):

# assume CUDA 12.8, install pytorch and packages
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt 
pip install --no-build-isolation git+https://github.com/mattloper/chumpy
pip install -e .
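
After installing, a quick sanity check (a minimal sketch; adjust the expected version if you installed a different CUDA wheel) confirms that PyTorch sees the GPU:

# Quick install sanity check: confirms the CUDA-enabled PyTorch wheel is active.
import torch

print(torch.__version__)   # e.g. 2.x+cu128
print(torch.version.cuda)  # should report 12.8 for the cu128 wheel
assert torch.cuda.is_available(), "CUDA not visible; check driver/wheel match"
print(torch.cuda.get_device_name(0))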

Download Pretrained Models

  • Download the SMPL body models (SMPL_MALE.pkl, SMPL_FEMALE.pkl, SMPL_NEUTRAL.pkl) from the official webpage and place them under the data/smpl folder (a loading check follows this list).
  • Download the VIMO checkpoint (vimo_checkpoint.pth.tar) for HMR and place it under data/checkpoints.
  • Download the DECO checkpoint (deco_best.pth) for contact estimation and place it under data/checkpoints.
  • Move the function parse_chunks from third_party/tram/lib/pipeline/tools.py to third_party/tram/lib/models/hmr_vimo.py so we don't need to install extra dependencies.
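
To verify the SMPL files are in place, here is a minimal sketch: the latin1 encoding handles the Python 2-era pickles, and the chumpy arrays inside them are why chumpy (and its Python 3.10 pin) must be installed first.

# Minimal sanity check that an SMPL body model unpickles correctly.
import pickle

with open("data/smpl/SMPL_NEUTRAL.pkl", "rb") as f:
    smpl = pickle.load(f, encoding="latin1")  # latin1 decodes the py2-era pickle
print(sorted(smpl.keys()))  # expect keys such as 'v_template' and 'shapedirs'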

JOSH Demo

Assuming the demo video is located at $input_folder/XXXX.mp4, run the following:

rerun --serve-grpc # in another terminal, for visualization
bash josh_demo.sh $input_folder
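
As a hypothetical sketch of how the visualization fits together (assuming a recent rerun-sdk; API names vary across versions), the pipeline streams data to the gRPC server started above, which you can reproduce in a few lines:

# Hypothetical sketch: stream a point cloud to the server started with
# `rerun --serve-grpc`. Assumes a recent rerun-sdk; API names vary by version.
import numpy as np
import rerun as rr

rr.init("josh_demo")
rr.connect_grpc()  # connect to the default local gRPC server
rr.log("scene/points", rr.Points3D(np.random.rand(1000, 3)))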

For example, run bash josh_demo.sh assets/demo1. All intermediate outputs, as well as the final result, are stored under $input_folder.

Compared to the original paper, we now support using the local point cloud from the state-of-the-art method Pi3X as the initialization, which can lead to better reconstruction performance.

Note that since JOSH is an optimization-based method, you may want to tune the hyperparameters for optimal performance (see josh/config.py; a sketch for inspecting them follows this paragraph). With the default hyperparameters, you should get the sample outputs shown below.
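
As a starting point for tuning, a minimal sketch that lists what josh/config.py exposes (this assumes module-level settings; adapt it to the module's actual structure):

# Hypothetical sketch: print the module-level names defined in josh/config.py,
# which are candidate hyperparameters to tune. Adapt to the actual structure.
import inspect
import josh.config as cfg

for name, value in vars(cfg).items():
    if not name.startswith("_") and not inspect.ismodule(value):
        print(f"{name} = {value!r}")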

Demo 1 Sample Output

Demo 2 Sample Output

Long Demo Sample Output

For long videos (>=200 frames), we apply chunk processing and then aggregate the chunk results by simply concatenating them (see josh/aggregate_results.py and the sketch below). We leave global bundle adjustment to future work.
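
A minimal sketch of what this naive aggregation could look like (the field names here are illustrative, not the actual keys; the real logic lives in josh/aggregate_results.py):

# Hypothetical sketch of chunk aggregation: per-chunk, per-frame results are
# concatenated along the time axis with no global alignment. Field names are
# illustrative; see josh/aggregate_results.py for the actual implementation.
import numpy as np

def aggregate_chunks(chunks):
    """Concatenate per-chunk sequences into one result (no bundle adjustment)."""
    keys = chunks[0].keys()
    return {k: np.concatenate([c[k] for c in chunks], axis=0) for k in keys}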

JOSH3R Demo

To be updated before the ICLR conference

Evaluation

To be updated before the ICLR conference

Acknowledgements

We would like to thank the following projects for inspiring our work and open-sourcing their implementations:

Human Mesh Recovery: WHAM, TRAM, HMR2.0

Scene Reconstruction: DUSt3R, MASt3R, Pi3

Human Contact Estimation: BSTRO, DECO

Evaluation Datasets: EMDB, SLOPER4D, RICH

Contact

For any questions or discussions, please contact Zhizheng Liu.

Reference

If our work is helpful to your research, please cite the following:

@inproceedings{liu2026joint,
    title={Joint Optimization for 4D Human-Scene Reconstruction in the Wild},
    author={Liu, Zhizheng and Lin, Joe and Wu, Wayne and Zhou, Bolei},
    booktitle={The Fourteenth International Conference on Learning Representations},
    year={2026}
}
