Zhizheng Liu, Joe Lin, Wayne Wu, Bolei Zhou
University of California, Los Angeles

git clone --recursive git@github.com:genforce/JOSH.git
cd JOSH
conda create -n josh python=3.10 -y # must use python 3.10 for chumpy compatibility
conda activate josh
# assume CUDA 12.8, install pytorch and packages
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt
pip install --no-build-isolation git+https://github.com/mattloper/chumpy
pip install -e .
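After installation, you can quickly verify that PyTorch sees the GPU (a minimal sanity check; the exact version string depends on your setup):

import torch
print(torch.__version__)          # expect a cu128 build, e.g. "2.x.x+cu128"
print(torch.cuda.is_available())  # should print True on a CUDA 12.8 machine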
- Download the SMPL body models (`SMPL_MALE.pkl`, `SMPL_FEMALE.pkl`, `SMPL_NEUTRAL.pkl`) from the official webpage and place them under `data/smpl`.
- Download the VIMO checkpoint (`vimo_checkpoint.pth.tar`) for HMR and place it under `data/checkpoints`.
- Download the DECO checkpoint (`deco_best.pth`) for contact estimation and place it under `data/checkpoints`.
- Move the function `parse_chunks` from `third_party/tram/lib/pipeline/tools.py` to `third_party/tram/lib/models/hmr_vimo.py` so we don't install extra dependencies.
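To catch missing downloads early, you can run a quick check from the repository root (a minimal sketch based on the file layout described above):

# Sanity-check that the downloaded models and checkpoints are in place.
from pathlib import Path

required = [
    "data/smpl/SMPL_MALE.pkl",
    "data/smpl/SMPL_FEMALE.pkl",
    "data/smpl/SMPL_NEUTRAL.pkl",
    "data/checkpoints/vimo_checkpoint.pth.tar",
    "data/checkpoints/deco_best.pth",
]
missing = [p for p in required if not Path(p).is_file()]
print("all files in place" if not missing else f"missing: {missing}")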
Assuming the demo video is located at `$input_folder/XXXX.mp4`, run the following:
rerun --serve-grpc # in another terminal, for visualization
bash josh_demo.sh $input_folder
For example, run `bash josh_demo.sh assets/demo1`. All the intermediate outputs as well as the final result will be stored under `$input_folder`.
Compared to the original paper, we now support using the local point cloud from the state-of-the-art method Pi3 as initialization, which can lead to better reconstruction performance.
Note that since JOSH is an optimization-based method, you may want to tune the hyperparameters for optimal performance (see `josh/config.py`). With the default hyperparameters, you should get the following results after running the demos:
Sample outputs: Demo 1, Demo 2, and the long demo.
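If the default hyperparameters do not work well on your video, a quick way to see what is tunable (a sketch, assuming `pip install -e .` makes `josh/config.py` importable as `josh.config`; the actual field names live in that file) is:

# List what josh/config.py defines so you know which hyperparameters exist.
# This only inspects the module; it does not assume any particular field name.
import josh.config as config

for name, value in vars(config).items():
    if not name.startswith("_"):
        print(name, "=", value)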
For long videos (>=200 frames), we apply chunked processing and then aggregate the chunk results by simply concatenating them (see `josh/aggregate_results.py`). We leave global bundle adjustment to future work.
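For illustration, the chunk-then-concatenate scheme looks roughly like this (a minimal sketch of the idea, not the actual `josh/aggregate_results.py` implementation; `process_chunk` and the chunk size are placeholders):

# Sketch of chunked processing for long videos: split frames into fixed-size
# chunks, process each independently, and concatenate the per-chunk results.
def process_long_video(frames, process_chunk, chunk_size=200):
    results = []
    for start in range(0, len(frames), chunk_size):
        chunk = frames[start:start + chunk_size]
        results.extend(process_chunk(chunk))  # per-frame outputs for this chunk
    return results  # plain concatenation; no global bundle adjustment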
To be updated before the ICLR conference
We would like to thank the following projects for inspiring our work and open-sourcing their implementations:
Human Mesh Recovery: WHAM, TRAM, HMR2.0
Scene Reconstruction: DUSt3R, MASt3R, Pi3
Human Contact Estimation: BSTRO, DECO
Evaluation Datasets: EMDB, SLOPER4D, RICH
For any questions or discussions, please contact Zhizheng Liu.
If our work is helpful to your research, please cite the following:
@article{liu2026joint,
title={Joint Optimization for 4D Human-Scene Reconstruction in the Wild},
author={Liu, Zhizheng and Lin, Joe and Wu, Wayne and Zhou, Bolei},
journal={The Fourteenth International Conference on Learning Representations},
year={2026}
} 
