
Original author: 图龙网络科技 | Published: 2023-09-23 | 236.64K reads

Tencent Hunyuan3D: the first open-source 3D model to support both text-to-3D and image-to-3D generation

Posted by 太极混元 4 months ago | Category: Language Models

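The model uses a two-stage generation approach that can produce a 3D asset in as little as 10 seconds while preserving quality and controllability. In the first stage, we employ a multi-view diffusion model; the lite version generates multi-view images in about 4 seconds. These multi-view images capture rich texture and geometric priors of the 3D asset from different viewpoints, relaxing the task from single-view reconstruction to multi-view reconstruction. In the second stage, we introduce a feed-forward reconstruction model that takes the multi-view images generated in the first stage and reconstructs the 3D asset quickly and accurately in about 3 seconds. The reconstruction model learns to handle the noise and inconsistency introduced by multi-view diffusion, and it uses the information available in the condition image to recover the 3D structure efficiently. As a result, the model can generate a 3D asset from an arbitrary single-view input.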


First, clone the repository:

git clone https://github.com/tencent/Hunyuan3D-1
cd Hunyuan3D-1

Installation Guide for Linux

We provide an environment setup script, env_install.sh, for setting up the environment.

# step 1, create conda env (Python 3.9, 3.10, 3.11, or 3.12; 3.10 is used here as an example)
conda create -n hunyuan3d-1 python=3.10
conda activate hunyuan3d-1

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
bash env_install.sh

# or
pip install -r requirements.txt --index-url https://download.pytorch.org/whl/cu121
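
After installation, a quick sanity check of the environment can save a failed run later. The following is a minimal sketch, not part of the repository: the helper names `python_supported` and `cuda_available` are hypothetical, and the supported-version set is taken from the conda command above.

```python
import sys

# Python versions named in the conda command above.
SUPPORTED = {(3, 9), (3, 10), (3, 11), (3, 12)}


def python_supported(version=None):
    """Return True if the (major, minor) version is one the guide lists."""
    major, minor = (version or sys.version_info)[:2]
    return (major, minor) in SUPPORTED


def cuda_available():
    """Return True/False if torch can be imported, or None if it is not installed."""
    try:
        import torch
    except ImportError:
        return None
    return torch.cuda.is_available()


if __name__ == "__main__":
    print("Python version supported:", python_supported())
    print("CUDA available:", cuda_available())
```

Running the file inside the activated `hunyuan3d-1` environment prints both checks; a `None` for CUDA means torch has not been installed yet.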

For DUSt3R, we provide a setup guide:

cd third_party
git clone --recursive https://github.com/naver/dust3r.git

cd ../third_party/weights
wget https://download.europe.naverlabs.com/ComputerVision/DUSt3R/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth


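Download Pretrained Models

The models are available on Hugging Face at tencent/Hunyuan3D-1:

Hunyuan3D-1/lite, the lite model for multi-view generation.
Hunyuan3D-1/std, the standard model for multi-view generation.
Hunyuan3D-1/svrm, the sparse-view reconstruction model.

To download the models, first install huggingface-cli. (Detailed instructions are in the Hugging Face Hub documentation.)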

python3 -m pip install "huggingface_hub[cli]"

Then download the models with the following commands:

mkdir weights
huggingface-cli download tencent/Hunyuan3D-1 --local-dir ./weights

mkdir weights/hunyuanDiT
huggingface-cli download Tencent-Hunyuan/HunyuanDiT-v1.1-Diffusers-Distilled --local-dir ./weights/hunyuanDiT
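
The same two downloads can also be scripted from Python. This is a sketch under assumptions rather than the repository's own tooling: it uses `snapshot_download` from the `huggingface_hub` package installed above, and the `download_weights` helper and its `dry_run` switch are hypothetical names introduced for illustration.

```python
from pathlib import Path

# (repo_id, local_dir) pairs copied from the huggingface-cli commands above.
WEIGHTS = [
    ("tencent/Hunyuan3D-1", "./weights"),
    ("Tencent-Hunyuan/HunyuanDiT-v1.1-Diffusers-Distilled", "./weights/hunyuanDiT"),
]


def download_weights(dry_run=True):
    """Download every repository in WEIGHTS; with dry_run=True only report the plan."""
    if not dry_run:
        # Imported lazily so the dry run works without the package installed.
        from huggingface_hub import snapshot_download
    plan = []
    for repo_id, local_dir in WEIGHTS:
        if not dry_run:
            Path(local_dir).mkdir(parents=True, exist_ok=True)
            snapshot_download(repo_id=repo_id, local_dir=local_dir)
        plan.append((repo_id, local_dir))
    return plan


if __name__ == "__main__":
    for repo_id, local_dir in download_weights(dry_run=True):
        print(f"{repo_id} -> {local_dir}")
```

Call `download_weights(dry_run=False)` to fetch the files; the dry run only lists what would be downloaded and where.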

Inference

For text-to-3D generation, we support both Chinese and English prompts. You can run inference with the following command.

python3 main.py \
    --text_prompt "a lovely rabbit" \
    --save_folder ./outputs/test/ \
    --max_faces_num 90000 \
    --do_texture_mapping \
    --do_render

For image-to-3D generation, you can run inference with the following command.

python3 main.py \
    --image_prompt "/path/to/your/image" \
    --save_folder ./outputs/test/ \
    --max_faces_num 90000 \
    --do_texture_mapping \
    --do_render

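We list some of the more useful configuration options for convenience:

Argument	Default	Description
`--text_prompt`	None	The text prompt for 3D generation
`--image_prompt`	None	The image prompt for 3D generation
`--t2i_seed`	0	The random seed for image generation
`--t2i_steps`	25	The number of sampling steps for text-to-image
`--gen_seed`	0	The random seed for 3D generation
`--gen_steps`	50	The number of sampling steps for 3D generation
`--max_faces_num`	90000	The maximum number of faces in the 3D mesh
`--save_memory`	False	Modules are moved to the CPU automatically
`--do_texture_mapping`	False	Convert vertex shading to texture shading
`--do_render`	False	Render a GIF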
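
When launching many runs, it can help to assemble the command line programmatically. The sketch below is illustrative only: `build_main_command` is a hypothetical helper, it assumes exactly one of the two prompts is given per run, and it assumes the numeric options take a value while the boolean options are bare switches, as in the example commands above.

```python
import shlex


def build_main_command(text_prompt=None, image_prompt=None,
                       save_folder="./outputs/test/", max_faces_num=None,
                       t2i_seed=None, t2i_steps=None,
                       gen_seed=None, gen_steps=None,
                       save_memory=False, do_texture_mapping=False,
                       do_render=False):
    """Return the argv list for one main.py run."""
    # Assumption: a run is driven by either a text prompt or an image prompt.
    if (text_prompt is None) == (image_prompt is None):
        raise ValueError("pass exactly one of text_prompt or image_prompt")

    cmd = ["python3", "main.py"]
    if text_prompt is not None:
        cmd += ["--text_prompt", text_prompt]
    else:
        cmd += ["--image_prompt", image_prompt]
    cmd += ["--save_folder", save_folder]

    # Valued options are emitted only when set, so main.py's defaults apply otherwise.
    valued = {
        "--max_faces_num": max_faces_num,
        "--t2i_seed": t2i_seed,
        "--t2i_steps": t2i_steps,
        "--gen_seed": gen_seed,
        "--gen_steps": gen_steps,
    }
    for flag, value in valued.items():
        if value is not None:
            cmd += [flag, str(value)]

    # Boolean options are bare switches.
    switches = {
        "--save_memory": save_memory,
        "--do_texture_mapping": do_texture_mapping,
        "--do_render": do_render,
    }
    cmd += [flag for flag, enabled in switches.items() if enabled]
    return cmd


if __name__ == "__main__":
    argv = build_main_command(text_prompt="a lovely rabbit", max_faces_num=90000,
                              do_texture_mapping=True, do_render=True)
    print(shlex.join(argv))  # pass argv to subprocess.run(argv, check=True) to execute
```

Returning a list rather than a string avoids shell-quoting problems with prompts that contain spaces or quotes.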

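We have also prepared scripts with different configurations for reference.

The std pipeline requires 30 GB of VRAM (24 GB with --save_memory).
The lite pipeline requires 22 GB of VRAM (18 GB with --save_memory).
Note: --save_memory will increase inference time.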

bash scripts/text_to_3d_std.sh 
bash scripts/text_to_3d_lite.sh 
bash scripts/image_to_3d_std.sh 
bash scripts/image_to_3d_lite.sh

If your GPU has 16 GB of memory, you can try running the pipeline modules separately:

bash scripts/text_to_3d_std_separately.sh 'a lovely rabbit' ./outputs/test # >= 16G
bash scripts/text_to_3d_lite_separately.sh 'a lovely rabbit' ./outputs/test # >= 14G
bash scripts/image_to_3d_std_separately.sh ./demos/example_000.png ./outputs/test  # >= 16G
bash scripts/image_to_3d_lite_separately.sh ./demos/example_000.png ./outputs/test # >= 10G
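
The memory figures quoted in this section can be folded into a small lookup that suggests a configuration for a given GPU. This is only a sketch: `pick_plan` and `Plan` are hypothetical names, and the preference order (std before lite, full pipeline before the separate scripts) is an assumption rather than a recommendation from the authors.

```python
from typing import NamedTuple, Optional


class Plan(NamedTuple):
    variant: str       # "std" or "lite"
    save_memory: bool  # whether to pass --save_memory
    separately: bool   # whether to use the *_separately.sh scripts


# (minimum VRAM in GB, variant, save_memory) for the full pipelines.
FULL = [(30, "std", False), (24, "std", True), (22, "lite", False), (18, "lite", True)]

# Minimum VRAM in GB from the comments on the *_separately.sh commands.
SEPARATE = {
    "text": [(16, "std"), (14, "lite")],
    "image": [(16, "std"), (10, "lite")],
}


def pick_plan(vram_gb: float, task: str = "text") -> Optional[Plan]:
    """Suggest a configuration for the given VRAM, or None if nothing fits."""
    if task not in SEPARATE:
        raise ValueError("task must be 'text' or 'image'")
    for need, variant, save_memory in FULL:
        if vram_gb >= need:
            return Plan(variant, save_memory, False)
    for need, variant in SEPARATE[task]:
        if vram_gb >= need:
            # The source does not say whether these scripts use --save_memory.
            return Plan(variant, False, True)
    return None


if __name__ == "__main__":
    for gb in (32, 24, 16, 12):
        print(gb, pick_plan(gb, task="image"))
```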

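Baking

We provide a texture baking module. The matching and warping process uses DUSt3R, which is licensed under CC BY-NC-SA 4.0. Please note that this is a non-commercial license, so this module cannot be used for commercial purposes.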

mkdir -p ./third_party/weights/DUSt3R_ViTLarge_BaseDecoder_512_dpt
huggingface-cli download naver/DUSt3R_ViTLarge_BaseDecoder_512_dpt \
    --local-dir ./third_party/weights/DUSt3R_ViTLarge_BaseDecoder_512_dpt

cd ./third_party
git clone --recursive https://github.com/naver/dust3r.git

cd ..

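Once you have downloaded the related code and weights, the following additional arguments are available:

Argument	Default	Description
`--do_bake`	False	Bake the multi-view images onto the mesh
`--bake_align_times`	3	The number of alignment passes between the images and the mesh

Note: If you need baking, make sure that both --do_bake and --do_texture_mapping are enabled (set to True).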

python main.py ... --do_texture_mapping --do_bake (--do_render)
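
The dependency between the two flags is easy to check before a long run starts. A minimal sketch, assuming the arguments are available as a plain list of strings; `check_bake_flags` is a hypothetical helper, not a function from the repository.

```python
def check_bake_flags(argv):
    """Raise ValueError if --do_bake is requested without --do_texture_mapping."""
    if "--do_bake" in argv and "--do_texture_mapping" not in argv:
        raise ValueError("--do_bake requires --do_texture_mapping to be enabled as well")


if __name__ == "__main__":
    check_bake_flags(["--do_texture_mapping", "--do_bake", "--do_render"])
    print("flags are consistent")
```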

Using Gradio

We have prepared two versions of multi-view generation, std and lite.

# std 
python3 app.py
python3 app.py --save_memory

# lite
python3 app.py --use_lite
python3 app.py --use_lite --save_memory

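The demo can then be accessed at http://0.0.0.0:8080. Note that 0.0.0.0 here must be replaced with X.X.X.X, your server's IP address.

Camera Parameters

The output views are a fixed set of camera poses:

Azimuth (relative to the input view): +0, +60, +120, +180, +240, +300.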
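
To make the azimuth list concrete, the sketch below converts the six offsets into camera positions on a circle around the object. Several things here are assumptions not stated in this document: the offsets are read as degrees, the elevation is taken as 0, the radius as 1, and the coordinate convention (z up, azimuth measured in the x-y plane) is chosen only for illustration. `camera_positions` is a hypothetical helper.

```python
import math

# Azimuth offsets relative to the input view, as listed above (read as degrees).
AZIMUTHS_DEG = [0, 60, 120, 180, 240, 300]


def camera_positions(input_azimuth_deg=0.0, elevation_deg=0.0, radius=1.0):
    """Return one (x, y, z) camera position per output view.

    Assumed convention: z is up and azimuth is measured in the x-y plane.
    """
    elevation = math.radians(elevation_deg)
    positions = []
    for offset in AZIMUTHS_DEG:
        azimuth = math.radians((input_azimuth_deg + offset) % 360)
        x = radius * math.cos(elevation) * math.cos(azimuth)
        y = radius * math.cos(elevation) * math.sin(azimuth)
        z = radius * math.sin(elevation)
        positions.append((x, y, z))
    return positions


if __name__ == "__main__":
    for offset, position in zip(AZIMUTHS_DEG, camera_positions()):
        print(f"+{offset:3d}: ({position[0]:+.3f}, {position[1]:+.3f}, {position[2]:+.3f})")
```

With these assumptions the six cameras sit at equal 60-degree steps around the object, so the +180 view looks at it from directly behind the input view.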

Citation

If you find this repository helpful, please cite our report:

@misc{yang2024tencent,
    title={Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation},
    author={Xianghui Yang and Huiwen Shi and Bowen Zhang and Fan Yang and Jiacheng Wang and Hongxu Zhao and Xinhai Liu and Xinzhou Wang and Qingxiang Lin and Jiaao Yu and Lifu Wang and Zhuo Chen and Sicong Liu and Yuhong Liu and Yong Yang and Di Wang and Jie Jiang and Chunchao Guo},
    year={2024},
    eprint={2411.02293},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}