Abstract
Training computer vision models usually requires collecting and labeling vast amounts of imagery under a diverse set of scene configurations and properties. This process is incredibly time-consuming, and it is challenging to ensure that the captured data distribution maps well to the target domain of an application scenario. Recently, synthetic data has emerged as a way to address both of these issues. However, existing approaches either require human experts to manually tune each scene property or use automatic methods that provide little to no control; this requires rendering large amounts of random data variations, which is slow and often suboptimal for the target domain. We present the first fully differentiable synthetic data pipeline that uses Neural Radiance Fields (NeRFs) in a closed loop with a target application's loss function. Our approach generates data on demand, with no human labor, to maximize accuracy for a target task. We illustrate the effectiveness of our method on synthetic and real-world object detection tasks. We also introduce a new "YCB-in-the-Wild" dataset and benchmark that provides a test scenario for object detection with varied poses in real-world environments. Code and data can be found at .
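To make the closed-loop idea concrete, below is a minimal sketch of this kind of bilevel data-generation loop. It is our own illustration, not the authors' released code: a categorical distribution over pose bins stands in for the NeRF rendering parameters, a toy differentiable function stands in for NeRF rendering, a linear regressor stands in for the object detector, and a score-function (REINFORCE) estimator stands in for the paper's gradients through rendering. All names and the bin parameterization are assumptions made for illustration.

```python
import math
import torch

torch.manual_seed(0)

# Toy stand-ins (illustrative only; the real pipeline renders images with a
# trained NeRF and trains a detector such as RetinaNet as the task model).
def render(pose):
    """Toy 'renderer': maps pose angles to 2-D features. In the paper, a NeRF
    renders images here and labels come for free from the known scene."""
    return torch.stack([torch.sin(pose), torch.cos(pose)], dim=-1)

# Fixed "target domain" validation set: poses concentrated near 1.0 rad.
y_val = 1.0 + 0.1 * torch.randn(64)
x_val = render(y_val)

K = 8                                     # pose bins covering [0, 2*pi)
psi = torch.zeros(K, requires_grad=True)  # rendering-distribution logits
opt_psi = torch.optim.Adam([psi], lr=0.1)
bin_centers = torch.arange(K) * (2 * math.pi / K)
baseline = 0.0                            # running baseline (variance reduction)

for step in range(50):
    # 1) Sample poses from the current rendering distribution p_psi.
    dist = torch.distributions.Categorical(logits=psi)
    bins = dist.sample((128,))
    poses = bin_centers[bins] + 0.1 * torch.randn(128)
    x_train = render(poses)

    # 2) Inner loop: train a small task model on the freshly rendered batch
    #    (a linear pose regressor stands in for an object detector).
    model = torch.nn.Linear(2, 1)
    opt_theta = torch.optim.SGD(model.parameters(), lr=0.5)
    for _ in range(100):
        inner_loss = ((model(x_train).squeeze(-1) - poses) ** 2).mean()
        opt_theta.zero_grad()
        inner_loss.backward()
        opt_theta.step()

    # 3) Outer loss: task performance on the target-domain validation set.
    with torch.no_grad():
        val_loss = ((model(x_val).squeeze(-1) - y_val) ** 2).mean().item()

    # 4) High-variance REINFORCE surrogate for d(val_loss)/d(psi); the paper
    #    instead differentiates through rendering, which a toy example cannot.
    baseline = 0.9 * baseline + 0.1 * val_loss
    opt_psi.zero_grad()
    ((val_loss - baseline) * dist.log_prob(bins).mean()).backward()
    opt_psi.step()

print(torch.softmax(psi, dim=0))  # mass should concentrate on bins near 1.0 rad
```

Over the outer iterations, probability mass shifts toward the pose bins whose rendered data yields the lowest validation loss, which is the on-demand, task-driven generation behavior described in the abstract.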
H. Behl and J. Xu contributed equally as second authors.
Notes
- 1. For simplicity, we have dropped the dependence of the loss \(\ell \) on labels \(y\).
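For context, footnote 1 concerns the loss that appears in the paper's bilevel objective. Below is a hedged reconstruction in the standard learning-to-simulate form (our notation, with \(\psi \) the NeRF rendering parameters, \(\theta \) the task-model weights, and \(D_{\mathrm{val}}\) a target-domain validation set; the paper's exact formulation may differ in detail):

\[ \min_{\psi}\; \ell_{\mathrm{val}}\big(\theta^{*}(\psi);\, D_{\mathrm{val}}\big) \quad \text{s.t.} \quad \theta^{*}(\psi) = \arg\min_{\theta}\; \mathbb{E}_{x \sim \mathrm{NeRF}(\psi)}\big[\ell(x; \theta)\big]. \]

The simplification in footnote 1 then amounts to writing \(\ell(x; \theta)\) rather than \(\ell(x, y; \theta)\).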
References
Jahanian, A., Chai, L., Isola, P.: On the "steerability" of generative adversarial networks. CoRR (2019)
Barbu, A., et al.: ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models. In: Advances in Neural Information Processing Systems, vol. 32 (2019)
Behl, H.S., Baydin, A.G., Gal, R., Torr, P.H.S., Vineet, V.: AutoSimulate: (quickly) learning synthetic data generation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12367, pp. 255–271. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58542-6_16
Bi, S., et al.: Neural reflectance fields for appearance acquisition. arXiv preprint arXiv:2008.03824 (2020)
Brock, A., Donahue, J., Simonyan, K.: Large scale GAN training for high fidelity natural image synthesis. In: International Conference on Learning Representations (2019). https://openreview.net/forum?id=B1xsqj09Fm
Calli, B., Walsman, A., Singh, A., Srinivasa, S., Abbeel, P., Dollar, A.M.: Benchmarking in manipulation research: The YCB object and model set and benchmarking protocols. arXiv preprint arXiv:1502.03143 (2015)
Colson, B., Marcotte, P., Savard, G.: An overview of bilevel optimization. Ann. Oper. Res. 153(1), 235–256 (2007)
Rezende, D.J., Mohamed, S.: Variational inference with normalizing flows. In: ICML (2015)
Denninger, M., et al.: BlenderProc. arXiv preprint arXiv:1911.01911 (2019)
Devaranjan, J., Kar, A., Fidler, S.: Meta-Sim2: Unsupervised learning of scene structure for synthetic data generation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12362, pp. 715–733. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58520-4_42
Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: ICLR (2014)
Doersch, C., Zisserman, A.: Sim2real transfer learning for 3D human pose estimation: motion to the rescue. In: NeurIPS (2019)
Dwibedi, D., Misra, I., Hebert, M.: Cut, paste and learn: Surprisingly easy synthesis for instance detection. In: ICCV (2017)
Franceschi, L., Frasconi, P., Salzo, S., Grazzi, R., Pontil, M.: Bilevel programming for hyperparameter optimization and meta-learning. In: International Conference on Machine Learning, pp. 1568–1577. PMLR (2018)
Gafni, G., Thies, J., Zollhöfer, M., Nießner, M.: Dynamic neural radiance fields for monocular 4D facial avatar reconstruction. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8649–8658 (2021)
Ganin, Y., Kulkarni, T., Babuschkin, I., Eslami, S.M.A., Vinyals, O.: Synthesizing programs for images using reinforced adversarial learning. In: ICML (2018)
Ge, Y., Abu-El-Haija, S., Xin, G., Itti, L.: Zero-shot synthesis with group-supervised learning. arXiv preprint arXiv:2009.06586 (2020)
Ge, Y., Xu, J., Zhao, B.N., Itti, L., Vineet, V.: DALL-E for detection: Language-driven context image synthesis for object detection. arXiv preprint arXiv:2206.09592 (2022)
Ge, Y., Zhao, J., Itti, L.: Pose augmentation: Class-agnostic object pose transformation for object recognition. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12373, pp. 138–155. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58604-1_9
Georgiev, I., et al.: Arnold: A brute-force production path tracer. ACM Trans. Graph. (TOG) 37, 1–12 (2018)
Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)
Handa, A., Patraucean, V., Badrinarayanan, V., Stent, S., Cipolla, R.: Understanding real world indoor scenes with synthetic data. In: CVPR (2016)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: ICCV (2017)
Higgins, I., et al.: beta-VAE: Learning basic visual concepts with a constrained variational framework. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France (2017)
Hodaň, T., et al.: BOP: Benchmark for 6D object pose estimation. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11214, pp. 19–35. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01249-6_2
Hodaň, T., et al.: Photorealistic image synthesis for object instance detection. In: ICIP (2019)
Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., Brox, T.: Flownet 2.0: Evolution of optical flow estimation with deep networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017, pp. 1647–1655 (2017). https://doi.org/10.1109/CVPR.2017.179
Jang, W., Agapito, L.: CodeNeRF: Disentangled neural radiance fields for object categories. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 12949–12958 (2021)
Kar, A., et al.: Meta-Sim: Learning to generate synthetic datasets. In: ICCV (2019)
Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)
Louppe, G., Cranmer, K.: Adversarial variational optimization of non-differentiable simulators. In: AISTATS (2019)
Martin-Brualla, R., Radwan, N., Sajjadi, M.S., Barron, J.T., Dosovitskiy, A., Duckworth, D.: NeRF in the wild: Neural radiance fields for unconstrained photo collections. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7210–7219 (2021)
Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: Representing scenes as neural radiance fields for view synthesis. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 405–421. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_24
Ng, A.: MLOps: From model-centric to data-centric AI. https://www.deeplearning.ai/wp-content/uploads/2021/06/MLOps-From-Model-centric-to-Data-centric-AI.pdf
Park, K., et al.: Nerfies: Deformable neural radiance fields. In: ICCV (2021)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE TPAMI (2017)
Richter, S.R., Hayder, Z., Koltun, V.: Playing for benchmarks. In: ICCV (2017)
Richter, S.R., Vineet, V., Roth, S., Koltun, V.: Playing for data: Ground truth from computer games. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9906, pp. 102–118. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46475-6_7
Ros, G., Sellart, L., Materzynska, J., Vázquez, D., López, A.M.: The SYNTHIA dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In: CVPR (2016)
Ruiz, N., Schulter, S., Chandraker, M.: Learning to simulate. In: ICLR (2019)
Shelhamer, E., Long, J., Darrell, T.: Fully convolutional networks for semantic segmentation. IEEE TPAMI (2017)
Srinivasan, P.P., Deng, B., Zhang, X., Tancik, M., Mildenhall, B., Barron, J.T.: NeRV: Neural reflectance and visibility fields for relighting and view synthesis. In: CVPR (2021)
Tremblay, J., To, T., Birchfield, S.: Falling things: A synthetic dataset for 3D object detection and pose estimation. In: CVPR (2018)
Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 8, 229–256 (1992)
Xiang, Y., Schmidt, T., Narayanan, V., Fox, D.: PoseCNN: A convolutional neural network for 6D object pose estimation in cluttered scenes. arXiv preprint arXiv:1711.00199 (2017)
Xu, X., Chen, Y.C., Jia, J.: View independent generative adversarial network for novel view synthesis. In: ICCV (2019)
Yang, D., Deng, J.: Learning to generate synthetic 3D training data through hybrid gradient. In: CVPR (2020)
Yen-Chen, L.: Nerf-pytorch. https://github.com/yenchenlin/nerf-pytorch/ (2020)
Zhang, X., Srinivasan, P.P., Deng, B., Debevec, P., Freeman, W.T., Barron, J.T.: NeRFactor: Neural factorization of shape and reflectance under an unknown illumination. ACM Trans. Graph. (TOG) 40(6), 1–18 (2021)
Zhang, Y., et al.: Physically-based rendering for indoor scene understanding using convolutional neural networks. In: CVPR (2017)
Acknowledgments
We thank Yen-Chen Lin for his help with the nerf-pytorch code. This work was supported in part by C-BRIC (one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA), DARPA (HR00112190134) and the Army Research Office (W911NF2020053). The authors affirm that the views expressed herein are solely their own, and do not represent the views of the United States government or any agency thereof.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ge, Y. et al. (2022). Neural-Sim: Learning to Generate Training Data with NeRF. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13683. Springer, Cham. https://doi.org/10.1007/978-3-031-20050-2_28
DOI: https://doi.org/10.1007/978-3-031-20050-2_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20049-6
Online ISBN: 978-3-031-20050-2