Abstract
We present a novel form of interactive object segmentation called Click Carving, which enables accurate segmentation of objects in images and videos with only a few point clicks. Whereas conventional interactive pipelines take the user's initialization as a starting point, we show the value of the system taking the lead even during initialization. In particular, for a given image or video frame, the system precomputes a ranked list of thousands of possible segmentation hypotheses (also referred to as object region proposals) using appearance and motion cues. The user then looks at the top-ranked proposals and clicks on the object boundary to carve away erroneous ones. This process iterates (typically 2–3 times), with the system revising the top-ranked proposal set each time, until the user is satisfied with the resulting segmentation mask. In the case of images, this mask is the final object segmentation. In the case of videos, the object region proposals rely on motion as well, and the resulting segmentation mask in the first frame is further propagated across the video to obtain a complete spatio-temporal object tube. On six challenging image and video datasets, we provide extensive comparisons with both existing work and simpler alternative methods. Overall, the proposed Click Carving approach strikes an excellent balance between accuracy and human effort. It outperforms all similarly fast methods, and is competitive with or better than those requiring 2–12 times the effort.
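To make the interaction loop concrete, the following is a minimal sketch of the carving iteration described above, not the authors' implementation. It assumes (our assumptions, not from the paper) that proposals are binary NumPy masks with precomputed scores, and that a boundary click removes every proposal whose boundary does not pass within a small pixel tolerance of the click; the paper's actual proposal generation and ranking cues are richer.

```python
# Hypothetical sketch of the Click Carving interaction loop (not the paper's code).
import numpy as np
from scipy.ndimage import binary_erosion

def boundary_pixels(mask):
    """Return (row, col) coordinates of the mask's boundary pixels."""
    boundary = mask & ~binary_erosion(mask)
    return np.argwhere(boundary)

def carve(proposals, scores, click, tol=5.0):
    """Keep only proposals whose boundary passes within `tol` pixels of the click."""
    kept_masks, kept_scores = [], []
    click = np.asarray(click, dtype=float)
    for mask, score in zip(proposals, scores):
        pts = boundary_pixels(mask)
        if len(pts) and np.min(np.linalg.norm(pts - click, axis=1)) <= tol:
            kept_masks.append(mask)
            kept_scores.append(score)
    return kept_masks, kept_scores

def click_carving_loop(proposals, scores, get_user_click, is_satisfied):
    """Show the top-ranked proposal; take a boundary click; carve; re-rank; repeat."""
    while proposals and not is_satisfied(proposals[int(np.argmax(scores))]):
        click = get_user_click()  # (row, col) placed on the true object boundary
        proposals, scores = carve(proposals, scores, click)
    return proposals[int(np.argmax(scores))] if proposals else None
```

In the video setting, the mask returned for the first frame would then be handed to a propagation method to produce the spatio-temporal object tube.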
Notes
More details and videos can be found at: http://vision.cs.utexas.edu/projects/clickcarving/.
Code available at: http://vision.cs.utexas.edu/projects/clickcarving/.
The unsupervised NLC method (Faktor and Irani 2014) reports state-of-the-art results on a subset of the SegTrack-v2 dataset. We were unable to reproduce those results using the publicly available NLC code, possibly because of an OS incompatibility.
IVID (Shankar Nagaraja et al. 2015) does not report annotation times for SegTrack-v2. The VSB100 dataset was also not used in their experiments.
References
Acuna, D., Ling, H., Kar, A., & Fidler, S. (2018). Efficient interactive annotation of segmentation datasets with Polygon-RNN++. In CVPR.
Arbeláez, P., Pont-Tuset, J., Barron, J., Marques, F., & Malik, J. (2014). Multiscale combinatorial grouping. In CVPR.
Badrinarayanan, V., Galasso, F., & Cipolla, R. (2010). Label propagation in video sequences. In CVPR.
Bai, X., & Sapiro, G. (2007). Distancecut: Interactive segmentation and matting of images and videos. In 2007 IEEE international conference on image processing.
Bai, X., Wang, J., Simons, D., & Sapiro, G. (2009) Video snapcut: Robust video object cutout using localized classifiers. In SIGGRAPH.
Batra, D., Kowdle, A., Parikh, D., Luo, J., & Chen, T. (2010). iCoseg: Interactive co-segmentation with intelligent scribble guidance. In CVPR.
Bearman, A., Russakovsky, O., Ferrari, V., & Fei-Fei, L. (2015). What’s the point: Semantic segmentation with point supervision. ArXiv e-prints.
Bell, S., Upchurch, P., Snavely, N., & Bala, K. (2015). Material recognition in the wild with the materials in context database. In Computer Vision and Pattern Recognition (CVPR).
Boykov, Y., & Jolly, M. (2001). Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. In CVPR.
Carreira, J., & Sminchisescu, C. (2012). CPMC: Automatic object segmentation using constrained parametric min-cuts. PAMI, 34(7), 1312–1328.
Castrejón, L., Kundu, K., Urtasun, R., & Fidler, S. (2017). Annotating object instances with a polygon-rnn. In CVPR.
Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2015). Semantic image segmentation with deep convolutional nets and fully connected crfs. In ICLR.
Cheng, M.-M., Zhang, G.-X., Mitra, N. J., Huang, X., & Hu, S.-M. (2011). Global contrast based salient region detection. In CVPR (pp. 409–416).
Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., & Schiele, B. (2016). The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR).
Faktor, A., & Irani, M. (2014). Video segmentation by non-local consensus voting. In Proceedings of the British machine vision conference. BMVA Press.
Fathi, A., Balcan, M., Ren, X., & Rehg, J. (2011). Combining self training and active learning for video segmentation. In BMVC.
Fragkiadaki, K., Arbelaez, P., Felsen, P., & Malik, J. (2015). Learning to segment moving objects in videos. In CVPR.
Galasso, F., Nagaraja, N. S., Cardenas, T. J., Brox, T., & Schiele, B. (2013). A unified video segmentation benchmark: Annotation, metrics and analysis. In ICCV.
Godec, M., Roth, P. M., & Bischof, H. (2011). Hough-based tracking of non-rigid objects. In ICCV.
Grundmann, M., Kwatra, V., Han, M., & Essa, I. (2010). Efficient hierarchical graph based video segmentation. In CVPR.
Gulshan, V., Rother, C., Criminisi, A., Blake, A., & Zisserman, A. (2010). Geodesic star convexity for interactive image segmentation. In CVPR.
Jain, S., & Grauman, K. (2013). Predicting sufficient annotation strength for interactive foreground segmentation. In ICCV.
Jain, S. D., & Grauman, K. (2014). Supervoxel-consistent foreground propagation in video. In ECCV 2014. Lecture notes in computer science (pp. 656–671). Springer.
Jain, S. D., & Grauman, K. (2016). Click carving: Segmenting objects in video with point clicks. In AAAI conference on human computation and crowdsourcing (HCOMP).
Jiang, B., Zhang, L., Lu, H., Yang, C., & Yang, M.-H. (2013). Saliency detection via absorbing markov chain. In ICCV.
Karasev, V., Ravichandran, A., & Soatto, S. (2014). Active frame, location, and detector selection for automated and manual video annotation. In Proceedings of the IEEE conference on computer vision and pattern recognition.
Kass, M., Witkin, A., & Terzopoulos, D. (1988). Snakes: Active contour models. IJCV, 1(4), 321–331.
Kohli, P., Nickisch, H., Rother, C., & Rhemann, C. (2012). User-centric learning and evaluation of interactive segmentation systems. IJCV, 100(3), 261–274.
Krähenbühl, P., & Koltun, V. (2014). In Computer vision—ECCV 2014: 13th European conference, Zurich, Switzerland, September 6–12, 2014, proceedings, part V, chapter geodesic object proposals (pp. 725–739). Cham: Springer.
Krause, A., & Guestrin, C. (2007). Near-optimal observation selection using submodular functions. In National conference on artificial intelligence (AAAI), nectar track.
Lee, Y. J., Kim, J., & Grauman, K. (2011). Key-segments for video object segmentation. In ICCV.
Lempitsky, V. S., Kohli, P., Rother, C., & Sharp, T. (2009). Image segmentation with a bounding box prior. In ICCV.
Levinkov, E., Tompkin, J., Bonneel, N., Kirchhoff, S., Andres, B., & Pfister, H. (2016). Interactive multicut video segmentation. In Proceedings of the 24th Pacific conference on computer graphics and applications: Short papers (pp. 33–38).
Li, F., Kim, T., Humayun, A., Tsai, D., & Rehg, J. M. (2013). Video segmentation by tracking many figure-ground segments. In ICCV.
Li, X., Zhao, L., Wei, L., Yang, M.-H., Fei, W., Zhuang, Y., et al. (2016). DeepSaliency: Multi-task deep neural network model for salient object detection. IEEE TIP, 25(8), 3919–3930.
Li, Y., Hou, X., Koch, C., Rehg, J. M., & Yuille, A. L. (2014). The secrets of salient object segmentation. In CVPR.
Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In ECCV.
Liu, T., Yuan, Z., Sun, J., Wang, J., Zheng, N., Tang, X., et al. (2011). Learning to detect a salient object. PAMI, 33(2), 353–367.
Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In CVPR.
Ma, T., & Latecki, L. (2012). Maximum weight cliques with mutex constraints for video object segmentation. In CVPR.
Malisiewicz, T., & Efros, A. A. (2007). Spatial support for objects via multiple segmentations. In BMVC.
Malmberg, F., Strand, R., & Nyström, I. (2011). Generalized hard constraints for graph segmentation. In SCIA.
McGuinness, K., & O’Connor, N. E. (2010). A comparative evaluation of interactive segmentation algorithms. Pattern Recognition, 43(2), 434–444. Interactive Imaging and Vision.
Mortensen, E., & Barrett, W. (1995). Intelligent scissors for image composition. In SIGGRAPH.
Nickisch, H., Rother, C., Kohli, P., & Rhemann, C. (2010). Learning an interactive segmentation system. In Proceedings of the seventh Indian conference on computer vision, graphics and image processing, ICVGIP ’10 (pp. 274–281). New York, NY: ACM.
Noh, H., Hong, S., & Han, B. (2015). Learning deconvolution network for semantic segmentation. In 2015 IEEE international conference on computer vision (ICCV).
Oneata, D., Revaud, J., Verbeek, J., & Schmid, C. (2014). Spatio-temporal object detection proposals. In ECCV.
Papadopoulos, D., Uijlings, J., Keller, F., & Ferrari, V. (2017). Training object class detectors with click supervision. In CVPR.
Papazoglou, A., & Ferrari, V. (2013). Fast object segmentation in unconstrained video. In ICCV.
Perazzi, F., Krähenbühl, P., Pritch, Y., & Hornung, A. (2012). Saliency filters: Contrast based filtering for salient region detection. In CVPR (pp. 733–740).
Pinheiro, P. O., Collobert, R., & Dollár, P. (2015). Learning to segment object candidates. In NIPS.
Pont-Tuset, J., Farré, M. A., & Smolic, A. (2015). Semi-automatic video object segmentation by advanced manipulation of segmentation hierarchies. In International workshop on content-based multimedia indexing (CBMI).
Ren, X., & Malik, J. (2007). Tracking as repeated figure/ground segmentation. In CVPR.
Rother, C., Kolmogorov, V., & Blake, A. (2004). GrabCut: Interactive foreground extraction using iterated graph cuts. In SIGGRAPH.
Russakovsky, O., Li, L.-J., & Fei-Fei, L. (2015). Best of both worlds: Human–machine collaboration for object annotation. In CVPR.
Shankar Nagaraja, N., Schmidt, F. R., & Brox, T. (2015). Video segmentation with just a few strokes. In ICCV.
Sundberg, P., Brox, T., Maire, M., Arbelaez, P., & Malik, J. (2011). Occlusion boundary detection and figure/ground assignment from optical flow. In CVPR, Washington, DC, USA.
Tsai, D., Flagg, M., & Rehg, J. (2010). Motion coherent tracking with multi-label mrf optimization. In BMVC.
The OpenCV reference manual, 2.4.9.0 edition, April 2014.
Uijlings, J. R. R., van de Sande, K. E. A., Gevers, T., & Smeulders, A. W. M. (2013). Selective search for object recognition. International Journal of Computer Vision, 104(2), 154–171.
Vijayanarasimhan, S., & Grauman, K. (2012). Active frame selection for label propagation in videos. In ECCV.
Vondrick, C., & Ramanan, D. (2011). Video annotation and tracking with active learning. In NIPS.
Wang, J., Bhat, P., Colburn, A., Agrawala, M., & Cohen, M. F. (2005). Interactive video cutout. ACM Transactions on Graphics, 24(3), 585–594.
Wang, T., Han, B., & Collomosse, J. (2014). Touchcut: Fast image and video segmentation using single-touch interaction. Computer Vision and Image Understanding, 120, 14–30.
Weinzaepfel, P., Revaud, J., Harchaoui, Z., & Schmid, C. (2015). Learning to detect motion boundaries. In CVPR 2015, Boston, United States.
Wen, L., Du, D., Lei, Z., Li, S. Z., & Yang, M.-H. (2015). Jots: Joint online tracking and segmentation. In CVPR.
Wu, Z., Li, F., Sukthankar, R., & Rehg, J. M. (2015). Robust video segment proposals with painless occlusion handling. In CVPR.
Xu, N., Price, B. L., Cohen, S., Yang, J., & Huang, T. S. (2016). Deep interactive object selection. In CVPR (pp. 373–381).
Yu, G., & Yuan, J. (2015). Fast action proposals for human action detection and search. In CVPR.
Zhang, D., Javed, O., & Shah, M. (2013). Video object segmentation through spatially accurate and temporally dense extraction of primary object regions. In CVPR.
Zhao, R., Ouyang, W., Li, H., & Wang, X. (2015). Saliency detection by multi-context learning. In CVPR.
Zheng, S., Jayasumana, S., Romera-Paredes, B., Vineet, V., Su, Z., Du, D., et al. (2015). Conditional random fields as recurrent neural networks. In ICCV.
Acknowledgements
This research is supported in part by ONR PECASE N00014-15-1-2291, NSF IIS-1514118, a gift from Qualcomm and a gift from AWS Machine Learning. We would like to thank Shankar Nagaraja for providing the iVideoseg dataset timing data. We also thank all the participants in our user studies.
Additional information
Communicated by Jakob Verbeek.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Jain, S.D., Grauman, K. Click Carving: Interactive Object Segmentation in Images and Videos with Point Clicks. Int J Comput Vis 127, 1321–1344 (2019). https://doi.org/10.1007/s11263-019-01184-2