Abstract
This survey reviews the state of the art in semantics extraction from images. We organise the relevant approaches along two axes: how image content is represented and how knowledge is represented. Knowledge can be represented either implicitly or explicitly, while the image can be represented at three levels: low, intermediate and semantic. For each combination of knowledge and image representation, we discuss the corresponding methods in detail and draw conclusions about the impact of each approach.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Pratikakis, I., Bolovinou, A., Gatos, B., Perantonis, S. (2011). Semantics Extraction from Images. In: Paliouras, G., Spyropoulos, C.D., Tsatsaronis, G. (eds) Knowledge-Driven Multimedia Information Extraction and Ontology Evolution. Lecture Notes in Computer Science(), vol 6050. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20795-2_3
DOI: https://doi.org/10.1007/978-3-642-20795-2_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20794-5
Online ISBN: 978-3-642-20795-2
eBook Packages: Computer Science, Computer Science (R0)