Abstract
To help users find on-line learning materials that answer more specific questions, we built a learning portal named Fusion. First, we developed FusionCrawler, a focused crawler based on link classification, to download potential course pages. A binary classifier then picks out the actual course pages. Once the course pages are identified, FusionExtractor, a DOM-tree-based regular expression wrapper, extracts their metadata. The metadata include Course Name, Instructor Information, Course Outline, and other relevant information, and are stored in a database behind the portal. Experimental results show that our approach to organizing on-line courses, based on focused crawling and metadata extraction, is effective. On average, FusionCrawler retrieved 40-50% more on-topic learning materials than a normal focused crawler, while the average F1 of FusionExtractor is 85%. With metadata for more than 1,400 MIT OCW, 3,000 UIUC, and 1,000 WISC courses; 300 courses from GreatLearning with 3,000 Chinese course videos; and nearly 1,000 videos from the Internet Archive, the Fusion portal provides several search functions, such as quick search, advanced search, and semantic navigation browsing.
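To make the idea of a DOM-tree-based regular expression wrapper more concrete, the following is a minimal sketch in Python. It is not the FusionExtractor implementation, whose rules the abstract does not describe. The class `TextNodeCollector`, the `RULES` table, the field names, and the sample page are all assumptions introduced for illustration. Each rule pairs a pattern over the tag path of a text node with a pattern over the text itself.

```python
import re
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not stay on the path stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TextNodeCollector(HTMLParser):
    """Collect (tag_path, text) pairs: a flat view of the DOM tree's text nodes."""

    def __init__(self):
        super().__init__()
        self.stack = []   # open tags from the root down to the current node
        self.nodes = []   # (path, text) for every non-empty text node

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append(("/".join(self.stack), text))


# Hypothetical rules: field -> (pattern on the tag path, pattern on the text).
RULES = {
    "course_name": (re.compile(r"(?:^|/)(?:title|h1)$"),
                    re.compile(r"^(.+)$")),
    "instructor": (re.compile(r"(?:^|/)(?:p|li|td|div)$"),
                   re.compile(r"^Instructors?:\s*(.+)$", re.IGNORECASE)),
}


def extract_metadata(html):
    """Return the first match for each field, scanning nodes in document order."""
    collector = TextNodeCollector()
    collector.feed(html)
    collector.close()
    record = {}
    for path, text in collector.nodes:
        for field, (path_re, text_re) in RULES.items():
            if field in record or not path_re.search(path):
                continue
            match = text_re.search(text)
            if match:
                record[field] = match.group(1).strip()
    return record


# Made-up sample page, used only to exercise the sketch.
SAMPLE = (
    "<html><head><title>CS 101: Example Course</title></head>"
    "<body><p>Instructor: A. Example</p><br><p>Weekly outline</p></body></html>"
)
result = extract_metadata(SAMPLE)
```

Restricting each regular expression to text nodes under particular tag paths is one plausible way to combine DOM structure with pattern matching; a real wrapper would need site-specific rules and more fields, such as the course outline.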
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, M., Wang, W., Zhou, Y., Yang, Y., Xiong, Y., Li, X. (2008). On Line Course Organization. In: Leung, H., Li, F., Lau, R., Li, Q. (eds) Advances in Web Based Learning – ICWL 2007. ICWL 2007. Lecture Notes in Computer Science, vol 4823. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78139-4_14
DOI: https://doi.org/10.1007/978-3-540-78139-4_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78138-7
Online ISBN: 978-3-540-78139-4
eBook Packages: Computer Science, Computer Science (R0)