
A First Experience in Archiving the French Web

  • Conference paper
Research and Advanced Technology for Digital Libraries (ECDL 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2458)


Abstract

The web is an increasingly valuable source of information, and organizations are archiving (portions of) it for various purposes, e.g., the Internet Archive (www.archive.org). A new mission of the French National Library (BnF) is the “dépôt légal” (legal deposit) of the French web. We describe here some preliminary work on the topic conducted by BnF and INRIA. In particular, we consider the acquisition of the web archive. The issues are the definition of the perimeter of the French web and the choice of which pages to read once or several times (to take changes into account). Keeping several copies of the same page raises versioning issues, which we briefly consider. Finally, we mention some first experiments.

This was a decision of King François the 1st.




Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Abiteboul, S., Cobéna, G., Masanes, J., Sedrati, G. (2002). A First Experience in Archiving the French Web. In: Agosti, M., Thanos, C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2002. Lecture Notes in Computer Science, vol 2458. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45747-X_1


  • DOI: https://doi.org/10.1007/3-540-45747-X_1


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44178-6

  • Online ISBN: 978-3-540-45747-3

  • eBook Packages: Springer Book Archive
