Information Retrieval System for CACM Corpus

Overview

This project implements an information retrieval system that indexes and searches the CACM (Communications of the ACM) corpus. I developed it as an assignment for my CS3308 Information Retrieval course and then extended it with a web-based user interface.

The system combines a Python backend for indexing with a responsive web frontend for searching and visualizing results.

Live Demo

Available online at:

Features

Backend (Python)

Document indexing with SQLite database storage
TF-IDF scoring for relevance ranking
Cosine similarity-based document retrieval
Support for the CACM corpus (570 computer science abstracts)
Stopword filtering and term processing
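
To illustrate how TF-IDF scoring and cosine similarity fit together, here is a minimal, self-contained sketch. It is not taken from indexer_main.py or search_engine.py: the function names (tokenize, build_index, cosine, search), the tiny stopword list, and the plain tf * log(N/df) weighting are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not the project's actual list.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase the text, split on non-alphanumerics, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def build_index(docs):
    """Return (TF-IDF vectors per document, idf table) for a {doc_id: text} mapping."""
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())  # each document counts once per term
    n_docs = len(docs)
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    vectors = {
        doc_id: {term: tf * idf[term] for term, tf in counts.items()}
        for doc_id, counts in term_counts.items()
    }
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query, vectors, idf):
    """Rank documents by cosine similarity to the query, best match first."""
    counts = Counter(tokenize(query))
    q_vec = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scores = ((doc_id, cosine(q_vec, vec)) for doc_id, vec in vectors.items())
    return sorted((p for p in scores if p[1] > 0), key=lambda p: p[1], reverse=True)


docs = {
    "d1": "sorting algorithms for large arrays",
    "d2": "compiler design and parsing",
    "d3": "parsing algorithms",
}
vectors, idf = build_index(docs)
for doc_id, score in search("parsing algorithms", vectors, idf):
    print(doc_id, round(score, 3))
```

Query terms that never appear in the collection are ignored, and documents with a zero score are left out of the ranking.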

Frontend (HTML/CSS/JavaScript)

Real-time document loading and indexing with progress visualization
Advanced search options (match type, minimum score, date range)
Interactive visualizations of search results (term frequency, document relevance)
Document viewer with highlighted search terms
Similar document suggestions
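
The highlighting idea behind the document viewer can be sketched as follows. The actual viewer lives in script.js; this Python version only shows the technique, and the highlight function and the use of <mark> tags are assumptions for illustration.

```python
import html
import re


def highlight(text, terms):
    """Wrap whole-word, case-insensitive matches of any term in <mark> tags.

    All text is HTML-escaped so the result is safe to place in a page.
    """
    escaped_terms = [re.escape(t) for t in terms if t]
    if not escaped_terms:
        return html.escape(text)
    pattern = re.compile(r"\b(?:" + "|".join(escaped_terms) + r")\b", re.IGNORECASE)
    parts = []
    last = 0
    for match in pattern.finditer(text):
        parts.append(html.escape(text[last:match.start()]))  # text before the match
        parts.append("<mark>" + html.escape(match.group(0)) + "</mark>")
        last = match.end()
    parts.append(html.escape(text[last:]))  # remaining tail
    return "".join(parts)


print(highlight("Sorting & searching: a sorting survey", ["sorting"]))
```

Matching runs on the raw text and escaping is applied piece by piece afterwards, so a query term can never match inside an HTML entity such as &amp;.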

Project Structure

📦 CS-3308-Information-Retrieval/
├── index.html          # Main web interface
├── styles.css          # Styling for the UI
├── script.js           # Frontend logic and search functionality
├── PythonProjects/
│   ├── indexer_main.py     # Indexer for CACM corpus
│   ├── search_engine.py    # Backend search functionality
│   └── indexer_part2.db    # SQLite database with indexed data
└── CACM_Corpus/
    └── cacm/
        ├── CACM-0001.HTML
        ├── CACM-0002.HTML
        └── ... (570 documents)

Usage

Web Interface

Open index.html in a web browser
The system will automatically fetch and index documents from the CACM corpus
Enter search queries in the search box
Use advanced options to refine searches
View document content by clicking on search results

Python Backend

If you want to use the Python components directly:

cd PythonProjects
python indexer_main.py  # To build the index
python search_engine.py  # To run search queries
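
The schema of indexer_part2.db is not documented in this README. As a rough sketch of how an inverted index can be stored in and queried from SQLite, the example below uses a hypothetical two-table layout (documents and postings) and hypothetical helpers (build_db, lookup); the real indexer may differ in every one of these details.

```python
import sqlite3
from collections import Counter


def build_db(docs, path=":memory:"):
    """Create a fresh inverted index from a {file_name: text} mapping.

    Assumption: the table and column names are illustrative only.
    """
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE postings (
            term   TEXT NOT NULL,
            doc_id INTEGER NOT NULL REFERENCES documents(doc_id),
            tf     INTEGER NOT NULL,
            PRIMARY KEY (term, doc_id)
        );
        """
    )
    for name, text in docs.items():
        cur = conn.execute("INSERT INTO documents (name) VALUES (?)", (name,))
        doc_id = cur.lastrowid
        counts = Counter(text.lower().split())  # naive whitespace tokenizer
        conn.executemany(
            "INSERT INTO postings (term, doc_id, tf) VALUES (?, ?, ?)",
            [(term, doc_id, tf) for term, tf in counts.items()],
        )
    conn.commit()
    return conn


def lookup(conn, term):
    """Return (file_name, term_frequency) rows for one term, highest tf first."""
    rows = conn.execute(
        "SELECT d.name, p.tf FROM postings p JOIN documents d USING (doc_id) "
        "WHERE p.term = ? ORDER BY p.tf DESC, d.name",
        (term.lower(),),
    )
    return rows.fetchall()


conn = build_db({
    "CACM-0001.HTML": "parsing parsing algorithms",
    "CACM-0002.HTML": "sorting algorithms",
})
print(lookup(conn, "parsing"))
```

The default path of ":memory:" keeps the sketch from touching any file on disk; passing a file path would create the tables there instead.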

Technologies Used

Web Interface

HTML5, CSS3, JavaScript (ES6+)
Chart.js for data visualization

Python Backend

Python 3.x for backend processing
SQLite for data storage
TF-IDF and Vector Space Model for information retrieval

Development Notes

The initial assignment required building the indexer and search engine in Python. I extended this with a web interface that can run on its own or integrate with the Python backend through an API-like approach. The frontend fetches documents directly from the GitHub repository, processes them in real time, and provides an interactive search experience.
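
Before a fetched document can be indexed, its markup has to be reduced to plain text. The frontend does this in JavaScript; the sketch below shows the same step in Python with the standard library's html.parser. The TextExtractor class, the extract_text helper, and the sample document body are assumptions for illustration, not code or data from this repository.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document and ignore the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_text(html_source):
    """Return the document's visible text with runs of whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html_source)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


# Assumption: a made-up document body, shaped like a simple <pre>-wrapped page.
sample = "<html>\n<pre>\n\nPreliminary Report\n\nCACM December, 1958\n</pre>\n</html>"
print(extract_text(sample))
```

The resulting string can then be tokenized and indexed like any other plain-text input.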

Future Improvements

Implement query expansion and spelling correction
Add support for additional document formats and collections

License

This project is licensed under the MIT License.
