xilapa/SiteWatcher

SiteWatcher

Get notified when a specific change (or not) occurs at a website.

Project Status: WIP – Initial development is in progress; there has not yet been a stable release suitable for public use.

Motivation · Summary · Features · Technologies · Backend architecture · Next steps

Motivation

How often have you run out of water because the water treatment station was under maintenance, and you simply didn't know about it?

In my region, the water company doesn't offer an email alert service to notify customers about maintenance or any other interruption in the water supply.

In this context, I had the initial idea of building an application that would check whether a website mentions certain words and send an alert email. The idea grew a little, and this project was born.

Summary

This application monitors website changes and notifies you by email. It's a web crawler that sends notifications.

You can choose when to be notified:

  • When any change occurs on the website;
  • When a specific word or phrase is mentioned on the page;
  • When a custom regex has a new match in the website HTML; you can also choose to be notified when an existing regex match disappears.

You can choose a monitoring frequency between 2 and 24 hours.
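The three alert rules above can be sketched roughly as follows. This is a hypothetical illustration: the type and function names (`WatchMode`, `checkAlert`, etc.) are mine, not SiteWatcher's actual API.

```typescript
// Hypothetical sketch of the three alert rules described above.
import { createHash } from "crypto";

type WatchMode =
  | { kind: "anyChange"; lastHash: string }
  | { kind: "term"; term: string }
  | { kind: "regex"; pattern: string; notifyOnDisappearance: boolean; hadMatch: boolean };

function normalize(s: string): string {
  // Ignore case and accents, mirroring the "intelligent search" behavior.
  return s.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

function checkAlert(mode: WatchMode, html: string): boolean {
  switch (mode.kind) {
    case "anyChange": {
      // Any change at all: compare a hash of the page with the last known hash.
      const hash = createHash("sha256").update(html).digest("hex");
      return hash !== mode.lastHash;
    }
    case "term":
      // Specific word or phrase mentioned on the page.
      return normalize(html).includes(normalize(mode.term));
    case "regex": {
      const hasMatch = new RegExp(mode.pattern).test(html);
      // Trigger on a new match, or on a disappearance if configured.
      return mode.notifyOnDisappearance
        ? mode.hadMatch && !hasMatch
        : !mode.hadMatch && hasMatch;
    }
  }
}
```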

Features

  • Google login: no need to create an account (and no extra password to forget);
  • Secure: we use OAuth Code Flow with PKCE to protect the authentication flow, see PR 115;
  • Intelligent search: the search ignores accents and letter casing, and sorts results by relevance to the search term;
  • We don't spam your inbox: alerts are grouped by monitoring frequency, so you receive a single email covering all alerts triggered at that frequency. SiteWatcher also uses the outbox pattern and idempotent consumers, so an alert email can never be sent twice.
  • You have the control: you can disconnect all your devices from SiteWatcher with one click.
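The no-duplicate-email guarantee mentioned above can be sketched as an outbox plus an idempotent consumer. This is an in-memory illustration under my own naming; the real project persists the outbox in PostgreSQL and uses MassTransit as the message bus.

```typescript
// Minimal in-memory sketch of the outbox pattern with an idempotent consumer.
// Names (OutboxMessage, enqueueAlertEmail, etc.) are illustrative only.
interface OutboxMessage {
  id: string;      // unique message id, written together with the state change
  payload: string;
}

const outbox: OutboxMessage[] = [];
const processedIds = new Set<string>(); // consumer-side dedup store
const sentEmails: string[] = [];

function enqueueAlertEmail(id: string, payload: string): void {
  // In the real pattern this insert happens in the same DB transaction
  // that records the triggered alerts, so the message is never lost.
  outbox.push({ id, payload });
}

function consume(msg: OutboxMessage): void {
  // Idempotent consumer: a redelivered message id is silently skipped,
  // so the user never receives a duplicate email.
  if (processedIds.has(msg.id)) return;
  processedIds.add(msg.id);
  sentEmails.push(msg.payload);
}

// A relay publishes outbox rows to the broker; redelivery is harmless.
enqueueAlertEmail("msg-1", "alert digest for user 42");
consume(outbox[0]);
consume(outbox[0]); // duplicate delivery: ignored
```

The key design point is that deduplication happens on the consumer side by message id, so at-least-once delivery from the broker still yields exactly one email.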

Technologies

This project is currently using:

Angular 13, .NET 7, EF Core 7, Swagger, PostgreSQL, Dapper,
FluentValidation, MediatR, Redis, BenchmarkDotNet, Roslynator, StronglyTypedId,
MailKit, xUnit, Polly, FluentAssertions, Moq, ReflectionMagic,
HashIds, Testcontainers, Scrutor, MassTransit, Fluid, AngleSharp

Removed

  • AutoMapper

Backend architecture

The WebAPI backend follows an onion/layered architecture tending toward a hexagonal architecture. One of the next steps is to make it fully hexagonal by moving all business rules, caching and validation into the application layer.

The current architecture is pragmatic, making use of some DDD concepts: aggregates are responsible for the business logic and for managing their entities and value objects. Bounded contexts are not implemented because the application domain is not complex enough to justify them.

Below is the domain representation. Aggregates are represented by the larger orange boxes, aggregate roots are in red, entities in blue and value objects in green.

The cardinality is read top-down: e.g., the Notification aggregate has many NotificationAlerts, and each NotificationAlert has only one AlertId. Note that the NotificationAlerts entity forms an N-to-N relationship between the Alert and Notification aggregates: an alert can have many notifications, and a notification can be related to many alerts.
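The aggregate relationship just described could look roughly like this. The class and property names are guesses based on the description above, not the project's real code (which is C#):

```typescript
// Illustrative sketch: the Notification aggregate owns NotificationAlert
// entities, each pointing at an Alert aggregate by id (the N-to-N link).
class NotificationAlert {
  constructor(public readonly alertId: string) {}
}

class Notification {
  private readonly alerts: NotificationAlert[] = [];

  constructor(public readonly id: string, public readonly userId: string) {}

  // The aggregate root guards its invariants: the same alert
  // cannot be attached to one notification twice.
  addAlert(alertId: string): void {
    if (this.alerts.some(a => a.alertId === alertId)) return;
    this.alerts.push(new NotificationAlert(alertId));
  }

  get alertIds(): string[] {
    return this.alerts.map(a => a.alertId);
  }
}
```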

Parts of the design are based on Jason Taylor's Clean Architecture template; his approach of using MediatR to dispatch domain events is very clean and well done. Some domain events are "messages": these events are sent to a queue and processed in the background by the worker. Those messages relate to:

  • Email sending;
  • User email confirmation;
  • User reactivation;
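The event-routing idea above can be sketched as follows: ordinary domain events are handled in-process (the role MediatR plays in the backend), while events marked as messages are pushed to a queue for the background worker. All names here are illustrative.

```typescript
// Hypothetical sketch of routing domain events: in-process vs. queued.
interface DomainEvent { readonly name: string; }
interface Message extends DomainEvent { readonly isMessage: true; }

const handledInProcess: string[] = [];
const queue: Message[] = [];

function isMessage(e: DomainEvent): e is Message {
  return (e as Message).isMessage === true;
}

function dispatch(e: DomainEvent): void {
  if (isMessage(e)) {
    // e.g. email sending, email confirmation, user reactivation
    queue.push(e);
  } else {
    // Handled synchronously inside the WebAPI process.
    handledInProcess.push(e.name);
  }
}

dispatch({ name: "AlertUpdated" });
const confirm: Message = { name: "EmailConfirmationRequested", isMessage: true };
dispatch(confirm);
```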

To "watch" the user-defined websites periodically, the SiteWatcher worker crawls them and sends notification emails. Here's how it works:

  1. It reads the database periodically, matching the available frequencies to fetch the alerts due to run;
  2. It tries to crawl each site, using a retry policy to mitigate transient errors;
  3. If an alert is triggered or a site cannot be reached, an AlertTriggeredEvent is published to the queue with all the alerts triggered for a user;
  4. The worker consumes the AlertTriggeredEvent and processes it, creating a Notification; the Notification builds the email, which is then published to the queue;
  5. The worker also consumes the email queue and sends the email to the user.
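Step 2's retry behavior can be sketched like this. In the real worker this role is played by Polly; the `withRetry` and `flakyCrawl` names below are my own illustration.

```typescript
// Sketch of a retry policy around the crawl to absorb transient errors.
function withRetry<T>(fn: () => T, attempts = 3): T {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try { return fn(); }
    catch (err) { lastErr = err; } // transient failure: try again
  }
  throw lastErr; // all attempts failed: the site is treated as unreachable
}

// Simulate a crawl that fails once with a transient error, then succeeds.
let calls = 0;
function flakyCrawl(): string {
  calls++;
  if (calls < 2) throw new Error("transient network error");
  return "<html>site content</html>";
}

const html = withRetry(flakyCrawl); // succeeds on the second attempt
```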

Next steps

  • Remove dead code and unnecessary abstractions;
  • Improve the intelligent search with an algorithm such as Levenshtein distance, moving some search business rules out of the database;
  • Implement a fully hexagonal architecture;
  • Increase test coverage to 80% at least;
  • Remove dependencies that make heavy use of reflection or have a high memory usage;
  • Move background email sending from the WebAPI to the worker using RabbitMQ;
  • Implement a "can crawl the site" validation (some sites block web crawlers) with a response sent by WebSockets using SignalR;
  • Move email sending to a microservice written in Go;
  • Implement notifications by Telegram;
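For reference, the Levenshtein distance mentioned in the search item above is the minimum number of single-character edits (insertions, deletions, substitutions) needed to turn one string into another, which makes it usable for ranking fuzzy search matches. A plain dynamic-programming implementation:

```typescript
// Levenshtein distance with a rolling one-dimensional DP row.
function levenshtein(a: string, b: string): number {
  // dp[j] holds the distance between a[0..i) and b[0..j).
  const dp: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let prev = dp[0]; // dp[i-1][j-1]
    dp[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const tmp = dp[j]; // dp[i-1][j]
      dp[j] = Math.min(
        dp[j] + 1,     // deletion
        dp[j - 1] + 1, // insertion
        prev + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution
      );
      prev = tmp;
    }
  }
  return dp[b.length];
}
```

Search results could then be ordered by ascending distance from the (normalized) search term.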