Article

YOLOv5-Based Electric Scooter Crackdown Platform

Department of Electronic Engineering, Tech University of Korea, Siheung-si 15297, Republic of Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(6), 3112; https://doi.org/10.3390/app15063112
Submission received: 1 January 2025 / Revised: 4 March 2025 / Accepted: 6 March 2025 / Published: 13 March 2025
(This article belongs to the Special Issue Applied Artificial Intelligence and Data Science)

Abstract

As the use of personal mobility (PM) devices continues to rise, regulatory violations have become more frequent, highlighting the need for technological solutions to ensure efficient enforcement. This study addresses these challenges by proposing an AI-based enforcement platform. The system integrates the You Only Look Once version 5 (YOLOv5) object detection model, a deep-learning-based framework, with Global Positioning System (GPS) location data, Raspberry Pi 5, and Amazon Web Services (AWS) for data processing and web-based implementation. The YOLOv5 model was deployed in two configurations: one for detecting electric scooter usage and another for identifying legal violations. The system utilized AWS Relational Database Service (RDS), Simple Storage Service (S3), and Elastic Compute Cloud (EC2) to store violation records and host web applications. The detection performance was evaluated using mean average precision (mAP) metrics. The electric scooter detection model achieved mAP50 and mAP50-95 scores of 99.5 and 99.457, respectively. Meanwhile, the legal violation detection model attained mAP50 and mAP50-95 scores of 99.5 and 81.813, indicating relatively lower accuracy for fine-grained violation detection. This study presents a practical technological platform for monitoring regulatory compliance and automating fine enforcement for shared electric scooters. Future improvements in object detection accuracy and real-time processing capabilities are expected to enhance the system’s overall reliability.

1. Introduction

Electric scooters, a relatively new addition to personal mobility (PM) devices, have been steadily increasing in number each year due to their operational efficiency and convenience for short-distance travel. According to Business Research Insights, the global PM market was valued at 1.33 billion United States dollars (USD) in 2023 and is projected to reach USD 2.20 billion by 2032, a compound annual growth rate (CAGR) of 5.8% over this period [1].
The usage of PM devices in South Korea has also surged, particularly in shared mobility services. The number of shared PM devices grew more than fourfold, from 70,000 in 2020 to 290,000 in 2023. According to data.ai, cumulative downloads of the top eight shared PM applications in the first half of 2023 reached approximately 18.68 million, marking a 36% increase compared to the same period in the previous year [2]. The widespread adoption of PM devices has led to positive impacts, such as improved traffic efficiency.
However, it has also introduced challenges, including safety concerns and traffic disruptions. A key issue is that many users fail to comply with traffic laws, prompting governments to strengthen regulations on PM devices, including electric scooters.
To address these concerns, South Korea revised the Road Traffic Act on 9 June 2020, with the updated regulations coming into effect on 10 December 2020. Under this amendment, PM riders are required to wear helmets and possess a valid driver’s license, reflecting a commitment to enhancing safety through legislation. In contrast, France classified PM devices as a new category of vehicle under Order No. 2019-1082 on the Regulation of Personal Mobility Devices, enacted on 23 October 2019. This classification was integrated into the Road Act to establish a clearer legal framework for PM usage [3,4].
Despite these regulatory efforts, the Traffic Accident Analysis System (TAAS) of the Korea Road Traffic Authority reported 2389 PM-related traffic accidents in 2023, resulting in 24 fatalities and 2622 injuries. This marks only a slight decrease from 2022, which recorded 2386 accidents, 26 deaths, and 2684 injuries. Notably, the fatality rate—defined as the number of deaths per 100 traffic accidents—remains alarmingly high at 5.6%, which is 4.3 times greater than the overall traffic accident fatality rate of 1.3% from the previous year.
Contrary to expectations following the 2020 Road Traffic Act revision, electric-scooter-related accidents have continued to rise, from 447 cases in 2019 to 897 in 2020, 1735 in 2021, 2386 in 2022, and 2389 in 2023, with the annual count nearly doubling through 2021. This trend highlights the urgent need for more effective enforcement measures [5].
Despite the revision of the Road Traffic Act, the enforcement of personal mobility (PM) violations remains hindered by limited manpower and resources, making it challenging to consistently apply regulations across all regions. Common violations, such as failing to wear a helmet, riding with multiple passengers, operating without a license, and driving under the influence, persist. A major shortcoming of the current Road Traffic Act is the lack of adequate punitive measures for PM violations. Consequently, the number of violations continues to rise, while existing enforcement methods remain inefficient due to these constraints. This situation underscores the urgent need for a technology-driven system capable of effectively detecting and recording violations in real time.
This study proposes a smart enforcement platform designed to identify legal violations and automatically issue fines during the use of shared electric scooters. The key technical contributions of this study are as follows:
(1) High-accuracy violation detection, including failure to wear a helmet and riding with multiple passengers, achieved through You Only Look Once version 5 (YOLOv5) object detection technology [6,7,8,9].
(2) Real-time violation tracking, utilizing a user identification system integrated with the Global Positioning System (GPS) to record location and timestamp data [10,11].
(3) Cloud-based data management, where violation records are stored in an Amazon Web Services (AWS)-based relational database (DB) and made accessible for verification via a web interface [12,13,14,15].
This platform is designed to process violation data in real time, leveraging AI-based object detection to overcome the limitations of manpower-dependent enforcement and contribute to enhancing overall traffic safety.
The structure of this paper is as follows: Section 2 reviews related research on PM enforcement platforms, Section 3 provides an overview of the proposed system, Section 4 elaborates on the YOLOv5-based object detection model, Section 5 describes the GPS-based user classification system, Section 6 presents the experimental evaluation of the platform, including tests conducted using the AWS server database and an analysis of the results, and Section 7 discusses future research directions and concludes the paper.

2. Related Work

In this section, we examine research on artificial intelligence (AI)-based enforcement platforms in road traffic environments. AI techniques can generally be classified into supervised learning, unsupervised learning, and reinforcement learning. Here, we focus on supervised learning for object detection.
First, we review the literature on the performance of the YOLO model, a well-known supervised learning approach for object detection. In [16], the author proposed a vehicle detection method based on YOLOv5s. The CoordConv convolutional layer was integrated into the existing architecture to enhance spatial location awareness, and the Shuffle Attention mechanism was added to optimize the learning process. Furthermore, the Focal-EIOU loss function was introduced to improve bounding box prediction accuracy. This method achieved a mean average precision (mAP50-95) of 92.221%, which is a 1.747% improvement over the existing YOLOv5s. Nevertheless, ref. [16] has certain limitations, including collisions between multiple objects in complex road environments and constraints regarding real-time optimization speed. In [17], a sliding window cropping method was introduced to effectively detect objects in ultra-high-resolution images. This approach achieved a mAP of 61.4% by minimizing duplicate detections through the midline method. However, it faced challenges such as increased computational requirements when handling large and complex objects. In [18], an improved YOLOv5 model was proposed for real-time gesture recognition. This study investigated architectural modifications to YOLOv5 to enhance its efficiency in recognizing hand gestures in real-time scenarios. Key improvements included optimized feature extraction and a customized dataset tailored for gesture recognition tasks. Experimental results demonstrated that the enhanced YOLOv5 model achieved a mAP of 96.8% on a designated gesture dataset. This research highlights the adaptability of YOLOv5 beyond traditional object detection applications, extending its usability to real-time human–computer interaction. However, despite achieving high accuracy, real-time deployment in resource-constrained environments remains a challenge, necessitating further optimization for embedded systems.
Second, we analyze research related to GPS-based user classification systems. In [19], real-time location data were collected using the NEO-6M GPS module, which transmits the data to a server via Wi-Fi. Experimental results showed an average location accuracy of approximately 2 to 5 m, demonstrating that low-cost hardware can be used effectively for location tracking. In [20], a system was proposed that integrates GPS with an inertial measurement unit (IMU) for user location tracking. The authors demonstrated reliable performance in dense urban areas and environments subject to signal loss. By fusing GPS and IMU data, the approach improved average position accuracy and maintained continuity in users' movement paths. In [21], the authors studied a sensor fusion module for autonomous vehicles and proposed a method to improve navigation accuracy and environmental awareness by combining GPS and IMU data. This study pointed out the limitations of GPS-only navigation and explored a solution to the satellite signal blocking problem in urban areas. To this end, an IMU-based dead reckoning method was designed to enable accurate location estimation even in environments where GPS signals are intermittently lost. The results showed that combining multiple sensor modules significantly improves the reliability of location estimation, which provides an important basis for improving GPS user-tracking accuracy in this study.
Third, we discuss cloud-based data processing and web platforms. In [22], a scalable and stable web application hosting environment was configured using AWS Elastic Compute Cloud (EC2); the study demonstrated efficient real-time data management by improving both stability and processing speed in cloud-based systems. In [22], a method was also proposed for designing a relational database using AWS Relational Database Service (RDS), with images and data securely stored in AWS Simple Storage Service (S3); this approach offered users real-time data accessibility while strengthening data security. In [23], a modern application development methodology was introduced that leverages AWS, including containers and serverless technologies, detailing how these tools facilitate efficient, scalable applications and improve both data processing stability and the user experience. Based on these findings, our proposed system processes scooter violation data in real time on an AWS-powered cloud platform and delivers relevant information to users through a web application hosted on AWS EC2 for stability and scalability, with the database managed via AWS RDS. To reinforce system reliability and scalability, we securely store images and violation data in AWS S3. This implementation aligns with the methodologies discussed in [23], which explore cloud-based solutions for efficient real-time data processing. Additional details on this implementation are provided in Section 6.

3. Overall Proposed System Configuration

This section presents the overall architecture of the proposed system for cracking down on illegal electric scooter usage. The conceptual diagram is shown in Figure 1. As illustrated in the figure, the system is divided into three main components.
The first component is a real-time monitoring module, which captures current road conditions and identifies scooter riders. It continuously monitors the road environment and provides the necessary data for subsequent processing. The second component is an AI engine that trains a detection model using real-time data from the monitoring module. The trained model is then deployed in the actual road environment to optimize the detection of violations.
Finally, the third component transmits the violation detection results to a database and to law enforcement while also managing the imposition of fines through a webpage. Through this webpage, administrators—such as police officers—can view and manage real-time violation information. Additionally, by entering their credentials on the webpage, users can access details such as photos, timestamps, and fines associated with any recorded violations.

4. Implementation of Proposed System

4.1. Overview of YOLOv5

This study employs the YOLO model to enhance the detection of violations and regulatory infractions. YOLO offers a simplified architecture and high processing speed by utilizing a single neural network. Furthermore, because it processes the entire image context during object detection, it effectively reduces background-related errors. Within a single convolutional neural network (CNN) pass, YOLO simultaneously calculates the bounding box center coordinates (x, y), the bounding box dimensions (w, h), and the corresponding class probabilities.
In particular, YOLO partitions the input image into an S × S grid, where each cell predicts the objects present in its corresponding region. This approach segments the image into detection regions and computes the location and attributes of objects within each cell. Subsequently, each cell forecasts B bounding boxes and assigns a confidence score to each one, indicating the likelihood that the bounding box contains an object. In addition, each cell computes conditional class probabilities for C classes, enabling the model to classify detected objects with high accuracy across various categories.
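To make this prediction layout concrete, the following minimal NumPy sketch decodes one cell of the classic single-grid output described above. The values S = 7, B = 2, and C = 5 are hypothetical (C = 5 merely echoes the five VIOLATION classes used later), and YOLOv5 itself replaces this formulation with multi-scale, anchor-based detection heads.

```python
import numpy as np

# Illustrative only: classic YOLO output layout with hypothetical sizes.
S, B, C = 7, 2, 5

# One forward pass yields, per grid cell: B boxes of (x, y, w, h, confidence)
# plus C conditional class probabilities.
prediction = np.random.rand(S, S, B * 5 + C)

cell = prediction[3, 4]                # predictions of a single grid cell
boxes = cell[:B * 5].reshape(B, 5)     # (x, y, w, h, conf) for each box
class_probs = cell[B * 5:]             # conditional class probabilities

# Class-specific confidence = box confidence x conditional class probability
scores = boxes[:, 4:5] * class_probs   # shape (B, C)
print(scores.shape)                    # (2, 5)
```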
After generating these predictions, two steps are applied to refine the final bounding boxes. First, bounding boxes with low confidence scores are discarded. Second, non-maximum suppression (NMS) is employed to remove overlapping bounding boxes, retaining only the most reliable detections. Unlike region-based convolutional neural network (R-CNN) methods—which divide an image into multiple regions for individual analysis—YOLO processes the image in a single pass, eliminating the need for region-based segmentation. This efficiency results in performance that is approximately six times faster than R-CNN, making YOLO particularly well suited for real-time object detection. Accordingly, we employed YOLOv5 to leverage these advantages.
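The two refinement steps can be sketched in a few lines of pure Python. The thresholds and function names below are illustrative defaults, not YOLOv5's production implementation, which is vectorized:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, conf_thresh=0.25, iou_thresh=0.45):
    """Step 1: drop low-confidence boxes. Step 2: greedily keep the
    highest-scoring box and suppress remaining boxes that overlap it."""
    order = sorted((i for i, s in enumerate(scores) if s >= conf_thresh),
                   key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in kept):
            kept.append(i)
    return kept

# Two heavily overlapping detections collapse to the stronger one:
print(nms([(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)],
          [0.9, 0.8, 0.7]))            # -> [0, 2]
```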
YOLOv5 comprises four variants, s, m, l, and x, which differ primarily in network architecture. Figure 2 provides a performance comparison of these YOLOv5 models.
The YOLOv5s model offers a lightweight architecture compared to its counterparts, delivering robust object detection performance while maintaining a relatively high frame rate. These attributes make YOLOv5s well suited for real-time object detection and for applications running in resource-constrained environments. Additional advantages include its compact model size, rapid processing speed, and high scalability. Specifically, YOLOv5s requires less memory and fewer computational resources, and it achieves a high frames-per-second (FPS) rate, making it ideal for systems that demand real-time analysis. Moreover, its lightweight architecture allows for deployment in various embedded settings, including Internet of Things (IoT) devices and mobile platforms. In addition, efficient model training can be achieved even with a small amount of labeled data. As shown in Figure 2, the leftward arrow labeled 'Faster' indicates lower GPU latency, meaning that YOLOv5s operates at a higher frame rate. Similarly, the upward arrow labeled 'Better' reflects improved object detection accuracy, demonstrating that YOLOv5x achieves superior performance at the cost of increased computational requirements.

4.2. Customized YOLOv5s Model

In this study, two custom image datasets were labeled via the Roboflow platform to facilitate the detection of electric scooter riders and the identification of regulatory violations. Through this process, we developed an object detection model that assesses compliance and enhances system accuracy.

4.2.1. RIDE Model

The first object detection model, referred to as the “RIDE” model, is designed to determine whether an electric scooter is being ridden. It differentiates between riders and non-riders through two classes: Ride (users actively riding scooters) and Non-Ride (pedestrians or other individuals not riding scooters). The RIDE model thus provides essential data for subsequent object detection and violation-detection processes by indicating whether a scooter is actively in use.

4.2.2. VIOLATION Model

The second object detection model, named “VIOLATION”, specifically targets images in the Ride category identified by the RIDE model to determine whether the scooter user is violating any regulations. Criteria such as helmet usage, scooter presence, and the number of riders inform the model’s assessment of compliance. The resulting violation categories are Helmet, Kickboard, Non-helmet, People, and Person. By identifying these categories, the VIOLATION model furnishes real-time data that form the basis for evaluating legal compliance.

4.2.3. Dataset Correction and Filtering Process

As shown in Figure 3, we acquired a dataset directly from a real-world environment to construct the RIDE model. The RIDE model detects scooter riders and saves the corresponding images. To implement this process, we modified the detect.py script to include an initial filtering phase, during which the bounding box of each detected object is used to crop the image. This step removes extraneous background information, resulting in images that focus specifically on the detected objects. By minimizing background noise, this approach enhances the efficiency of the subsequent VIOLATION model’s object detection.
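A minimal sketch of this initial filtering phase is given below, assuming OpenCV-style image arrays and (x1, y1, x2, y2, conf, cls) detection tuples; the function and file names are illustrative and not the authors' actual detect.py modification.

```python
import cv2

def crop_detections(image, detections, out_prefix="ride_crop"):
    """Initial filtering phase: crop each detected bounding box out of the
    frame so background clutter is removed before the VIOLATION model runs.
    `detections` is assumed to hold (x1, y1, x2, y2, conf, cls) tuples in
    pixel coordinates (names here are illustrative)."""
    crops = []
    for i, (x1, y1, x2, y2, conf, cls) in enumerate(detections):
        crop = image[int(y1):int(y2), int(x1):int(x2)]
        cv2.imwrite(f"{out_prefix}_{i}.jpg", crop)   # save the focused crop
        crops.append(crop)
    return crops
```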
As shown in Figure 4, we developed an object detection model trained on images cropped by the RIDE model. If only a scooter is detected in the cropped image—indicating no rider is present—we applied a secondary filtering process that discards such images. This additional filtering enhances the data fed into the VIOLATION model, thereby improving its accuracy in detecting infractions. The VIOLATION model then identifies violations within the scooter boarding area using data that have undergone both the initial and secondary filtering phases in the RIDE model. In this study, the identified violations are categorized into two primary types: riding without a helmet and carrying multiple passengers.
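In the same spirit, the secondary filtering step reduces to discarding any crop whose detected labels contain a scooter but no rider. The helper below is a hypothetical illustration, with label names borrowed from the VIOLATION categories:

```python
def secondary_filter(crops_with_labels,
                     rider_labels=frozenset({"Person", "People"}),
                     scooter_label="Kickboard"):
    """Secondary filtering phase: drop crops where a scooter was detected
    without any rider. Each crop is paired with the set of class labels
    detected in it; the label names are illustrative."""
    return [(crop, labels) for crop, labels in crops_with_labels
            if not (scooter_label in labels and not labels & rider_labels)]

# A scooter-only crop is dropped; a scooter-with-rider crop is kept.
print(len(secondary_filter([("img_a", {"Kickboard"}),
                            ("img_b", {"Kickboard", "Person"})])))  # 1
```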

4.3. Training Performance of Customized YOLOv5s Model

In this study, three key metrics were employed—namely the confusion matrix, mAP, and F1 Score—to evaluate the performance of the proposed YOLOv5s model. The confusion matrix provides a clear depiction of the model’s performance by illustrating the relationship between predicted outcomes and actual labels in a matrix format. From this matrix, one can derive precision and recall, which are critical metrics in binary classification. Precision is defined as the ratio of true positives to the total number of predicted positive cases, whereas recall is defined as the ratio of true positives to the total number of actual positive cases. Equations (1) and (2) represent precision and recall, respectively.
$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (1)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (2)$$
where a true positive (TP) is a case in which the model correctly predicts a positive outcome, a false negative (FN) occurs when the model predicts a negative outcome but the actual result is positive, a false positive (FP) arises when the model predicts a positive outcome but the actual result is negative, and a true negative (TN) refers to a case where the model accurately predicts a negative outcome.
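As a worked example, with hypothetical counts of 98 true positives, 2 false positives, and 1 false negative, Equations (1) and (2) evaluate as follows:

```python
def precision(tp, fp):
    return tp / (tp + fp)   # Equation (1)

def recall(tp, fn):
    return tp / (tp + fn)   # Equation (2)

# Hypothetical counts: 98 correct detections, 2 false alarms, 1 missed object
print(precision(98, 2))     # 0.98
print(recall(98, 1))        # 0.98989...
```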

4.3.1. Training Performance Analysis of RIDE Model

The RIDE model was configured with a batch size of 32 and trained for 250 epochs. Of the 1256 images in the dataset, 877 were used for training, 252 for validation, and 127 for testing, an approximate 7:2:1 split.
Figure 5 shows the confusion matrix for the RIDE model, which comprises three classes: Non-Ride, Ride, and Background. The matrix indicates that both Non-Ride and Ride classes demonstrate exceptionally high prediction accuracy. Notably, the Background class shows no detection errors, resulting in no recorded values for that class.
Table 1 presents the precision values for each RIDE model class, calculated from the confusion matrix. Precision for the RIDE model was computed as described in (1). As shown in the table, all classes achieve a precision of 1.0, indicating consistently accurate predictions and reflecting the high reliability and performance of the RIDE model.
The F1 Score, defined as the harmonic mean of precision and recall, is a key metric for evaluating overall model performance. A higher F1 Score signifies better model accuracy. The F1 Score is formulated as shown in (3).
$$F1\ \mathrm{Score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (3)$$
where the F1 Score reflects the balance between precision and recall, with values ranging from 0.0 to 1.0. This metric is particularly useful for imbalanced datasets because it is designed to maintain an equilibrium between precision and recall, preventing significant disparities between the two. Consequently, it serves as a key metric for performance evaluation.
Figure 6 displays the F1 Score results of the RIDE model. As shown, when the F1 Score reaches a confidence value of 0.822, all classes achieve a performance score of 1. This finding suggests that setting the RIDE model’s confidence threshold to 0.822 can optimize detection performance in real-world scenarios.
In addition, mAP is calculated based on the area under the precision–recall (PR) curve, representing the average precision (AP). mAP50 considers a detection correct only when the intersection over union (IoU), which measures how closely a predicted bounding box aligns with the actual object, is at least 0.5. By comparison, mAP50-95 evaluates IoU thresholds from 0.5 to 0.95 in increments of 0.05, averaging across these thresholds for a more comprehensive assessment of the model's performance. The RIDE model exhibits outstanding results, achieving mAP50 = 99.5 and mAP50-95 = 99.457, indicating consistently high accuracy across a range of IoU criteria.
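The relationship between the two metrics can be shown numerically. The per-threshold AP values below are hypothetical, chosen only to illustrate how mAP50 reads off the first threshold while mAP50-95 averages all ten:

```python
import numpy as np

# Hypothetical per-threshold AP values for a single class, evaluated at
# IoU thresholds 0.50, 0.55, ..., 0.95.
ap_per_threshold = np.array([0.995, 0.99, 0.98, 0.96, 0.93,
                             0.88, 0.80, 0.68, 0.50, 0.28])

map50 = ap_per_threshold[0]           # AP at IoU >= 0.5 only
map50_95 = ap_per_threshold.mean()    # average over the ten thresholds
print(f"mAP50 = {map50:.3f}, mAP50-95 = {map50_95:.4f}")  # 0.995, 0.7995
```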

4.3.2. Training Performance Analysis of VIOLATION Model

The VIOLATION model was configured with a batch size of 32 and trained for 250 epochs. The dataset consisted of 1689 images, including 1166 for training, 323 for validation, and 200 for testing, resulting in an approximate 7:2:1 split.
Figure 7 shows the confusion matrix for the VIOLATION model, which consists of five classes: Helmet, Kickboard, Non-helmet, People, and Person. The matrix demonstrates high prediction accuracy across all classes. Both precision and recall appear evenly distributed, indicating that the model produced accurate predictions with no misclassifications.
Table 2 presents the precision values for each class of the VIOLATION model, calculated from the confusion matrix based on (1). The table indicates that every class achieved a precision value of 1.0, signifying that the VIOLATION model consistently made accurate predictions and demonstrated high reliability and performance.
Figure 8 shows the F1 Score results of the VIOLATION model. As depicted, when the confidence value is set to 0.742, all classes achieve an F1 Score of 1. This finding suggests that configuring the VIOLATION model’s confidence value to 0.742 optimizes detection results in a real environment. The model’s mAP50 value is notably high at 99.5, whereas its mAP50-95 value is 81.813, indicating comparatively lower performance at higher IoU thresholds. This discrepancy implies that as the IoU threshold approaches 95, precision and recall may decline slightly in predicting an object’s location and size. While the VIOLATION model performs strongly at lower IoU levels, it would benefit from more precise location predictions under stricter high IoU criteria.

4.3.3. Training Results of Both Customized Models

The detection accuracy of the two customized models is presented in Table 3. Their performance was assessed using several metrics, including the confusion matrix, F1 Score, mAP50, and mAP50-95.
The RIDE model demonstrated exceptional performance, achieving a mAP50-95 value of 99.457 and near-perfect precision and recall across all classes. By contrast, the VIOLATION model yielded a comparatively lower mAP50-95 value of 81.813, although it maintained a mAP50 value of 99.5—similar to that of the RIDE model.
These findings suggest that the VIOLATION model requires improved accuracy in predicting object location and size at higher IoU thresholds. Nevertheless, both models show strong potential for real-time object detection and law violation detection applications, as corroborated by the results discussed in Section 6.

5. User Identification Method

This section outlines a user identification system designed to assess whether an electric scooter user is in violation of the law. The system is configured with a GPS module, a surveillance camera, and a Wi-Fi module. The surveillance camera is installed on the street of the Tech University of Korea Industry-University Convergence Agency, with the location’s latitude and longitude recorded as 37.33906 and 126.7348, respectively. By monitoring the position of the GPS-equipped electric scooter within the camera’s coverage area, the system determines whether the scooter remains within a user-defined error range. This error range is set to 0.00036 degrees—approximately 10 m—in consideration of the GPS module’s signal variability.
This system is implemented using the Arduino UNO R3 board and the ESP8266 Wi-Fi module as primary components. The Arduino UNO R3 is connected to a GPS module, which collects the user’s real-time location data and converts it into latitude and longitude coordinates. These coordinates are then compared against a predefined reference location. If the user’s location falls within the acceptable error range, it confirms that the user has arrived at the target destination.
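This arrival check reduces to a simple coordinate comparison. The sketch below uses the reference point and error range stated above; the function name and sample fixes are illustrative:

```python
# Camera reference point and error range as given in the text:
# (37.33906, 126.7348) with a 0.00036-degree window, roughly 10 m.
REF_LAT, REF_LON = 37.33906, 126.7348
ERROR_RANGE_DEG = 0.00036

def within_coverage(lat, lon):
    """True if a GPS fix lies inside the camera's coverage window."""
    return (abs(lat - REF_LAT) <= ERROR_RANGE_DEG and
            abs(lon - REF_LON) <= ERROR_RANGE_DEG)

print(within_coverage(37.33920, 126.73490))   # True: inside the window
print(within_coverage(37.34000, 126.73490))   # False: roughly 100 m north
```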
In addition, the ESP8266 module verifies the user’s current location by detecting nearby Wi-Fi signals. It then transmits the user’s information to the Raspberry Pi 5, facilitating the effective reception and management of location data. Figure 9 illustrates how the Arduino UNO R3 connects to a Wi-Fi network using the ESP8266 module.
Figure 9. ESP8266-based Wi-Fi connection procedure, where (1) indicates Wi-Fi connection readiness (Ready) and confirms that an IP address has been assigned; (2) shows Station mode set to "OK", preparing the device to connect to the network; (3) illustrates a successful attempt to join the network using the SSID and password; and (4) confirms that the Wi-Fi connection has been established.
Subsequently, Figure 10 displays the screen where the collected GPS data and username are printed on the serial monitor.
Figure 10. GPS signal reception result.
In the location information, LAT denotes latitude, LON denotes longitude, SAT indicates satellite connection status, and PREC represents GPS accuracy. These data are collected in real time through the GPS module connected to the Arduino UNO R3 board. The system identified the user as “Lee Seung Hyun”, confirming that it successfully gathered and processed the user’s information based on GPS data. During this procedure, the user’s name and GPS data are processed in tandem and displayed on the serial monitor.
Upon analyzing the GPS reception data shown on the serial monitor, we verified that the LAT and LON values fall within the specified target location. The SAT value ranges from 8 to 10; the closer it is to 10, the higher the accuracy. This status indicates a reliable level of location information for real-time tracking. The PREC value ranges from 87 to 155, with lower numbers corresponding to higher location accuracy. Within this range, the precision is sufficiently reliable for the real-time location tracking system.
Subsequently, the collected data are sent to the Raspberry Pi 5 via the ESP8266 Wi-Fi module, facilitating integrated management of the user’s location data in the central system. This process underscores the system’s dependability in performing real-time location tracking and effective user identification.
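On the receiving side, the Raspberry Pi 5 only needs to accept the ESP8266's transmission and parse it. The sketch below assumes a TCP socket on port 5000 and a comma-separated 'name,lat,lon' payload; neither convention is specified in the paper:

```python
import socket

# Minimal sketch of the Raspberry Pi 5 side of the link; port and wire
# format are assumptions, not the system's documented protocol.
HOST, PORT = "0.0.0.0", 5000

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
    srv.bind((HOST, PORT))
    srv.listen(1)
    conn, addr = srv.accept()              # wait for the ESP8266 to connect
    with conn:
        data = conn.recv(1024).decode().strip()
        name, lat, lon = data.split(",")
        print(f"user={name} lat={float(lat):.5f} lon={float(lon):.5f}")
```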

6. Experimental Results

This section presents the experimental results of the proposed system, which was evaluated through real-world testing using a Raspberry Pi 5, a USB night vision CCTV camera, and a GPS module to detect electric scooter violations. The system was trained on a labeled dataset and deployed in a real-time environment to assess detection accuracy, processing speed, and integration with a web-based enforcement platform.

6.1. Experimental Setup

6.1.1. Hardware

The system was implemented using a Raspberry Pi 5 (Raspberry Pi Foundation, Cambridge, UK) equipped with 4 GB RAM and a quad-core Cortex-A76 processor. A Link+ USB Night Vision CCTV Camera (Shenzhen Link+Technology Co., Ltd., Shenzhen, China) was utilized for image acquisition, featuring built-in infrared LEDs for enhanced night vision capabilities, ensuring reliable object detection under low-light conditions. To facilitate location tracking, the system employed a u-blox NEO-6M GPS module (u-blox AG, Thalwil, Switzerland), which provides an average accuracy of 2 to 5 m, allowing precise recording of violation locations.

6.1.2. Software

The YOLOv5s model was trained on Google Colab using an NVIDIA Tesla T4 GPU to enable efficient deep learning model training. The detection pipeline was developed using Python 3.8.20 and OpenCV 4.5.5 to support real-time image processing. Cloud storage and database management were handled using AWS services, including S3 for image storage, RDS for structured data management, and EC2 for hosting the web-based enforcement platform. The web application was designed to provide real-time access to violation records and facilitate fine collection seamlessly.
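As a rough illustration of how the storage step might look with these services, the snippet below uploads an evidence image to S3 via boto3; the bucket name, key layout, and helper function are assumptions rather than the authors' actual configuration, and credentials are resolved from the environment as usual for boto3:

```python
import boto3

s3 = boto3.client("s3")

def store_violation_image(image_path, record_id):
    """Upload an evidence image to S3 and return its object key; the
    structured record (timestamp, type, fine) goes to RDS separately.
    Bucket and key layout are illustrative assumptions."""
    key = f"violations/{record_id}.jpg"
    s3.upload_file(image_path, "scooter-violations", key)
    return key
```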

6.2. Experimental Procedure

The system evaluation was conducted in four key phases. Initially, a dataset comprising images for RIDE detection and violation detection was collected. The images were labeled using the Roboflow platform, followed by a preprocessing step to remove low-quality or redundant images, thereby enhancing training efficiency.
Subsequently, the YOLOv5s models underwent training and validation. The models were trained for multiple epochs with a predefined batch size to optimize detection performance. The dataset was divided into training, validation, and testing subsets to ensure a robust performance assessment. Standard object detection metrics such as precision, recall, and mean average precision (mAP) were utilized to evaluate model accuracy.
After training, the models were deployed on a Raspberry Pi 5 for real-time detection testing. Object detection was assessed under real-world conditions, specifically on university campus roads. The system processed images at an average rate of one frame per second, demonstrating its capability to function effectively in real-time scenarios.
Finally, the enforcement system was validated through web-based application testing. The detected violations were logged and securely stored in AWS RDS and S3. A web interface was developed to enable users to review violation records, verify fines, and process payments. GPS accuracy was analyzed to ensure precise location tracking of detected violations.

6.3. RIDE Model Detection Performance

The detection results of the RIDE model are illustrated in Figure 11. The model effectively distinguished between riders and non-riders, ensuring high detection accuracy.

6.4. VIOLATION Model Detection Performance

The detection accuracy of the VIOLATION model is presented in Figure 12. The model successfully identified various violation categories, including helmet usage and multiple passengers.
Different bounding box colors represent the detected categories: a red box marks helmet usage, a yellow box a single rider, a pink box a kickboard, a dark orange box the absence of a helmet, and a light orange box two-person riding, which violates safety regulations.

6.5. Violation Log Data

The detected violations were recorded in a structured format and stored in a database. Figure 13 illustrates the log data of detected violations, including timestamps, violation types, and corresponding fine amounts. This log ensures traceability and supports automated enforcement mechanisms.
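A plausible shape for such a log table on an RDS MySQL instance is sketched below, covering the fields mentioned (timestamp, violation type, fine amount) plus the location and image reference used elsewhere in the system; every column and table name is an assumption made for illustration:

```python
import pymysql

# Illustrative schema only; not the authors' actual database design.
DDL = """
CREATE TABLE IF NOT EXISTS violation_log (
    id         INT AUTO_INCREMENT PRIMARY KEY,
    user_name  VARCHAR(64),
    vio_type   VARCHAR(32),     -- e.g. 'Non-helmet', 'People'
    occurred   DATETIME,
    latitude   DOUBLE,
    longitude  DOUBLE,
    fine_krw   INT,
    image_key  VARCHAR(128)     -- S3 object key of the evidence image
);
"""

def log_violation(conn, row):
    """Insert one detected violation; `row` is a 7-tuple matching the
    columns after the auto-increment id."""
    with conn.cursor() as cur:
        cur.execute(DDL)
        cur.execute(
            "INSERT INTO violation_log (user_name, vio_type, occurred,"
            " latitude, longitude, fine_krw, image_key)"
            " VALUES (%s, %s, %s, %s, %s, %s, %s)", row)
    conn.commit()
```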

6.6. Web-Based Violation Management System

A violation tracking and fine collection webpage was developed to facilitate user interactions. Figure 14 showcases the web interface, which enables users to access violation records, review supporting evidence, and process fine payments. To enhance clarity in violation classification, different bounding box colors are used: red for helmet usage, dark orange for helmet absence, light orange for two-person riding, yellow for a single rider, and pink for a kickboard.
The system integrates AWS EC2, RDS, and S3, ensuring secure and scalable data management. These experimental results confirm that the proposed system effectively detects electric scooter violations and provides a reliable AI-driven enforcement solution.

6.7. Hardware Performance Comparison

To enhance real-time processing performance, a comparative analysis of the Raspberry Pi 5 and the Jetson Nano was conducted. The primary objective was to evaluate object detection speed and computational efficiency, assessing the feasibility of adopting the Jetson Nano as the main control unit (MCU) in future implementations.
Table 4 presents a performance comparison between Raspberry Pi 5 and Jetson Nano when executing the YOLOv5s model.
The results indicate that the Jetson Nano significantly outperforms the Raspberry Pi 5 in terms of inference speed, achieving approximately 3.5× higher FPS due to GPU acceleration. While detection accuracy (mAP50-95) remains comparable between the two devices, the Jetson Nano's superior computational efficiency makes it more suitable for real-time applications. However, it is worth noting that the Jetson Nano consumes approximately twice the power (~10 W) of the Raspberry Pi 5 (~5 W), which is an important consideration for battery-powered deployments. To further improve system performance, future research will focus on optimizing the AI model for Jetson Nano deployment while balancing power consumption and detection efficiency.

7. Conclusions

This paper proposes an AI-based platform for enforcing regulations on shared electric kickboards, aiming to promote law-abiding behavior and enhance enforcement efficiency. The platform is designed to improve user convenience by facilitating violation inquiries and fine payments. In this study, real-time detection of violations was implemented using the YOLOv5s model, and the functionalities for violation confirmation and fine processing were successfully integrated into an AWS-based web platform. Additionally, both the RIDE model and the VIOLATION model, customized for this research, demonstrated excellent detection performance.
The implemented system processed images using Raspberry Pi 5 as the main control unit (MCU), achieving an average processing time of approximately one second. This processing speed highlights the need for further optimization to enhance real-time performance. While the GPS module provided stable signal reception within a 10 m range, it exhibited occasional inconsistencies, failing to receive signals in one out of every ten tests. Such signal instability can lead to transmission delays and reduced localization accuracy, presenting a potential limitation that may affect system reliability.
In future research, we plan to optimize the AI model to reduce object detection time and upgrade the MCU to Jetson Nano to improve real-time processing performance. Additionally, we intend to replace the current GPS module with a GPS + IMU system to enhance localization accuracy and overall tracking reliability. Furthermore, we aim to implement advanced signal filtering techniques and explore more stable hardware solutions. Finally, we will expand the dataset by collecting diverse environmental data to ensure the model’s adaptability and stable performance in complex urban settings.

Author Contributions

Conceptualization, S.-H.O. and S.-H.L.; methodology, S.-H.O.; software, S.-H.L.; validation, S.-H.O., S.-H.L. and J.-G.K.; formal analysis, S.-H.O. and J.-G.K.; investigation, S.-H.O.; resources, J.-G.K.; data curation, S.-H.O.; writing—original draft preparation, S.-H.L.; writing—review and editing, S.-H.O. and J.-G.K.; visualization, S.-H.O.; supervision, J.-G.K.; project administration, J.-G.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Business Research Insights. Electric Rideable Market; Business Research Insights: Maharashtra, India, 2024; Available online: https://www.businessresearchinsights.com/ko/market-reports/electric-rideable-market-104031 (accessed on 19 November 2024).
  2. Electimes. The Title of the Cited Article. Electimes 2024. Available online: https://www.electimes.com/news/articleView.html?idxno=338871 (accessed on 12 November 2024).
  3. Kim, E.H. Étude de la récente législation française sur Engin de déplacement personnel. J. Comp. Law 2021, 21, 1–35. [Google Scholar] [CrossRef]
  4. Kim, J.H.; Lee, S.Y. A Study on the Development of Personal Mobility Regulations. J. Transp. Stud. 2024, 15, 123–135. [Google Scholar]
  5. National Police Agency of Korea, Personal Mobility (PM) Traffic Accidents Statistics by Year and Region. Available online: https://www.police.go.kr/component/resrce/file/ND_resrceFileDownload.do?resrceSn=595&resrceVer=1&fileSn=1 (accessed on 31 December 2024).
  6. Zhang, C.; Xiong, A.; Luo, X.; Zhou, C.; Liang, J. Electric Bicycle Detection Based on Improved YOLOv5. In Proceedings of the International Conference on Advances in Computer Technology, Information Science and Communications (CTISC), Suzhou, China, 22–24 April 2022. [Google Scholar]
  7. Bai, T.; Shi, F.; Wang, Z.; Liu, Y. Vehicle Target Detection in Aerial Images Based on Improved YOLOv5. In Proceedings of the International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC), Hangzhou, China, 26–27 August 2023. [Google Scholar]
  8. Liu, S.; Sha, Y.; Yang, Y.; Guan, H.; Wu, Y.; Li, J. Identification of Construction Vehicles under High Voltage Transmission Line Based on Improved YOLOv5s. In Proceedings of the International Conference on Power and Energy Systems (ICPES), Guangzhou, China, 23–25 December 2022. [Google Scholar]
  9. Yong, H.; Mengqi, G.; Yongchuan, Z.; Xuelai, G. NTS-YOLO: A Nocturnal Traffic Sign Detection Method Based on Improved YOLOv5. Appl. Sci. 2025, 15, 1578. [Google Scholar] [CrossRef]
  10. Revanth, C.V.S.; Alekhya, B.; Abbas, M.H. Arduino-Based Wheelchair Fall Detection System Using GPS and GSM Module. In Proceedings of the International Conference on Sustainable Computing and Smart Systems (ICS3S), Coimbatore, India, 14–16 June 2023. [Google Scholar]
  11. Ramesh, G.; Sivaraman, K.; Subramani, V.; Vignesh, P.Y.; Bhogachari, S.V.V. Farm Animal Location Tracking System Using Arduino and GPS Module. In Proceedings of the International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, 27–29 January 2021. [Google Scholar]
  12. Mohamed, A.; Gunasegaran, G.; Herath, D. Cloud-based Weather Condition Monitoring System using ESP8266 and Amazon Web Services. In Proceedings of the International Conference on Information Technology Research (ICITR), Colombo, Sri Lanka, 7–8 December 2023. [Google Scholar]
  13. Dasaraju, S.P.V.; Dwaraka, D.S.; Ravella, A.; Bamsidhar, E. Face Recognition using CNN and Amazon S3: An Expandable and Safe Integration for Various Uses. In Proceedings of the 2023 2nd International Conference on Automation, Computing, and Renewable Systems (ICACRS), Pudukottai, India, 11–13 December 2023. [Google Scholar]
  14. Kokkinos, P.; Varvarigou, T.A.; Kretsis, A.; Soumplis, P.; Varvarigos, E.A. Cost and Utilization Optimization of Amazon EC2 Instances. In Proceedings of the International Conference on Cloud Computing, Santa Clara, CA, USA, 28 June–3 July 2013. [Google Scholar]
  15. Dineva, K.; Atanasova, T. Health Status Classification for Cows Using Machine Learning and Data Management on AWS Cloud. Animals 2025, 13, 3254. [Google Scholar] [CrossRef] [PubMed]
  16. Dong, Z. Vehicle Target Detection Using the Improved YOLOv5s Algorithm. Electronics 2024, 13, 4672. [Google Scholar] [CrossRef]
  17. Wang, C.; Feng, W.; Liu, B.; Yang, X.; Yang, Y. Exploiting the Potential of Overlapping Cropping for Real-World Pedestrian and Vehicle Detection with Gigapixel-Level Images. Appl. Sci. 2023, 13, 3637. [Google Scholar] [CrossRef]
  18. Biswas, S.; Nandy, A.; Naskar, A.K.; Saw, R. Real-Time Gesture Recognition using Improved YOLOv5 Model. In Proceedings of the 2024 11th International Conference on Signal Processing and Integrated Networks (SPIN), Noida, India, 21–22 March 2024. [Google Scholar]
  19. Kanani, P.; Padole, M. Real-time Location Tracker for Critical Health Patient Using Arduino, GPS Neo6m and GSM Sim800L in Health Care. In Proceedings of the International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 13–15 May 2020. [Google Scholar]
  20. Xu, T. Research on Geomagnetic Data Fusion and Prediction Project Based on GPS and IMU and Complementary Filtering Algorithm. In Proceedings of the 2024 IEEE 2nd International Conference on Image Processing and Computer Applications (ICIPCA), Suncheon, China, 28–30 June 2024. [Google Scholar]
  21. Raveena, C.S.; Saravanan, R.S.; Kumar, R.V.; Chavan, A. Sensor Fusion Module Using IMU and GPS Sensors For Autonomous Car. In Proceedings of the 2020 IEEE International Conference for Innovation in Technology (INOCON), Bengaluru, India, 6–8 November 2020. [Google Scholar]
  22. Kubiak, D.; Zabierowski, W. A Comparative Analysis of the Performance of Implementing a Java Application Based on the Microservices Architecture, for Various AWS EC2 Instances. In Proceedings of the 2021 IEEE XVIIth International Conference on the Perspective Technologies and Methods in MEMS Design (MEMSTECH), Polyana, Ukraine, 12–16 May 2021. [Google Scholar]
  23. Paulapandian, R.C.; Sankar, S.; Viswanathan, K.; Kumar, S.A. Application Modernization Strategies for AWS Cloud. In Proceedings of the 2022 International Conference on Computing, Communication and Signal Processing (ICCCSP), Chennai, India, 9–10 December 2022. [Google Scholar]
Figure 1. Overall system configuration.
Figure 2. Performance comparison between each YOLOv5 model.
Figure 3. Dataset labeling (RIDE model).
Figure 4. Dataset labeling (VIOLATION model).
Figure 5. Confusion matrix of RIDE model.
Figure 6. F1–confidence curve of RIDE model.
Figure 7. Confusion matrix of VIOLATION model.
Figure 8. F1–confidence curve of VIOLATION model.
Figure 11. Detection result of RIDE model.
Figure 12. Detection result of VIOLATION model.
Figure 13. Log data of detected violation driving.
Figure 14. Information and payment webpage of legal violations.
Table 1. Precision for each class of RIDE model.

Class | Precision
Non-Ride | 1.0
Ride | 1.0
Table 2. Precision for each class of VIOLATION model.

Class | Precision
Helmet | 1.0
Kickboard | 1.0
Non-helmet | 1.0
People | 1.0
Person | 1.0
Table 3. Training results of each customized model.

Model | mAP50-95 | mAP50 | Precision | Recall
RIDE | 99.457 | 99.5 | 0.9996 | 1.0
VIOLATION | 81.813 | 99.5 | 0.9985 | 0.9994
Table 4. Performance comparison of edge AI devices for YOLOv5s object detection.

Device | Raspberry Pi 5 | Jetson Nano
CPU/GPU | Quad-core Cortex-A76 (no GPU) | Quad-core Cortex-A57 + Maxwell GPU
Inference Time (ms per frame) | 900–1000 | 250–300
FPS | ~1.0 | ~3.5
mAP50-95 | 81.8 | 82.5
Power Consumption | ~5 W | ~10 W
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
