Monarch Butterfly Population Decline Prediction

This summary outlines the key details, approach, and findings of a project aimed at predicting monarch butterfly sightings and analyzing the factors contributing to their population decline. The project was built for the 2024 Rowdy Datathon, with a focus on using machine learning and deep learning models to study population trends and the factors affecting monarch butterflies.


1. Problem Statement

The project aims to respond to a hypothetical global crisis involving a drastic decline in monarch butterfly populations. Monarch butterflies are crucial pollinators, and their decline has far-reaching implications for biodiversity, agriculture, and human well-being. The task was to:

  • Analyze fluctuations in monarch butterfly populations.
  • Investigate contributing factors, such as climate change and pesticide use.
  • Propose resource allocation strategies to mitigate population decline.
  • Perform a focused analysis of migration patterns, particularly through Texas.

This project analyzes the decline in monarch butterfly populations and aims to provide actionable insights into its causes and potential remedies.


2. Data Description

The data sources included:

  • Monarch butterfly migration data from Journey North.
  • Environmental and air quality data from EPA AQS.
  • Pesticide data from the USDA Pesticide Data Program.
  • Various additional datasets related to climate, geography, and crop production.

The dataset contained missing values, particularly in the Year, Month, Day, and Latitude columns; these were addressed through the imputation and cleaning techniques described below.


3. Challenges Faced

3.1. Missing Data

  • Year, Month, Day, Latitude: A significant portion of the data contained missing values.
    • Solution: Missing values were either imputed using the median or dropped to maintain data integrity.
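
A minimal pandas sketch of this imputation step (the file name is illustrative and the exact code in the repository may differ):

```python
import pandas as pd

# Load the combined sightings table (file name is an assumption for illustration).
df = pd.read_csv("monarch_sightings.csv")

# Impute numeric gaps with each column's median, as described above.
for col in ["Year", "Month", "Day", "Latitude"]:
    df[col] = df[col].fillna(df[col].median())

# Any rows still incomplete in these fields are dropped to preserve data integrity.
df = df.dropna(subset=["Year", "Month", "Day", "Latitude"])
```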

3.2. Overfitting and Model Complexity

  • Initial models drove the training loss to extremely low values, a sign that they were overfitting the training data.
    • Solution: Model complexity was reduced, and regularization techniques such as dropout and L2 regularization were applied.
    • Early stopping was also introduced to prevent the model from overtraining.

3.3. Numerical Instabilities

  • During training, the model produced NaN losses due to issues in data preprocessing and model configuration.
    • Solution: The learning rate was reduced, and gradient clipping was applied to stabilize the training process.
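
A short sketch of these two fixes, assuming a NumPy feature matrix X_train and a Keras/TensorFlow setup (variable names are illustrative):

```python
import numpy as np
from tensorflow.keras.optimizers import Adam

# Rule out non-finite inputs before adjusting the model itself.
assert np.isfinite(X_train).all(), "feature matrix contains NaN or inf"

# A smaller learning rate plus gradient clipping keeps each update bounded.
optimizer = Adam(learning_rate=1e-5, clipnorm=1.0)
```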

4. Modeling Process

Step 1: Preprocessing

  • Date Handling: The Date column was split into Year, Month, and Day components.
  • Missing Value Handling: Missing values in the Year, Month, Day, and Latitude fields were imputed with the median values or dropped if necessary.
  • Normalization: All features and the target variable were scaled using MinMaxScaler for improved model performance.
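
The preprocessing described above might look roughly like this, assuming the DataFrame df from Section 3.1 and "Sightings" as an assumed name for the target column:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Split the Date column into Year, Month, and Day components.
df["Date"] = pd.to_datetime(df["Date"], errors="coerce")
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month
df["Day"] = df["Date"].dt.day

# Scale features and target to [0, 1] for more stable training.
feature_cols = ["Year", "Month", "Day", "Latitude", "Longitude"]
X = MinMaxScaler().fit_transform(df[feature_cols])
y = MinMaxScaler().fit_transform(df[["Sightings"]])
```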

Step 2: Model Selection

  • A hybrid deep learning model combining LSTM (for temporal data) and CNN (for spatial data) was used to capture both time-based and location-based dependencies.
  • LSTM was used for Year, Month, and Day features to capture sequential dependencies.
  • CNN was used for Latitude and Longitude to capture spatial relationships.
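
A simplified sketch of such a hybrid architecture using the Keras functional API (the window length, layer sizes, and input shapes are illustrative assumptions, not the exact configuration used):

```python
from tensorflow.keras import layers, Model

# Temporal branch: a short window of (Year, Month, Day) triples fed to an LSTM.
temporal_in = layers.Input(shape=(7, 3), name="temporal")  # window length 7 is an assumption
t = layers.LSTM(32)(temporal_in)

# Spatial branch: (Latitude, Longitude) treated as a tiny 1-D signal for a Conv1D.
spatial_in = layers.Input(shape=(2, 1), name="spatial")
s = layers.Conv1D(16, kernel_size=2, activation="relu")(spatial_in)
s = layers.Flatten()(s)

# Merge both branches and predict the scaled sighting count.
merged = layers.concatenate([t, s])
output = layers.Dense(1, activation="linear")(merged)

model = Model(inputs=[temporal_in, spatial_in], outputs=output)
model.summary()
```

Treating the coordinates as a tiny one-dimensional signal keeps the example compact; the project may have arranged the spatial input differently.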

Step 3: Regularization and Optimization

  • Dropout was applied to prevent overfitting.
  • L2 regularization was added to dense layers to control large weight values.
  • Batch Normalization was introduced to stabilize the training process.
  • The Adam optimizer was used with a learning rate of 0.00001 and gradient clipping to prevent exploding gradients.
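
Continuing the architecture sketch from Step 2, the regularization and optimizer settings could be wired in as follows (the L2 strength and dropout rate are illustrative; the learning rate and gradient clipping mirror the description above):

```python
from tensorflow.keras import layers, regularizers, Model
from tensorflow.keras.optimizers import Adam

# Regularized head on top of the merged branches from the Step 2 sketch:
# L2 penalizes large weights, BatchNorm stabilizes activations, and Dropout
# discourages co-adaptation.
h = layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4))(merged)
h = layers.BatchNormalization()(h)
h = layers.Dropout(0.3)(h)
output = layers.Dense(1, activation="linear")(h)

model = Model(inputs=[temporal_in, spatial_in], outputs=output)
model.compile(optimizer=Adam(learning_rate=1e-5, clipnorm=1.0), loss="mse")
```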

Step 4: Training and Validation

  • The model was trained over 50 epochs with early stopping applied based on the validation loss.
  • Both training and validation loss were monitored to ensure that the model generalized well and did not overfit.
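
A training loop matching this description, assuming prepared input arrays X_temporal and X_spatial and the scaled target y (the patience value and batch size are assumptions):

```python
from tensorflow.keras.callbacks import EarlyStopping

# Stop when the validation loss stops improving and keep the best weights seen.
early_stop = EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)

history = model.fit(
    [X_temporal, X_spatial], y,   # hypothetical prepared input arrays
    validation_split=0.2,
    epochs=50,
    batch_size=32,
    callbacks=[early_stop],
)
```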

5. Key Findings

  • Population Decline: There are clear signs of a significant decline in monarch butterfly populations, correlated in particular with pesticide usage and air quality metrics.
  • Temporal Analysis: The LSTM branch captured seasonal patterns in butterfly sightings, highlighting critical migration periods.
  • Geospatial Analysis: The CNN branch identified critical regions for monarch butterfly populations, with Texas standing out among the states analyzed.

6. Next Steps

  • Further Feature Engineering: Investigating additional features such as temperature, land use, and rainfall patterns to improve the model’s accuracy.
  • Fourier Transform for Temporal Data: Investigating the use of Fourier Transforms to capture cyclical patterns in the temporal data (e.g., migration patterns over seasons); a minimal sketch follows this list.
  • Model Expansion: Extending the model to include other pollinators and investigating broader ecological impacts.
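
As a starting point for the Fourier Transform idea, a discrete Fourier analysis of monthly sighting counts could reveal the dominant seasonal cycle. The sketch below assumes the cleaned DataFrame df from the preprocessing step and evenly spaced months:

```python
import numpy as np

# Monthly sighting counts in time order (assumes every month is present;
# gaps would need reindexing first).
monthly_counts = df.groupby(["Year", "Month"]).size().to_numpy(dtype=float)

# Real-valued FFT of the de-meaned series; peaks mark dominant cycles.
spectrum = np.abs(np.fft.rfft(monthly_counts - monthly_counts.mean()))
freqs = np.fft.rfftfreq(len(monthly_counts), d=1.0)  # cycles per month

peak = spectrum[1:].argmax() + 1                     # skip the zero-frequency bin
print(f"Dominant cycle ≈ {1.0 / freqs[peak]:.1f} months")
```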

7. Results

The model provided accurate predictions of monarch butterfly sightings and offered insights into the major contributing factors to their population decline, such as pesticide use and air quality changes. The analysis also identified critical regions and migration periods that need urgent attention for conservation efforts.


8. GitHub Repository

  • All source code, data cleaning scripts, and modeling approaches are available in the GitHub repository.
  • Follow the instructions in the repository's README file to replicate the results and understand the methodology.

GitHub Repository Link: [Add link to your repository]

