Hi,
Welcome to a brand new issue of PythonPro!
In today’s Expert Insight we bring you an excerpt from the recently published book, Modern Time Series Forecasting with Python - Second Edition, which explains the shift from traditional, isolated time series models to global forecasting models that leverage related datasets to improve scalability and accuracy and reduce overfitting in large-scale applications.
News Highlights: Python has overtaken JavaScript on GitHub, driven by its role in AI and data science, per GitHub's Octoverse 2024 report; and IBM’s Deep Search team has released Docling v2, a Python library for document extraction with models on Hugging Face.
And today’s Featured Study introduces SafePyScript, a machine-learning-based tool developed by researchers at the University of Passau, Germany, for detecting vulnerabilities in Python code.
Stay awesome!
Divya Anne Selvaraj
Editor-in-Chief
> Uses the sounddevice and matplotlib modules to create a real-time guitar tuner, where a live spectrogram identifies key bass guitar note frequencies for tuning, with a custom interface.
> Covers the .sqrt() function from the math module, explaining its use for calculating square roots of positive numbers and zero, while raising errors for negative inputs.
> Argues that using the ellipsis literal (...) to declare unimplemented methods in Python’s abstract classes can lead to hidden errors, and advocates for raise NotImplementedError instead (see the sketch after this list).
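To make the last point concrete, here is a minimal sketch of our own (not code from the article) contrasting the two patterns; Exporter, CsvExporter, and SafeExporter are hypothetical names:

    from abc import ABC, abstractmethod

    class Exporter(ABC):
        @abstractmethod
        def export(self, data):
            ...  # a subclass calling super().export(data) silently gets None

    class CsvExporter(Exporter):
        def export(self, data):
            return super().export(data)  # hidden error: returns None, no crash

    class SafeExporter(ABC):
        @abstractmethod
        def export(self, data):
            # Failing loudly surfaces the missing implementation immediately.
            raise NotImplementedError

With the first style, CsvExporter().export([1, 2]) quietly returns None; with the second, the same call raises NotImplementedError at the point of the mistake.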
In "SafePyScript: A Web-Based Solution for Machine Learning-Driven Vulnerability Detection in Python," Farasat et al., researchers from the University of Passau, Germany, introduce SafePyScript, a machine-learning-based web tool designed to detect vulnerabilities in Python code.
In software development, identifying vulnerabilities is a major concern due to the security risks posed by cyberattacks. Vulnerabilities, or flaws in code that can be exploited by attackers, require constant detection and correction. Traditionally, vulnerability detection relies on:
> Static Analysis: a rule-based approach that scans code for known vulnerability patterns but often produces a high rate of false positives.
> Dynamic Analysis (Penetration Testing): tests code in a runtime environment, relying on security experts to simulate potential attacks, which makes it resource-heavy and dependent on professional expertise.
Machine learning offers a data-driven alternative, enabling automated vulnerability detection with improved accuracy. Despite Python’s popularity, the language lacks dedicated machine-learning-based tools for this purpose, a gap SafePyScript aims to fill. SafePyScript combines a specific machine learning model, a BiLSTM (Bidirectional Long Short-Term Memory) network, with the ChatGPT API to not only detect vulnerabilities but also propose secure code.
SafePyScript is most useful for Python developers and software engineers who need an efficient way to detect vulnerabilities in their code without relying on traditional, labour-intensive methods. Its machine-learning foundation and integration with ChatGPT make it highly practical for real-world application, providing not only insights into code vulnerabilities but also generating secure code alternatives.
SafePyScript’s effectiveness rests on a robust BiLSTM model. Using word2vec embeddings, the model achieved 98.6% accuracy, 96.2% precision, and a 99.3% ROC score in vulnerability detection. The researchers optimised the BiLSTM’s hyperparameters, such as a learning rate of 0.001 and a batch size of 128, through rigorous testing, establishing reliable benchmark results.
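As a rough illustration of what such a model looks like (a sketch under assumptions, not the authors’ implementation; the vocabulary size, embedding dimension, and optimiser choice below are placeholders), in Keras:

    import tensorflow as tf
    from tensorflow.keras import layers, models

    # Placeholder assumptions: a 20,000-token vocabulary and 300-dimensional
    # word2vec-style embeddings over tokenised Python source code.
    VOCAB_SIZE, EMBED_DIM = 20_000, 300

    model = models.Sequential([
        layers.Embedding(VOCAB_SIZE, EMBED_DIM),
        layers.Bidirectional(layers.LSTM(128)),  # the BiLSTM layer
        layers.Dense(1, activation="sigmoid"),   # vulnerable or not
    ])

    # The learning rate (0.001) and batch size (128) are the values reported
    # in the paper; the Adam optimiser here is an assumption.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss="binary_crossentropy",
        metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.AUC()],
    )
    # model.fit(X_tokens, y_labels, batch_size=128, epochs=10)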
Additionally, SafePyScript leverages ChatGPT’s language model to generate secure code alternatives. The research team implemented precise prompt engineering to maximise ChatGPT’s effectiveness in analysing Python code vulnerabilities, further supporting the tool’s usability.
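Purely as an illustration of that pattern (the model name and prompt wording below are our assumptions, not the paper’s engineered prompts), a secure-rewrite request through the OpenAI Python client might look like this:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def suggest_secure_rewrite(snippet: str) -> str:
        # Placeholder prompt; the paper's actual prompt engineering differs.
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # assumed model choice
            messages=[
                {"role": "system",
                 "content": "You are a security reviewer for Python code."},
                {"role": "user",
                 "content": "Identify vulnerabilities in this code and "
                            "propose a secure version:\n\n" + snippet},
            ],
        )
        return response.choices[0].message.content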
SafePyScript is built with an HTML, CSS, and JavaScript (with Ajax) frontend and a Django backend, ensuring a smooth user experience. This structure allows developers to log in, upload or import code, select detection models, review reports, and access secure code, all within an intuitive, accessible platform.
You can learn more by reading the entire paper or accessing SafePyScript.
Here’s an excerpt from “Chapter 6: Time Series Forecasting as Regression” in the book, Modern Time Series Forecasting with Python - Second Edition by Manu Joseph and Jeffrey Tackes, published in October 2024.
Traditionally, each time series was treated in isolation, so forecasting looked at the history of a single time series alone when fitting a forecasting function. But recently, because of the ease of collecting data in today’s digital-first world, many companies have started collecting large amounts of time series from similar sources, or related time series.
For example, retailers such as Walmart collect data on sales of millions of products across thousands of stores. Companies such as Uber or Lyft collect the demand for rides from all the zones in a city. In the energy sector, energy consumption data is collected across all consumers. All these sets of time series have shared behavior and are hence called related time series.
We can consider that all the time series in a related set come from separate data generating processes (DGPs), and thereby model them all separately. We call these the local models of forecasting. An alternative to this approach is to assume that all the time series come from a single DGP. Instead of fitting a separate forecast function for each time series individually, we fit a single forecast function to all the related time series. This approach has been called global or cross-learning in the literature.
The terminology global was introduced by David Salinas et al. in the DeepAR paper, and cross-learning by Slawek Smyl.
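In code, the contrast between the two approaches is roughly the following (a hedged sketch, not the book’s code; the column names, lag features, and model choice are assumptions):

    import pandas as pd
    from sklearn.ensemble import GradientBoostingRegressor

    FEATURES = [f"lag_{i}" for i in range(1, 8)]  # assumed lag features

    def fit_local_models(df: pd.DataFrame) -> dict:
        # Local approach: one forecast function per time series.
        models = {}
        for sid, grp in df.groupby("series_id"):
            m = GradientBoostingRegressor()
            m.fit(grp[FEATURES], grp["y"])
            models[sid] = m
        return models

    def fit_global_model(df: pd.DataFrame) -> GradientBoostingRegressor:
        # Global approach: one forecast function across all related series.
        # Encoding the series id as a feature lets the single function
        # still adapt to each series.
        X = df[FEATURES].assign(
            series_id=df["series_id"].astype("category").cat.codes)
        m = GradientBoostingRegressor()
        m.fit(X, df["y"])
        return m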
...having more data will lead to lower chances of overfitting and, therefore, lower generalization error (the difference between training and testing errors). This is exactly one of the shortcomings of the local approach. Time series are traditionally not very long, and in many cases collecting more data is difficult and time-consuming. Fitting a machine learning model (with all its expressiveness) on small data is prone to overfitting, which is why time series models that enforce strong priors were traditionally used to forecast such series. But these strong priors, which restrict the fitting of traditional time series models, can also lead to a form of underfitting and limit accuracy.
Strong and expressive data-driven models, as in machine learning, require a larger amount of data to produce a model that generalizes to new and unseen data. A time series, by definition, is tied to time, and sometimes collecting more data means waiting for months or years, which is not desirable. So, if we cannot increase the length of the time series dataset, we can increase its width. If we add multiple time series to the dataset, we increase the width of the dataset, and thereby increase the amount of data the model is trained on. Figure 5.7 shows the concept of increasing the width of a time series dataset visually:
Figure 5.7 – The length and width of a time series dataset
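A small sketch of that idea (assumed names, not the book’s code): stacking several related series into one long-format table multiplies the rows a single global model can train on:

    import pandas as pd

    # Three short related series; each alone is tiny for an ML model.
    series = {
        "store_1": [10, 12, 13, 15],
        "store_2": [20, 21, 23, 26],
        "store_3": [5, 6, 8, 9],
    }

    # Widening: stack them into one frame with a series id, so a global
    # model trains on 12 rows instead of 4.
    long_df = pd.concat(
        [pd.DataFrame({"series_id": sid, "t": range(len(v)), "y": v})
         for sid, v in series.items()],
        ignore_index=True,
    )
    print(long_df.shape)  # (12, 3)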
This works in favor of machine learning models: with greater flexibility in fitting a forecast function and more data to work with, a machine learning model can learn, in a completely data-driven way, a more complex forecast function than traditional time series models allow, and that function is typically shared across the related time series.
Another shortcoming of the local approach revolves around scalability. In the Walmart example mentioned earlier, there are millions of time series to forecast, and human oversight of all these models is impossible. From an engineering perspective, training and maintaining millions of models in a production system would give any engineer nightmares. Under the global approach, however, we train a single model for all these time series, which drastically reduces the number of models we need to maintain while still generating all the required forecasts.
This new paradigm of forecasting has gained traction and has consistently been shown to outperform local approaches in multiple time series competitions, mostly on datasets of related time series. In Kaggle competitions such as Rossmann Store Sales (2015), Wikipedia Web Traffic Time Series Forecasting (2017), Corporación Favorita Grocery Sales Forecasting (2018), and the M5 Competition (2020), the winning entries were all global models: machine learning, deep learning, or a combination of both. The Intermarché Forecasting Competition (2021) also had global models as the winning submissions. Links to these competitions are provided in the Further reading section.
Although there are many empirical findings showing that global models outperform local models on related time series, global models are still a relatively new area of research. Montero-Manso and Hyndman (2020) showed that any local method can be approximated by a global model of sufficient complexity, and, most interestingly, that the global model can perform better even with unrelated time series. We will talk more about global models and strategies for global models in Chapter 10, Global Forecasting Models.
Modern Time Series Forecasting with Python - Second Edition was published in October 2024.
And that’s a wrap.
We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most useful here. The complete PythonPro archives can be found here.
If you have any suggestions or feedback, or would like us to find you a Python learning resource on a particular subject, just respond to this email!