The rules of probability

At the simplest level, a model, be it a machine learning model or a more classical method such as linear regression, is a mathematical description of how a target variable changes in response to variation in a predictor variable; that relationship could be a linear slope or any number of more complex mathematical transformations. In the task of modeling, we usually separate the variables in our dataset into two broad classes:

  • Independent data, by which we primarily mean inputs to a model, is often denoted by X. For example, if we are trying to predict the grades of school students on an end-of-year exam based on their characteristics, we could think of several kinds of features:
    • Categorical: If there are six schools in a district, the school that a student attends could be represented by a six-element vector for each student. The elements are all 0, except for one that is 1, indicating which of the six schools they are enrolled in (see the short sketch after this list).
    • Continuous: The student heights or average prior test scores can be represented as continuous real numbers.
    • Ordinal: The rank of the student in their class is not meant to be an absolute quantity (like their height) but rather a measure of relative difference.
  • Dependent variables, conversely, are the outputs of our models and are denoted by the letter Y. Note that, in some cases, Y is a “label” that can be used to condition a generative output, such as in a conditional GAN. It can be categorical, continuous, or ordinal, and could be an individual element or multidimensional matrix (tensor) for each element of the dataset.
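
As a quick illustration of the categorical case above, the following sketch (illustrative only, not code from this chapter) uses PyTorch's one_hot utility to turn hypothetical school indices into the six-element indicator vectors described in the first bullet:

```python
import torch
import torch.nn.functional as F

# Hypothetical school assignments for four students, coded 0-5 for six schools.
school_ids = torch.tensor([0, 3, 5, 3])

# One-hot encoding: each student becomes a six-element vector that is all
# zeros except for a 1 in the position of their school.
one_hot = F.one_hot(school_ids, num_classes=6)
print(one_hot)
# tensor([[1, 0, 0, 0, 0, 0],
#         [0, 0, 0, 1, 0, 0],
#         [0, 0, 0, 0, 0, 1],
#         [0, 0, 0, 1, 0, 0]])
```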

How can we describe the data in our model using statistics? In other words, how can we quantitatively describe what values we are likely to see, how frequently, and which values are more likely to appear together than others? One way is by asking how likely it is to observe a particular value in the data, or the probability of that value. For example, if we were to ask what the probability of observing a roll of four on a six-sided die is, the answer is that, on average, we would observe a four once every six rolls. We write this as follows:

P(X = 4) = 1/6 ≈ 16.67%
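
As a small, hypothetical sanity check (not code from the chapter), we can estimate this probability empirically by simulating many fair die rolls in PyTorch and counting how often a four appears; the estimate should hover near 1/6:

```python
import torch

# Simulate a large number of fair six-sided die rolls and estimate P(X = 4).
torch.manual_seed(0)
n_rolls = 100_000
rolls = torch.randint(low=1, high=7, size=(n_rolls,))  # uniform integers 1..6

estimate = (rolls == 4).float().mean().item()
print(f"Estimated P(X=4): {estimate:.4f}  (theoretical value: {1/6:.4f})")
```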

Here, P denotes “probability of.” What defines the allowed probability values for a particular dataset? If we imagine the set of all possible values of a dataset, such as all faces of a die, then a probability maps each value to a number between 0 and 1. The minimum is 0 because we cannot have a negative chance of seeing a result; the most unlikely result is one we would never see, a 0% probability, such as rolling a seven on a six-sided die. Similarly, we cannot have a greater than 100% probability of observing a result, represented by the value 1; an outcome with probability 1 is absolutely certain. The set of values associated with a dataset may be a set of discrete classes (such as the faces of a die) or an infinite set of potential values (such as variations in height or weight). In either case, however, the probabilities assigned to these values have to follow certain rules, the probability axioms described by the mathematician Andrey Kolmogorov in 1933:

  1. The probability of an observation (a die roll, a particular height) is a non-negative, finite number between 0 and 1.
  2. The probability of at least one of the observations in the space of all possible observations occurring is 1.
  3. The probability of the union of distinct, mutually exclusive events (such as the rolls 1-6 on a die) is the sum of the probabilities of the individual events.

While these rules might seem abstract, we will see in Chapter 3 that they have direct relevance to developing neural network models. For example, an application of rule 1 is the softmax function used to predict target classes, which assigns each potential outcome a probability between 0 and 1. If our model is asked to classify whether an image contains a cat, dog, or horse, each potential class receives a probability between 0 and 1, computed by applying the softmax to the output of a deep neural network that performs nonlinear, multi-layer transformations on the input pixels of the image we are trying to classify. Rule 3 is what allows the softmax to normalize these outcomes so that they sum to 1, because the class predictions are mutually exclusive (in other words, a real-world image logically cannot be classified as both a dog and a cat, but rather a dog or a cat, so the probabilities of these two outcomes are additive). Finally, the second rule provides the theoretical guarantee that we can generate data at all using these models.
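
To make axioms 1 and 3 concrete, here is a minimal sketch (an illustration, not code from this chapter) that passes some arbitrary network scores for the cat/dog/horse example through a softmax and checks that the resulting class probabilities are non-negative and sum to 1:

```python
import torch

# Hypothetical raw scores (logits) a network might produce for the three
# classes cat, dog, and horse on a single input image.
logits = torch.tensor([2.0, 0.5, -1.0])

# Softmax maps the scores to a valid probability distribution over classes.
probs = torch.softmax(logits, dim=0)
print(probs)  # ≈ tensor([0.7856, 0.1753, 0.0391])

# Axiom 1: every probability lies between 0 and 1.
assert torch.all((probs >= 0) & (probs <= 1))
# Axioms 2 and 3: the mutually exclusive class outcomes together sum to 1.
assert torch.isclose(probs.sum(), torch.tensor(1.0))
```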

However, in the context of machine learning and modeling, we are not usually interested in just the probability of observing a piece of input data, X; we instead want to know the conditional probability of an outcome Y given the data X. Said another way, we want to know how likely a label for a set of data is, based on that data. We write this as the probability of Y given X, or the probability of Y conditional on X:

P(Y|X)

Another question we could ask about Y and X is how likely they are to occur together—their joint probability—which can be expressed using the preceding conditional probability expression as:

P(X, Y) = P(Y|X)P(X) = P(X|Y)P(Y)

This formula expresses the probability of X and Y together. In the case where X and Y are completely independent of one another, the joint probability is simply their product:

P(X|Y)P(Y) = P(Y|X)P(X) = P(X)P(Y)
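
As an illustrative check of these identities (a hypothetical sketch with two fair dice, not code from this chapter), we can enumerate the 36 equally likely outcomes of two independent rolls and confirm that the joint probability of "first roll is 4 and second roll is 6" equals both P(Y|X)P(X) and, because the dice are independent, P(X)P(Y):

```python
from itertools import product
from fractions import Fraction

# Enumerate the 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))   # (first roll X, second roll Y)
p = Fraction(1, len(outcomes))                    # each outcome has probability 1/36

# Joint probability P(X = 4, Y = 6) and marginals P(X = 4), P(Y = 6).
p_joint = sum(p for x, y in outcomes if x == 4 and y == 6)
p_x4 = sum(p for x, y in outcomes if x == 4)
p_y6 = sum(p for x, y in outcomes if y == 6)

# Conditional probability P(Y = 6 | X = 4).
p_y6_given_x4 = p_joint / p_x4

# Chain rule: P(X, Y) = P(Y | X) P(X); independence: P(X, Y) = P(X) P(Y).
assert p_joint == p_y6_given_x4 * p_x4 == p_x4 * p_y6
print(p_joint)                                    # 1/36
```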

You will see that these expressions become important in our discussion of complementary priors in Chapter 4, and the ability of restricted Boltzmann machines to simulate independent data samples. They are also important as building blocks of Bayes’ theorem, which we describe next.
