





















































Thousands of startups use Notion as a connected workspace to create and share docs, take notes, manage projects, and organize knowledge—all in one place.
We’re offering 6 months of new Plus plans, including unlimited Notion AI so you can try it all for free!
To redeem the Notion for Startups offer:
1. Submit an application using our custom link: https://ntn.so/packt and select Packt on the partner list.
2. Include our partner key: STARTUP4110P19151
Hi,
Welcome to a brand new issue of PythonPro!
In today’s Expert Insight we bring you an excerpt from the recently published book, Python Natural Language Processing Cookbook - Second Edition, which explains how to use the displaCy library from spacy to visualize named entities in text.
News Highlights: PEP 762 in Python 3.13 adds multi-line editing, syntax highlighting, and custom commands to the REPL, and Pyinstrument 5 introduces a flamegraph timeline view for better code execution visualization.
Here are my top 5 picks from our learning resources today:
And today’s Featured Study presents a method that uses LLMs to generate precise, transparent code transformations, improving accuracy and efficiency for compiler optimizations and legacy refactoring.
Stay awesome!
Divya Anne Selvaraj
Editor-in-Chief
P.S.: This month's survey is still live. Do take the opportunity to leave us your feedback, request a learning resource, and earn your one Packt credit for this month.
94% of cloud tenants were targeted last year, and 62% were successfully compromised.
The hard truth is that organizations are having a hard time securing their cloud data—and cyberattackers are ready to exploit that challenge.
Here’s a handy resource you’ll want with you as you map out your plan: Orchestrating the Symphony of Cloud Data Security.
You’ll learn how to overcome the challenges of securing data in the cloud, navigate multi-cloud data security, and balance data security with cloud economics.
lintsampler: a new way to quickly get random samples from any distribution: Introduces a Python package designed to easily and efficiently generate random samples from any probability distribution.
[...] shmget, shmat, and shmctl for shared memory management, handling void pointers, and performing basic operations like writing to shared memory.
In "Don't Transform the Code, Code the Transforms: Towards Precise Code Rewriting using LLMs," Cummins et al., researchers from Meta, introduce a novel method called Code the Transforms (CTT), which leverages LLMs to generate precise code transformations rather than directly rewriting code.
Code transformation refers to rewriting or optimising existing code, a task essential for compiler optimisations, legacy code refactoring, or performance improvements. Traditional rule-based approaches to code transformations are difficult to implement and maintain. LLMs offer the potential to automate this process, but direct code rewriting by LLMs lacks precision and is challenging to debug. This study introduces the CTT method, where LLMs generate the transformation logic, making the process more transparent and adaptable.
This study is particularly beneficial for software engineers, developers, and those working on compiler optimisations or legacy code refactoring. By using this method, teams can reduce the time spent on manual code review and debugging, while improving the precision of code transformations.
The study's methodology involved testing 16 different Python code transformations across a variety of tasks, ranging from simple operations like constant folding to more complex transformations such as converting dot products to PyTorch API calls. The CTT method achieved an overall F1 score of 0.97, compared to the 0.75 achieved by the direct rewriting method. The precision of transformations ranged from 93% to 100%, with tasks like dead code elimination and redundant function elimination reaching near-perfect performance. In contrast, the traditional direct LLM rewriting approach showed an average precision of 60%, and was prone to more frequent errors, requiring manual correction.
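To make the idea of "coding the transform" concrete, here is a small hand-written sketch of constant folding, one of the transformations the study mentions. It is an illustration only, not code from the paper and not LLM output; the names ConstantFolder and fold_constants are hypothetical. It assumes Python 3.9 or later (for ast.unparse) and folds only +, -, and * on numeric literals.

```python
import ast
import operator

# Hypothetical illustration: the operators this sketch is willing to fold.
# Division is left out on purpose so that folding never raises ZeroDivisionError.
_FOLDABLE = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
}


class ConstantFolder(ast.NodeTransformer):
    """Replace binary operations on two numeric literals with their result."""

    def visit_BinOp(self, node):
        # Fold the children first so nested expressions collapse bottom-up.
        self.generic_visit(node)
        op = _FOLDABLE.get(type(node.op))
        if (
            op is not None
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)
            and isinstance(node.left.value, (int, float))
            and isinstance(node.right.value, (int, float))
        ):
            folded = ast.Constant(value=op(node.left.value, node.right.value))
            return ast.copy_location(folded, node)
        return node


def fold_constants(source: str) -> str:
    """Parse source, apply the transform, and return the rewritten code."""
    tree = ConstantFolder().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


print(fold_constants("x = 2 * 3 + y"))  # x = 6 + y
```

Because the transformation is ordinary code, it can be read, tested, and debugged like any other function, which is the transparency benefit the study attributes to generating transforms instead of rewritten code.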
You can learn more by reading the entire paper.
Here’s an excerpt from “Chapter 7: Visualizing Text Data” in the book, Python Natural Language Processing Cookbook - Second Edition by Zhenya Antić and Saurabh Chakravarty, published in September 2024.
Named entity recognition, or NER, is a very useful tool for quickly finding people, organizations, locations, and other entities in texts. In order to visualize them better, we can use the displacy package to create compelling and easy-to-read images.
After working through this recipe, you will be able to create visualizations of named entities in a text using different formatting options and save the results in a file.
The displaCy library is part of the spacy package. You need at least version 2.0.12 of the spacy package for displaCy to work. The version in the poetry environment and requirements.txt file is 3.6.1.
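One way to check the version requirement before running the recipe is sketched below. The helper names parse_version, meets_minimum, and installed_spacy_version are hypothetical, and the parser is deliberately simple: it compares only the leading numeric parts of the version string. Comparing the raw strings would be misleading, since "2.0.12" sorts before "2.0.2" as text.

```python
from importlib import metadata


def parse_version(version: str) -> tuple:
    """Turn '3.6.1' into (3, 6, 1); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "2.0.12") -> bool:
    """Compare versions numerically, part by part."""
    return parse_version(installed) >= parse_version(minimum)


def installed_spacy_version():
    """Return the installed spaCy version string, or None if it is missing."""
    try:
        return metadata.version("spacy")
    except metadata.PackageNotFoundError:
        return None


version = installed_spacy_version()
if version is None:
    print("spaCy is not installed")
elif meets_minimum(version):
    print(f"spaCy {version} is recent enough for displaCy")
else:
    print(f"spaCy {version} is older than 2.0.12")
```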
The notebook is located at https://github.com/PacktPublishing/Python-Natural-Language-Processing-Cookbook-Second-Edition/blob/main/Chapter07/7.3_ner.ipynb.
We will use spacy to parse the sentence and then the displacy engine to visualize the named entities:
Import spacy and displacy, and run the language utilities notebook:
import spacy
from spacy import displacy
%run -i "../util/lang_utils.ipynb"
text = """iPhone 12: Apple makes jump to 5G
Apple has confirmed its iPhone 12 handsets will be its first to work on faster 5G networks.
The company has also extended the range to include a new "Mini" model that has a smaller 5.4in screen.
The US firm bucked a wider industry downturn by increasing its handset sales over the past year.
But some experts say the new features give Apple its best opportunity for growth since 2014, when it revamped its line-up with the iPhone 6.
"5G will bring a new level of performance for downloads and uploads, higher quality video streaming, more responsive gaming,
real-time interactivity and so much more," said chief executive Tim Cook.
There has also been a cosmetic refresh this time round, with the sides of the devices getting sharper, flatter edges.
The higher-end iPhone 12 Pro models also get bigger screens than before and a new sensor to help with low-light photography.
However, for the first time none of the devices will be bundled with headphones or a charger."""
We process the text with the small model, which gives us a Doc object. We then modify the object to contain a title. This title will be part of the NER visualization:
doc = small_model(text)
doc.user_data["title"] = "iPhone 12: Apple makes jump to 5G"
Here, we define the colors for the visualization: green for the ORG-labeled text and yellow for the PERSON-labeled text. We then set the options variable, which contains the colors. Finally, we use the render command to display the visualization. As arguments, we provide the Doc object and the options we previously defined. We also set the style argument to "ent", as we would like to display just entities. We set the jupyter argument to True in order to display directly in the notebook:
colors = {"ORG": "green", "PERSON": "yellow"}
options = {"colors": colors}
displacy.render(doc, style="ent", options=options, jupyter=True)
The output should look like that in Figure 7.4.
Figure 7.4 – Named entities visualization
To save the visualization to an HTML file, we first define the path variable. Then, we use the same render command, but we set the jupyter argument to False this time and assign the output of the command to the html variable. We then open the file, write the HTML, and close the file:
path = "../data/ner_vis.html"
html = displacy.render(doc, style="ent",
                       options=options, jupyter=False)
html_file = open(path, "w", encoding="utf-8")
html_file.write(html)
html_file.close()
This will create an HTML file with the entities visualization.
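The open, write, and close sequence in the last step can also be written with a context manager, which closes the file even if the write fails. The sketch below is an alternative, not the book's code: the save_html helper is a hypothetical name, the parent-folder creation is an added convenience, and a placeholder string stands in for the html value returned by displacy.render so the example runs without spaCy.

```python
import tempfile
from pathlib import Path


def save_html(html: str, path: str) -> Path:
    """Write an HTML string to path as UTF-8 and return the target path."""
    target = Path(path)
    # Create the parent folder if it does not exist yet (not part of the recipe).
    target.parent.mkdir(parents=True, exist_ok=True)
    # The with-block closes the file automatically, even if write() raises.
    with target.open("w", encoding="utf-8") as html_file:
        html_file.write(html)
    return target


# In the recipe, html would come from:
#   html = displacy.render(doc, style="ent", options=options, jupyter=False)
# Here a placeholder string and a temporary folder keep the sketch self-contained.
with tempfile.TemporaryDirectory() as folder:
    saved = save_html("<div>placeholder</div>", f"{folder}/data/ner_vis.html")
    print(saved.name, saved.stat().st_size, "bytes")
```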
Packt library subscribers can continue reading the entire book for free. You can buy Python Natural Language Processing Cookbook - Second Edition, here.
And that’s a wrap.
We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most useful here. The complete PythonPro archives can be found here.
If you have any suggestions or feedback, or would like us to find you a Python learning resource on a particular subject, take the survey or just respond to this email!