
- Python - Text Processing
- Python - Text Processing Introduction
- Python - Text Processing Environment
- Python - String Immutability
- Python - Sorting Lines
- Python - Reformatting Paragraphs
- Python - Counting Token in Paragraphs
- Python - Binary ASCII Conversion
- Python - Strings as Files
- Python - Backward File Reading
- Python - Filter Duplicate Words
- Python - Extract Emails from Text
- Python - Extract URL from Text
- Python - Pretty Print
- Python - Text Processing State Machine
- Python - Capitalize and Translate
- Python - Tokenization
- Python - Remove Stopwords
- Python - Synonyms and Antonyms
- Python - Text Translation
- Python - Word Replacement
- Python - Spelling Check
- Python - WordNet Interface
- Python - Corpora Access
- Python - Tagging Words
- Python - Chunks and Chinks
- Python - Chunk Classification
- Python - Text Classification
- Python - Bigrams
- Python - Process PDF
- Python - Process Word Document
- Python - Reading RSS feed
- Python - Sentiment Analysis
- Python - Search and Match
- Python - Text Munging
- Python - Text wrapping
- Python - Frequency Distribution
- Python - Text Summarization
- Python - Stemming Algorithms
- Python - Constrained Search
Python - Reading RSS feed
RSS (Rich Site Summary) is a format for delivering regularly changing web content. Many news-related sites, weblogs and other online publishers syndicate their content as an RSS Feed to whoever wants it. In python we take help of the below package to read and process these feeds.
pip install feedparser
Feed Structure
In the below example we get the structure of the feed so that we can analyse further about which parts of the feed we want to process.
import feedparser NewsFeed = feedparser.parse("https://timesofindia.indiatimes.com/rssfeedstopstories.cms") entry = NewsFeed.entries[1] print entry.keys()
When we run the above program, we get the following output −
['summary_detail', 'published_parsed', 'links', 'title', 'summary', 'guidislink', 'title_detail', 'link', 'published', 'id']
Feed Title and Posts
In the below example we read the title and head of the rss feed.
import feedparser NewsFeed = feedparser.parse("https://timesofindia.indiatimes.com/rssfeedstopstories.cms") print 'Number of RSS posts :', len(NewsFeed.entries) entry = NewsFeed.entries[1] print 'Post Title :',entry.title
When we run the above program we get the following output −
Number of RSS posts : 5 Post Title : Cong-JD(S) in SC over choice of pro tem speaker
Feed Details
Based on above entry structure we can derive the necessary details from the feed using python program as shown below. As entry is a dictionary we utilise its keys to produce the values needed.
import feedparser NewsFeed = feedparser.parse("https://timesofindia.indiatimes.com/rssfeedstopstories.cms") entry = NewsFeed.entries[1] print entry.published print "******" print entry.summary print "------News Link--------" print entry.link
When we run the above program we get the following output −
Fri, 18 May 2018 20:13:13 GMT ****** Controversy erupted on Friday over the appointment of BJP MLA K G Bopaiah as pro tem speaker for the assembly, with Congress and JD(S) claiming the move went against convention that the post should go to the most senior member of the House. The combine approached the SC to challenge the appointment. Hearing is scheduled for 10:30 am today. ------News Link-------- https://timesofindia.indiatimes.com/india/congress-jds-in-sc-over-bjp-mla-made-pro-tem-speaker-hearing-at-1030-am/articleshow/64228740.cms