Generating and parsing RSS in Python

I have spent the last couple of days looking into fetching RSS content, so here is a short summary.

RSS generation: PyRSS2Gen

PyRSS2Gen turns the content you supply into an RSS 2.0 feed. It supports Python 2.3 through 3.3.

Download: http://www.dalkescientific.com/Python/PyRSS2Gen.html

Installation: python setup.py install

Example:

import datetime
import PyRSS2Gen

# Channel-level metadata for the feed
rss = PyRSS2Gen.RSS2(
    title="Andrew's PyRSS2Gen feed",
    link="http://www.dalkescientific.com/Python/PyRSS2Gen.html",
    description="The latest news about PyRSS2Gen, a "
                "Python library for generating RSS2 feeds",
    # PyRSS2Gen writes dates with a "GMT" suffix, so pass a UTC time
    lastBuildDate=datetime.datetime.utcnow(),
    # One RSSItem per entry; Guid gives each item a unique identifier
    items=[
        PyRSS2Gen.RSSItem(
            title="PyRSS2Gen-0.0 released",
            link="http://www.dalkescientific.com/news/030906-PyRSS2Gen.html",
            description="Dalke Scientific today announced PyRSS2Gen-0.0, "
                        "a library for generating RSS feeds for Python.  ",
            guid=PyRSS2Gen.Guid("http://www.dalkescientific.com/news/"
                                "030906-PyRSS2Gen.html"),
            pubDate=datetime.datetime(2003, 9, 6, 21, 31)),
        PyRSS2Gen.RSSItem(
            title="Thoughts on RSS feeds for bioinformatics",
            link="http://www.dalkescientific.com/writings/diary/"
                 "archive/2003/09/06/RSS.html",
            description="One of the reasons I wrote PyRSS2Gen was to "
                        "experiment with RSS for data collection in "
                        "bioinformatics.  Last year I came across...",
            guid=PyRSS2Gen.Guid("http://www.dalkescientific.com/writings/"
                                "diary/archive/2003/09/06/RSS.html"),
            pubDate=datetime.datetime(2003, 9, 6, 21, 49)),
    ])

# Write the feed to a file; the with-block closes the file afterwards
with open("pyrss2gen.xml", "w") as f:
    rss.write_xml(f)
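PyRSS2Gen hides the XML details. For reference, here is a rough sketch, using only the standard library's `xml.etree.ElementTree`, of a minimal RSS 2.0 document with the same shape as the one the example writes: an `<rss version="2.0">` root, one `<channel>` with title, link and description, and one `<item>` per entry. This is not PyRSS2Gen's code; the `build_rss` helper and its dict-based item format are assumptions made for illustration. RSS 2.0 expects `pubDate` in RFC 822 form, which `email.utils.format_datetime` (Python 3.3+) produces from a UTC datetime.

```python
import datetime
import email.utils
import xml.etree.ElementTree as ET


def build_rss(title, link, description, items):
    """Hypothetical helper: return a minimal RSS 2.0 document as UTF-8 bytes.

    Each item is a dict with "title", "link", "description" and a
    timezone-aware datetime under "pubDate" (assumptions of this sketch).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description

    for entry in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "link").text = entry["link"]
        ET.SubElement(item, "description").text = entry["description"]
        # Reuse the link as the item's guid, as the PyRSS2Gen example does
        ET.SubElement(item, "guid").text = entry["link"]
        # RFC 822 date in GMT, e.g. "Sat, 06 Sep 2003 21:31:00 GMT"
        utc_date = entry["pubDate"].astimezone(datetime.timezone.utc)
        ET.SubElement(item, "pubDate").text = email.utils.format_datetime(
            utc_date, usegmt=True)

    return ET.tostring(rss, encoding="utf-8")


xml_bytes = build_rss(
    "Andrew's PyRSS2Gen feed",
    "http://www.dalkescientific.com/Python/PyRSS2Gen.html",
    "The latest news about PyRSS2Gen",
    [{
        "title": "PyRSS2Gen-0.0 released",
        "link": "http://www.dalkescientific.com/news/030906-PyRSS2Gen.html",
        "description": "Dalke Scientific today announced PyRSS2Gen-0.0",
        "pubDate": datetime.datetime(2003, 9, 6, 21, 31,
                                     tzinfo=datetime.timezone.utc),
    }],
)
print(xml_bytes.decode("utf-8"))
```

Running it prints the generated XML. For a quick feed this is often enough, but PyRSS2Gen also covers optional elements such as categories, enclosures and images, which this sketch ignores.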


RSS parsing: feedparser

feedparser parses RSS, Atom and RDF feeds. It supports Python 2.4 through 3.3.
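feedparser accepts all three families, and they already differ at the root element: RSS 0.9x/2.0 documents start with `<rss>`, Atom 1.0 with `<feed>` in the `http://www.w3.org/2005/Atom` namespace, and RSS 1.0 (like RSS 0.90) with an RDF `<rdf:RDF>` element. The toy function `sniff_feed_format` below is a hypothetical stdlib sketch of only that first distinction; feedparser's own detection is far more thorough and reports a finer-grained result in `d.version`.

```python
import xml.etree.ElementTree as ET

# ElementTree spells namespaced tags as "{namespace-uri}localname"
ATOM_FEED = "{http://www.w3.org/2005/Atom}feed"
RDF_ROOT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"


def sniff_feed_format(document):
    """Hypothetical helper: classify a feed by its root element only."""
    root = ET.fromstring(document)
    if root.tag == "rss":        # RSS 0.91/0.92/2.0
        return "rss"
    if root.tag == ATOM_FEED:    # Atom 1.0
        return "atom"
    if root.tag == RDF_ROOT:     # RDF-based RSS 1.0 / 0.90
        return "rdf"
    return "unknown"


for sample in (
    '<rss version="2.0"><channel/></rss>',
    '<feed xmlns="http://www.w3.org/2005/Atom"/>',
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>',
):
    print(sniff_feed_format(sample))
```

This sketch raises `ET.ParseError` on malformed input, whereas feedparser tries to recover from broken feeds and sets its `bozo` flag instead; that tolerance is one of the main reasons to use the library rather than a plain XML parser.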

Download: https://code.google.com/p/feedparser/downloads/list

Installation: python setup.py install

Example:

>>> import feedparser
>>> d = feedparser.parse("http://feedparser.org/docs/examples/atom10.xml")
>>> d['feed']['title']             # feed data is a dictionary
u'Sample Feed'
>>> d.feed.title                   # get values attr-style or dict-style
u'Sample Feed'
>>> d.channel.title                # use RSS or Atom terminology anywhere
u'Sample Feed'
>>> d.feed.link                    # resolves relative links
u'http://example.org/'
>>> d.feed.subtitle                # parses escaped HTML
u'For documentation only'
>>> d.channel.description          # RSS terminology works here too
u'For documentation only'
>>> len(d['entries'])              # entries are a list
1
>>> d['entries'][0]['title']       # each entry is a dictionary
u'First entry title'
>>> d.entries[0].title             # attr-style works here too
u'First entry title'
>>> d['items'][0].title            # RSS terminology works here too
u'First entry title'
>>> e = d.entries[0]
>>> e.link                         # easy access to alternate link
u'http://example.org/entry/3'
>>> e.links[1].rel                 # full access to all Atom links
u'related'
>>> e.links[0].href                # resolves relative links here too
u'http://example.org/entry/3'
>>> e.author_detail.name           # author data is a dictionary
u'Mark Pilgrim'
>>> e.updated_parsed               # parses all date formats
(2005, 11, 9, 11, 56, 34, 2, 313, 0)
>>> e.content[0].value             # sanitizes dangerous HTML
u'
Watch out for nasty tricks
'
>>> d.version                      # reports feed type and version
u'atom10'
>>> d.encoding                     # auto-detects character encoding
u'utf-8'
>>> d.headers.get('Content-type')  # full access to all HTTP headers
u'application/xml'
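feedparser returns parsed dates such as `updated_parsed` as a nine-field `time.struct_time` already normalized to UTC, which is the tuple shown in the session above. To get a timezone-aware `datetime` instead, one option is the stdlib route below through `calendar.timegm`, which reads the tuple as UTC (unlike `time.mktime`, which assumes local time). `parsed_to_datetime` is a hypothetical helper, not part of feedparser.

```python
import calendar
import datetime
import time


def parsed_to_datetime(parsed):
    """Hypothetical helper: turn a UTC struct_time/9-tuple into an aware datetime.

    Returns None when there is no parsed date to convert.
    """
    if parsed is None:
        return None
    # timegm treats the tuple as UTC; time.mktime would assume local time
    seconds = calendar.timegm(parsed)
    return datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)


# The value shown for e.updated_parsed in the session above
updated = time.struct_time((2005, 11, 9, 11, 56, 34, 2, 313, 0))
print(parsed_to_datetime(updated).isoformat())
```

Against a live feed you would call it as `parsed_to_datetime(e.get("updated_parsed"))`; when an entry has no usable date, `.get` returns `None` and the helper passes it through.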
