177. Tour of the Python Standard Library

  • random
  • datetime
  • urllib/urllib2
  • glob
  • re
  • logging
  • timeit
  • unittest

178. What is the Standard Library?

Python has a “batteries included” philosophy, meaning the language ships with a standard library of robust modules. Today we’re going to look at a few of them.

Being a good Python programmer isn’t just about being able to write any program in Python. It’s also about knowing which things you need to write and which are already written, much like being a good mathematician.

179. Modules We’ve Already Seen/Discussed

  • math/cmath
  • sys
  • os
  • xml
  • csv
  • json
  • itertools

180. Python 3 Notes

The move to Python 3 included some module cleanup in terms of both naming and content. For instance, urllib and urllib2 are now one package called urllib. ConfigParser was renamed to configparser to match the PEP 8 naming conventions.
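
One common way to cope with such renames in code that has to run on both versions is to try the Python 3 name first and fall back to the Python 2 name. The sketch below is only an illustration of that idea; the section name 'demo' and the option 'course' are made up for the example.

```python
# Try the Python 3 module name first, then fall back to the Python 2 name.
try:
    import configparser                     # Python 3
except ImportError:
    import ConfigParser as configparser     # Python 2

# 'demo' and 'course' are hypothetical names used only for this sketch.
parser = configparser.RawConfigParser()
parser.add_section('demo')
parser.set('demo', 'course', 'python')
course = parser.get('demo', 'course')
print(course)
```

After the import, the rest of the code refers to configparser only, so it does not need to know which interpreter it is running on.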

181. random

The random module contains classes and functions for generating random data of various types and distributions.

import random

print random.random() # Random float from [0.0, 1.0)
print random.randint(1, 10)  # Random integer from [1, 10]
print random.uniform(3, 5) # Uniform float from [3, 5]
# Randomly selected item from the given list/iterable
print random.choice(['a', 'b', 'c'])
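
A few more functions from the same module, as a minimal sketch. Seeding the generator is optional; it is used here only so that repeated runs produce the same sequence. The variable names are made up for the example.

```python
import random

random.seed(42)                  # fixed seed: same sequence on every run
first = random.random()
random.seed(42)
second = random.random()         # equal to first, because the seed was reset

deck = list(range(1, 11))
random.shuffle(deck)             # shuffles the list in place
hand = random.sample(deck, 3)    # 3 distinct items, no repeats
roll = random.randint(1, 6)      # both endpoints are included
```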

182. datetime

The datetime module defines datetime, date, time, and timedelta types for handling date arithmetic and date comparison. Note all of these types are immutable.

import datetime

today = datetime.date.today()
tomorrow = today + datetime.timedelta(days=1)
if today < tomorrow:
    print today.isoformat()
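
The same arithmetic with fixed dates, so the results are predictable. This is a small sketch with arbitrarily chosen dates; note that each operation returns a new object and leaves the originals untouched.

```python
import datetime

start = datetime.date(2012, 2, 26)
end = datetime.date(2012, 3, 1)

gap = end - start                              # subtracting dates yields a timedelta
print(gap.days)                                # 4 (2012 is a leap year)

later = start + datetime.timedelta(weeks=1)    # a new date; start is unchanged
print(later.isoformat())                       # 2012-03-04
```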

183. Date Formatting (dateformat.py)

These objects have a strftime method which allows you to represent the date/time in various formats. All allowable formatting directives can be found here:

http://docs.python.org/library/datetime.html#strftime-strptime-behavior

import datetime

today = datetime.date.today()
now = datetime.datetime.now()
print today.strftime('%a %B %d, %Y')
print now.strftime('%m-%d-%y %I:%M %p')

184. Date Parsing (dateparse.py)

The datetime.datetime.strptime class method takes a string representing a date and another string representing the format, and returns a datetime object if the string matches that format.

import datetime

a = datetime.datetime.strptime('02-26-02','%m-%d-%y')
print type(a)
print a.date()
# Raises a ValueError if the string doesn't match the format
a = datetime.datetime.strptime('02-26-02','%d/%m/%Y')
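
Because a mismatch raises ValueError, parsing input in an unknown format usually means catching that exception. The helper below is a hypothetical sketch (parse_date is not part of datetime) that tries a list of candidate formats in order.

```python
import datetime

def parse_date(text, formats=('%m-%d-%y', '%d/%m/%Y')):
    """Return a date for the first format that matches, or None."""
    for fmt in formats:
        try:
            return datetime.datetime.strptime(text, fmt).date()
        except ValueError:
            continue    # this format did not match; try the next one
    return None

print(parse_date('02-26-02'))       # 2002-02-26
print(parse_date('26/02/2002'))     # 2002-02-26
print(parse_date('not a date'))     # None
```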

185. urllib/urllib2

urllib and urllib2 are modules for accessing data over the internet. They provide functions for retrieving data as well as building URLs and encoding query parameters.

Both modules provide a urlopen function for accessing remote resources, but urllib.urlopen does not accept a timeout argument and cannot use proxies which require authentication. Either works for simple resources, but more complex requests should use urllib2.
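
Building a URL with encoded query parameters does not need a network connection, so it is easy to try out. This is a sketch only: example.com and the parameter names are placeholders, and the try/except import is there so the same code runs on Python 2 and Python 3.

```python
try:
    from urllib.parse import urlencode, urlsplit, parse_qs    # Python 3
except ImportError:
    from urllib import urlencode                               # Python 2
    from urlparse import urlsplit, parse_qs

base_url = 'http://example.com/search'    # placeholder URL
# A list of pairs keeps the parameters in a predictable order.
query = urlencode([('q', 'python standard library'), ('page', 2)])
full_url = base_url + '?' + query
print(full_url)

# Going the other way: split the URL and decode the query string.
params = parse_qs(urlsplit(full_url).query)
```

urlencode takes care of escaping spaces and other special characters, which is easy to get wrong when concatenating strings by hand.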

186. Twitter Data Example (twitter.py)

import datetime
import json
import urllib
import urllib2

trends_url = 'http://api.twitter.com/1/trends/weekly.json'

More in the file.

187. glob

The glob module is used to find filenames matching a given pattern. You can use the * and ? wildcard characters as well as [] character ranges. The matching rules follow those of the Unix shell.

import glob

print glob.glob('../lectures/s*.rst')
print glob.glob('../lectures/s??.rst')
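
To see the three kinds of pattern side by side without depending on the lectures directory, the sketch below creates a few empty files in a temporary directory first. The file names and the names helper are made up for the example. glob returns matches in arbitrary order, so the results are sorted.

```python
import glob
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
try:
    for name in ('s01.rst', 's02.rst', 's100.rst', 'notes.txt'):
        open(os.path.join(workdir, name), 'w').close()

    def names(pattern):
        """Hypothetical helper: sorted base names matching the pattern."""
        paths = glob.glob(os.path.join(workdir, pattern))
        return sorted(os.path.basename(p) for p in paths)

    star = names('s*.rst')              # * matches any run of characters
    two_chars = names('s??.rst')        # each ? matches exactly one character
    char_range = names('s0[2-9].rst')   # [] matches one character from a range
finally:
    shutil.rmtree(workdir)              # clean up the temporary files
```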

188. re

Python’s support for regular expressions is contained in the re module. Regular expressions (or regex) are used for matching character patterns in strings. While they are very useful and powerful, they can also get quite complicated.

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.

—Jamie Zawinski

189. Basic Regex Example (regex.py)

import random
import re
from string import ascii_lowercase, digits

simple_phone = re.compile(r'[1-9]\d{2}(-|\.|\s)\d{4}')
garbage = ''.join([
    random.choice(ascii_lowercase + digits + '-. ')
    for x in range(20000)
])
match = simple_phone.search(garbage)
if match:
    print garbage
    print 'Found a match!'
    print garbage[match.start():match.end()]
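
Searching random text makes the result unpredictable, so here is a smaller sketch on a fixed string. It uses a variant of the pattern above with named groups, which is an addition for illustration and not part of regex.py; the sample text is made up.

```python
import re

# Named groups make the pieces of a match easy to pull out afterwards.
phone = re.compile(r'(?P<prefix>[1-9]\d{2})[-.\s](?P<line>\d{4})')

text = 'Call 555-1234 or 867.5309 before noon.'

match = phone.search(text)             # first match only, or None
first = (match.group('prefix'), match.group('line'))

# finditer yields a match object for every non-overlapping match.
all_numbers = [m.group(0) for m in phone.finditer(text)]
```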

190. logging

print statements are great, but sometimes you need something a little more robust. The logging module supports logging from programs via console output, file output (including file rotation), socket output, email output via SMTP, and HTTP output.

You can configure multiple loggers for a program, each with different formats, outputs, and logging levels.
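
The same kind of setup can also be done in code instead of a configuration file. The sketch below attaches two handlers with different levels and formats to one logger. ListHandler is a hypothetical handler written for this example (it just collects formatted lines in a list); a real program would more likely use logging.StreamHandler or logging.FileHandler.

```python
import logging

class ListHandler(logging.Handler):
    """Hypothetical handler that keeps formatted records in a list."""
    def __init__(self, level=logging.NOTSET):
        logging.Handler.__init__(self, level)
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))

logger = logging.getLogger('demo')    # 'demo' is an arbitrary logger name
logger.setLevel(logging.DEBUG)        # let every record reach the handlers
logger.propagate = False              # keep the demo records away from the root logger

everything = ListHandler(logging.DEBUG)
everything.setFormatter(logging.Formatter('%(levelname)s %(name)s: %(message)s'))
problems = ListHandler(logging.WARNING)    # ignores DEBUG and INFO records
problems.setFormatter(logging.Formatter('%(message)s'))
logger.addHandler(everything)
logger.addHandler(problems)

logger.debug('debug message')
logger.info('info message')
logger.warning('warn message')
```

Each handler filters by its own level, so the first list ends up with all three messages while the second receives only the warning.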

191. Log Configuration Example (log.conf)

[loggers]
keys=root

[handlers]
keys=consoleHandler,fileHandler

[formatters]
keys=simpleFormatter

192. Log Usage Example (log.py)

import logging
import logging.config

logging.config.fileConfig("log.conf")
logger = logging.getLogger() # root by default

if __name__ == "__main__":
    logger.debug("debug message")
    logger.info("info message")
    logger.warning("warn message")

193. timeit/cProfile

timeit is a module for timing small pieces of Python code.

cProfile is a more robust module for profiling; it is used together with pstats to sort and report the profiler's output statistics. profile is a pure-Python module with the same API as cProfile, but it introduces additional overhead compared to cProfile, which is written as a C extension.

Note there is a bug/license issue which excludes pstats from the default Python install on Ubuntu. You’ll need to install the python-profiler package.

See https://bugs.launchpad.net/ubuntu/+source/python-defaults/+bug/123755

194. Simple Profile Example (timeme.py)

from timeit import Timer

fibonacci_test = '[fibonacci(i) for i in xrange(1, 15)]'

t = Timer(fibonacci_test, "from fibonacci import fibonacci")
time = t.timeit(number=10000) / 10000.0 * 1000
print "%.2f millisecond/call" % time
t = Timer(fibonacci_test, "from fibo2 import fibonacci")
time = t.timeit(number=10000) / 10000.0 * 1000
print "%.2f millisecond/call" % time
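
The fibonacci and fibo2 modules are not shown here, so the following self-contained sketch defines a stand-in function inside the setup string instead. The iterative fibonacci below is an assumption made for the example, not the implementation from either module. It also uses timeit.repeat and takes the minimum, which is usually less sensitive to other activity on the machine than a single measurement.

```python
import timeit

# Hypothetical stand-in for the fibonacci module used above.
setup = '''
def fibonacci(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
'''
statement = '[fibonacci(i) for i in range(1, 15)]'
runs = 1000

# repeat() returns one total time per trial; keep the fastest trial.
best = min(timeit.repeat(statement, setup=setup, repeat=3, number=runs))
print('%.4f millisecond/call' % (best / runs * 1000))
```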

195. Larger Profile Example (profileme.py)

import cProfile
import pstats

# Run profiler and save to twitter.stats
cProfile.run('import twitter', 'twitter.stats')
stats = pstats.Stats('twitter.stats')
# Clean up filenames for the report
stats.strip_dirs()
# Sort the statistics by the cumulative time spent
stats.sort_stats('cumulative')
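
The excerpt above stops before any report is printed. As a self-contained sketch of the full cycle, the code below profiles a single function call and prints the top entries. slow_sum is a made-up workload; the Profile, runcall, strip_dirs, sort_stats, and print_stats calls are the standard cProfile/pstats API.

```python
import cProfile
import pstats

def slow_sum(n):
    """Hypothetical workload: sum of squares with a plain loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
result = profiler.runcall(slow_sum, 100000)    # profile one call, keep its return value

stats = pstats.Stats(profiler)                 # read the stats straight from the profiler
stats.strip_dirs().sort_stats('cumulative')    # both methods return the Stats object
stats.print_stats(5)                           # report only the top 5 entries
```

Passing the Profile object directly to pstats.Stats avoids writing an intermediate stats file, which is convenient for quick experiments.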

196. unittest

The unittest module is a testing framework based on Kent Beck’s Smalltalk testing framework (original paper). This same style of testing framework, called xUnit, has been implemented in nearly every language.

197. Testing Example - Program (program.py)

Sample program with some known flaws

if __name__ == '__main__':
    symbols = get_symbol_info('nyse.csv')
    banks = filter_by_industry(symbols, 'Commercial Banks')
    print len(banks)

198. Testing Example (program_test.py)

Test suite to expose those flaws

import os
import program
import unittest

if __name__ == '__main__':
    unittest.main()
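
The bodies of program.py and its tests are not reproduced on these slides, so the sketch below uses a stand-in. The filter_by_industry function and the dictionary layout of each row are assumptions made for the example and may differ from the real program. It shows the usual shape of a test case: a setUp method for shared fixtures and one test method per behaviour.

```python
import unittest

def filter_by_industry(symbols, industry):
    """Hypothetical stand-in: keep the rows whose 'industry' field matches."""
    return [row for row in symbols if row.get('industry') == industry]

class FilterByIndustryTest(unittest.TestCase):
    def setUp(self):
        # Runs before every test method, so each test gets fresh data.
        self.symbols = [
            {'symbol': 'AAA', 'industry': 'Commercial Banks'},
            {'symbol': 'BBB', 'industry': 'Software'},
        ]

    def test_matching_rows_are_kept(self):
        banks = filter_by_industry(self.symbols, 'Commercial Banks')
        self.assertEqual([row['symbol'] for row in banks], ['AAA'])

    def test_unknown_industry_gives_empty_list(self):
        self.assertEqual(filter_by_industry(self.symbols, 'Mining'), [])

# Build and run the suite explicitly so the result object can be inspected.
suite = unittest.TestLoader().loadTestsFromTestCase(FilterByIndustryTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a standalone test file the last two lines would normally be replaced by the unittest.main() call shown above.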

199. There are More Modules

Be sure to check the documentation on the Python website at http://docs.python.org/library/

200. Up Next

Common third-party Python libraries