Python has a “batteries included” philosophy, meaning the standard library ships with a number of robust modules. Today we’re going to look at a few of them.
Being a good Python programmer isn’t just about being able to write any program in Python. It’s also about knowing which things to write and which are already written, much like being a good mathematician.
The move to Python 3 included some module cleanup in terms of both naming and content. For instance, urllib and urllib2 are now one package called urllib, and ConfigParser was renamed to configparser to match the PEP 8 naming conventions.
The random module contains classes and functions for generating random data of various types and distributions.
import random
print random.random() # Random float from [0.0, 1.0)
print random.randint(1, 10) # Random integer from [1, 10]
print random.uniform(3, 5) # Uniform float from [3, 5]
# Randomly selected item from the given list/iterable
print random.choice(['a', 'b', 'c'])
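When random output has to be repeatable, for example in a test, a seeded generator helps. The following is a minimal sketch, assuming Python 3 (hence the print() calls); the seed value 42 and the names rng, deck, hand, and roll are made up for illustration.

```python
import random

# A dedicated generator with a fixed seed gives the same sequence on
# every run. The seed value 42 is arbitrary.
rng = random.Random(42)

deck = list(range(1, 11))
rng.shuffle(deck)             # Shuffle the list in place
hand = rng.sample(deck, 3)    # Three distinct items, chosen without replacement
roll = rng.randint(1, 6)      # Integer from [1, 6], both ends included

print(deck)
print(hand)
print(roll)
```

Using a separate random.Random instance leaves the module-level generator untouched, so other code that calls random.random() is not affected by the seed.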
The datetime module defines datetime, date, time, and timedelta types for handling date arithmetic and date comparison. Note all of these types are immutable.
import datetime
today = datetime.date.today()
tomorrow = today + datetime.timedelta(days=1)
if today < tomorrow:
    print today.isoformat()
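Date arithmetic and immutability can be seen together in a short sketch. It assumes Python 3, and the two dates are arbitrary values picked for illustration.

```python
import datetime

start = datetime.date(2012, 2, 27)
end = datetime.date(2012, 3, 1)

# Subtracting two dates yields a timedelta
gap = end - start
print(gap.days)   # 2012 is a leap year, so this spans Feb 28, Feb 29, Mar 1

# replace() returns a new object; the original is left unchanged
moved = start.replace(day=1)
print(start.isoformat())
print(moved.isoformat())
```

Because the types are immutable, there is no method that changes a date in place; every operation hands back a new object.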
These objects have a strftime method which allows you to represent the date/time in various formats. All allowable formatting directives can be found here
http://docs.python.org/library/datetime.html#strftime-strptime-behavior
import datetime
today = datetime.date.today()
now = datetime.datetime.now()
print today.strftime('%a %B %d, %Y')
print now.strftime('%m-%d-%y %I:%M %p')
The datetime.strptime function takes a string representing a date and a second string describing its format, and returns a datetime object if the date string matches the format. Otherwise it raises a ValueError.
import datetime
a = datetime.datetime.strptime('02-26-02','%m-%d-%y')
print type(a)
print a.date()
# Raises a ValueError because the string doesn't match the format
a = datetime.datetime.strptime('02-26-02','%d/%m/%Y')
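Since a mismatch raises ValueError, a common pattern is to try several formats in turn. The helper below is a sketch under stated assumptions: parse_date is a hypothetical name, its list of formats is just the two used above, and it is written for Python 3.

```python
import datetime

def parse_date(text, formats=('%m-%d-%y', '%d/%m/%Y')):
    """Return a date parsed with the first matching format, or None."""
    for fmt in formats:
        try:
            return datetime.datetime.strptime(text, fmt).date()
        except ValueError:
            continue   # This format did not match; try the next one
    return None

print(parse_date('02-26-02'))
print(parse_date('26/02/2002'))
print(parse_date('not a date'))
```

Returning None for unparseable input is one possible design; re-raising the ValueError would be just as reasonable if bad input should stop the program.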
urllib and urllib2 are modules for accessing data over the internet. They provide functions for retrieving data as well as building URLs and encoding query parameters.
Both modules contain a urlopen function for accessing remote resources but urllib.urlopen cannot handle HTTPS, server timeouts, or proxies which require authentication. For simple resources you can use either but more complex requests should use urllib2.
import datetime
import json
import urllib
import urllib2
trends_url = 'http://api.twitter.com/1/trends/weekly.json'
More in the file.
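Building a URL and encoding query parameters can be tried without touching the network. This sketch assumes Python 3, where the helpers live in urllib.parse (in Python 2, urlencode is in urllib, while urlsplit and parse_qs are in urlparse). The host example.com and the parameter names are placeholders, not part of the Twitter example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

base_url = 'http://example.com/search'   # Placeholder host and path
params = {'q': 'python standard library', 'page': 2}

# urlencode escapes each value and joins the pairs with '&'
url = base_url + '?' + urlencode(params)
print(url)

# Going the other way: split the URL and decode the query string
parts = urlsplit(url)
query = parse_qs(parts.query)
print(parts.netloc)
print(query)
```

Note that parse_qs returns every value as a list of strings, because a query string may repeat a key.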
The glob module is used to find filenames matching a given pattern. You can use the * and ? wildcard characters as well as [] character ranges. The matching rules are those of the Unix shell.
import glob
print glob.glob('../lectures/s*.rst')
print glob.glob('../lectures/s??.rst')
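To see how the three kinds of wildcard differ, here is a small sketch that works in a throwaway directory instead of ../lectures. It assumes Python 3; the file names and the names() helper are made up for illustration.

```python
import glob
import os
import shutil
import tempfile

# Build a temporary directory with a few empty files
workdir = tempfile.mkdtemp()
for name in ('s01.rst', 's02.rst', 's10-extra.rst', 'notes.txt'):
    open(os.path.join(workdir, name), 'w').close()

def names(pattern):
    """Return the sorted base names matching pattern inside workdir."""
    paths = glob.glob(os.path.join(workdir, pattern))
    return sorted(os.path.basename(p) for p in paths)

star = names('s*.rst')           # * matches any run of characters
question = names('s??.rst')      # ? matches exactly one character
bracket = names('s[0-9]1.rst')   # [] matches one character from a range

print(star)
print(question)
print(bracket)

shutil.rmtree(workdir)   # Clean up the temporary directory
```

glob.glob returns paths in arbitrary order, which is why the helper sorts them before returning.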
Python’s support for regular expressions is contained in the re module. Regular expressions (or regex) are used for matching character patterns in strings. While they are very useful and powerful, they can also get quite complicated.
Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.
—Jamie Zawinski
import random
import re
from string import ascii_lowercase, digits
simple_phone = re.compile(r'[1-9]\d{2}(-|\.|\s)\d{4}')
# Build 20,000 random characters to search through
garbage = ''.join([
    random.choice(ascii_lowercase + digits + '-. ')
    for x in range(20000)
])
match = simple_phone.search(garbage)
if match:
    print garbage
    print 'Found a match!'
    print garbage[match.start():match.end()]
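The same pattern is easier to reason about on a fixed string. This sketch assumes Python 3; the sample text and its phone numbers are invented for illustration.

```python
import re

simple_phone = re.compile(r'[1-9]\d{2}(-|\.|\s)\d{4}')

text = 'Call 555-1234 or 867.5309, but not 055-1234 or 5551234.'

# finditer yields one match object per non-overlapping match
found = [m.group(0) for m in simple_phone.finditer(text)]
print(found)

# search returns only the first match (or None)
first = simple_phone.search(text)
print(first.start(), first.end())
print(repr(first.group(1)))   # The separator captured by the parentheses
```

finditer is used here instead of findall because the pattern contains a group, and findall would return only the captured separators.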
print statements are great, but sometimes you need something a little more robust. The logging module supports logging from programs via console output, file output (including file rotation), socket output, email output via SMTP, and HTTP output.
You can configure multiple loggers for a program, each with different formats, outputs, and logging levels.
[loggers]
keys=root
[handlers]
keys=consoleHandler,fileHandler
[formatters]
keys=simpleFormatter
import logging
import logging.config
logging.config.fileConfig("log.conf")
logger = logging.getLogger() # root by default
if __name__ == "__main__":
    logger.debug("debug message")
    logger.info("info message")
    logger.warning("warn message")  # warning() is the documented name for warn()
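Loggers can also be configured in code instead of through a config file. The following is a sketch of that alternative, not the contents of log.conf. It assumes Python 3; the logger name 'example' and the format string are made up, and an in-memory stream stands in for the console so the output can be inspected.

```python
import io
import logging

# An in-memory stream stands in for the console or a log file
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setLevel(logging.INFO)   # This handler ignores DEBUG records
handler.setFormatter(
    logging.Formatter('%(name)s - %(levelname)s - %(message)s'))

logger = logging.getLogger('example')   # Hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.propagate = False   # Keep records away from the root logger's handlers

logger.debug('debug message')    # Dropped by the handler's level
logger.info('info message')
logger.warning('warn message')

print(stream.getvalue())
```

Levels are checked twice: once on the logger and once on each handler, which is how one logger can send everything to a file while showing only warnings on the console.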
timeit is a module for timing/profiling small pieces of Python code.
cProfile is a more robust profiling module, used together with pstats to format the profiler's output statistics. profile is a pure-Python module with the same API as cProfile, but it adds more overhead than cProfile, which is written as a C extension.
Note there is a bug/license issue which excludes pstats from the default Python install on Ubuntu. You’ll need to install the python-profiler package.
See https://bugs.launchpad.net/ubuntu/+source/python-defaults/+bug/123755
from timeit import Timer
fibonacci_test = '[fibonacci(i) for i in xrange(1, 15)]'
t = Timer(fibonacci_test, "from fibonacci import fibonacci")
time = t.timeit(number=10000) / 10000.0 * 1000
print "%.2f millisecond/call" % time
t = Timer(fibonacci_test, "from fibo2 import fibonacci")
time = t.timeit(number=10000) / 10000.0 * 1000
print "%.2f millisecond/call" % time
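timeit also accepts a callable instead of a statement string, which avoids the separate setup import. This is a self-contained sketch assuming Python 3; the iterative fibonacci below is a stand-in written for this example, not the function from the fibonacci or fibo2 modules.

```python
import timeit

def fibonacci(n):
    """Iterative Fibonacci; a stand-in for the function under test."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def run():
    return [fibonacci(i) for i in range(1, 15)]

number = 1000
# repeat() returns one total time per round; the minimum is the least noisy
totals = timeit.repeat(run, repeat=3, number=number)
per_call_ms = min(totals) / number * 1000
print('%.4f millisecond/call' % per_call_ms)
```

Taking the minimum of several rounds is the usual advice, since higher values tend to reflect other activity on the machine and not the code being timed.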
import cProfile
import pstats
# Run profiler and save to twitter.stats
cProfile.run('import twitter', 'twitter.stats')
stats = pstats.Stats('twitter.stats')
# Clean up filenames for the report
stats.strip_dirs()
# Sort the statistics by the cumulative time spent
stats.sort_stats('cumulative')
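A profile can also be collected and reported without writing a stats file. This sketch assumes Python 3; busy_work is a hypothetical workload, and the report is sent to an in-memory stream so it can be printed or inspected.

```python
import cProfile
import io
import pstats

def busy_work():
    """Hypothetical workload to profile."""
    return sum(i * i for i in range(10000))

# Profile a single call directly instead of a statement string
profiler = cProfile.Profile()
profiler.enable()
result = busy_work()
profiler.disable()

# Stats accepts a Profile object; stream controls where the report goes
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.strip_dirs().sort_stats('cumulative').print_stats(5)   # Top 5 entries
print(report.getvalue())
```

strip_dirs and sort_stats both return the Stats object, so the calls can be chained as shown.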
The unittest module is a testing framework based on Kent Beck’s Smalltalk testing framework (original paper). This same style of testing framework, called xUnit, has been written in nearly every language.
Sample program with some known flaws
if __name__ == '__main__':
    symbols = get_symbol_info('nyse.csv')
    banks = filter_by_industry(symbols, 'Commercial Banks')
    print len(banks)
Test suite to expose those flaws
import os
import program
import unittest
if __name__ == '__main__':
    unittest.main()
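The shape of a test case is easier to see in a complete, if tiny, example. This is a sketch assuming Python 3: filter_by_industry here is a hypothetical stand-in that works on a list of dicts, not the function from the sample program, and the rows are made up.

```python
import unittest

def filter_by_industry(symbols, industry):
    """Hypothetical stand-in: keep rows whose 'industry' field matches."""
    return [row for row in symbols if row.get('industry') == industry]

class FilterByIndustryTest(unittest.TestCase):

    def setUp(self):
        # Made-up rows; setUp runs before every test method
        self.symbols = [
            {'symbol': 'AAA', 'industry': 'Commercial Banks'},
            {'symbol': 'BBB', 'industry': 'Steel'},
        ]

    def test_matching_rows_are_kept(self):
        banks = filter_by_industry(self.symbols, 'Commercial Banks')
        self.assertEqual([row['symbol'] for row in banks], ['AAA'])

    def test_unknown_industry_gives_empty_list(self):
        self.assertEqual(filter_by_industry(self.symbols, 'Airlines'), [])

# Run the tests without exiting the interpreter
suite = unittest.TestLoader().loadTestsFromTestCase(FilterByIndustryTest)
outcome = unittest.TextTestRunner(verbosity=2).run(suite)
```

In a standalone test file, the last two lines would normally be replaced by the unittest.main() call shown above, which discovers the test classes and sets the exit status.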
Be sure to check the documentation on the Python website at http://docs.python.org/library/
Common third-party Python libraries