Tour of the Python Standard Library

  • random
  • datetime
  • urllib
  • glob
  • re
  • logging
  • timeit
  • unittest

What is the Standard Library?

Python has a “batteries included” philosophy, meaning the language ships with a standard library of robust modules. Today we're going to look at a few of them.

Being a good Python programmer isn't just about being able to write any program in Python. It's also about knowing which things to write and which things are already written, much like being a good mathematician.

Modules We've Already Seen/Discussed

  • math/cmath
  • sys/os/pathlib
  • xml/csv/json
  • itertools/functools/contextlib

random

The random module contains classes and functions for generating random data of various types and distributions.

In [1]:
import random

random.random() # Random float from [0.0, 1.0)
Out[1]:
0.9461393931961889
In [2]:
random.randint(1, 10)  # Random integer from [1, 10]
Out[2]:
8
In [3]:
random.uniform(3, 5) # Uniform float from [3, 5]
Out[3]:
4.93753632999579
In [4]:
# Randomly selected item from the given list (any non-empty sequence works)
random.choice(['a', 'b', 'c'])
Out[4]:
'c'
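
The module also exposes a Random class, so a program can keep its own seeded generator, plus functions for other distributions and for sampling. A minimal sketch; the seed value and the variable names are arbitrary choices for illustration:

```python
import random

# A private generator with a fixed seed gives a reproducible sequence
rng = random.Random(42)
other = random.Random(42)
first = [rng.random() for _ in range(3)]
second = [other.random() for _ in range(3)]
print(first == second)  # Same seed, same sequence

# Draw from a normal distribution with mean 0 and standard deviation 1
value = rng.gauss(0, 1)

# Pick 3 distinct items without replacement
deck = list(range(10))
hand = rng.sample(deck, 3)

# Shuffle a list in place
rng.shuffle(deck)
print(value, hand, deck)
```

Seeding is handy for tests and simulations you want to repeat exactly; leave the seed out when you want different results on every run.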

datetime

The datetime module defines datetime, date, time, and timedelta types for handling date arithmetic and date comparison. Note all of these types are immutable.

In [5]:
import datetime

today = datetime.date.today()
tomorrow = today + datetime.timedelta(days=1)
if today < tomorrow:
    print(today.isoformat())
2018-04-15
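
Subtracting two dates yields a timedelta, and because the types are immutable, replace returns a new object instead of changing the original. A small sketch using fixed, arbitrarily chosen dates so the results don't depend on when it runs:

```python
import datetime

start = datetime.date(2018, 4, 15)
end = datetime.date(2018, 5, 1)

# Subtracting dates gives a timedelta
delta = end - start
print(delta.days)

# replace() returns a new date; the original is unchanged
moved = start.replace(day=1)
print(start.isoformat(), moved.isoformat())

# Attempting to assign to an attribute raises AttributeError
try:
    start.day = 1
except AttributeError as e:
    print('Cannot modify a date:', e)
```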

Date Formatting

These objects have a strftime method which allows you to represent the date/time in various formats. All allowable formatting directives can be found here: http://docs.python.org/library/datetime.html#strftime-strptime-behavior

In [6]:
import datetime

today = datetime.date.today()
now = datetime.datetime.now()
today.strftime('%a %B %d, %Y')
Out[6]:
'Sun April 15, 2018'
In [7]:
now.strftime('%m-%d-%y %I:%M %p')
Out[7]:
'04-15-18 10:47 AM'

Date Parsing

The datetime.datetime.strptime class method takes a string representing a date and another string giving its format, and returns a datetime object if the string matches the format.

In [8]:
import datetime

a = datetime.datetime.strptime('02-26-02','%m-%d-%y')
type(a)
Out[8]:
datetime.datetime
In [9]:
a.date()
Out[9]:
datetime.date(2002, 2, 26)
In [10]:
# Raises a ValueError if the format doesn't match
a = datetime.datetime.strptime('02-26-02','%d/%m/%Y')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-10-3fab116d6009> in <module>()
      1 # Raises a ValueError if the format doesn't match
----> 2 a = datetime.datetime.strptime('02-26-02','%d/%m/%Y')

/usr/lib/python3.6/_strptime.py in _strptime_datetime(cls, data_string, format)
    563     """Return a class cls instance based on the input string and the
    564     format string."""
--> 565     tt, fraction = _strptime(data_string, format)
    566     tzname, gmtoff = tt[-2:]
    567     args = tt[:6] + (fraction,)

/usr/lib/python3.6/_strptime.py in _strptime(data_string, format)
    360     if not found:
    361         raise ValueError("time data %r does not match format %r" %
--> 362                          (data_string, format))
    363     if len(data_string) != found.end():
    364         raise ValueError("unconverted data remains: %s" %

ValueError: time data '02-26-02' does not match format '%d/%m/%Y'
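
In a program you would usually catch the ValueError rather than let it propagate. One possible approach, sketched here, tries a list of candidate formats in turn. parse_date and the FORMATS tuple are hypothetical names made up for this example, not part of datetime:

```python
import datetime

# Candidate formats to try, in order (an assumption for this example)
FORMATS = ('%m-%d-%y', '%d/%m/%Y', '%Y-%m-%d')


def parse_date(text):
    """Return a date parsed with the first matching format, or None."""
    for fmt in FORMATS:
        try:
            return datetime.datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # Format didn't match; try the next one
    return None


print(parse_date('02-26-02'))
print(parse_date('26/02/2002'))
print(parse_date('not a date'))
```

The order of the formats matters when a string could match more than one of them, so put the format you trust most first.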

urllib

urllib is a standard package for accessing data over the internet. It has functions for retrieving data as well as for building URLs and encoding query parameters.

In [22]:
# %load '../code/github.py'
import json
import urllib.error
import urllib.parse  # Imported explicitly; urlencode lives here
import urllib.request

search_url = 'https://api.github.com/search/repositories'
params = urllib.parse.urlencode({
    'q': 'language:python',
    'sort': 'stars',
    'order': 'desc',
    'per_page': 3,
})
url = '%s?%s' % (search_url, params)
try:
    response = urllib.request.urlopen(url, timeout=10)
except urllib.error.HTTPError as e:
    print('HTTPError getting GitHub data: %s' % e)
    print(e.headers)
except urllib.error.URLError as e:
    print('URLError getting GitHub data: %s' % e)
else:
    content = response.read()
    data = json.loads(content)
    for repository in data['items']:
        print('{full_name}: \u2605 {stargazers_count}'.format(**repository))
vinta/awesome-python: ★ 48457
rg3/youtube-dl: ★ 35917
toddmotto/public-apis: ★ 35557
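
The URL-handling half of the package, urllib.parse, can be tried without a network connection. A short sketch that builds a query string like the one above and then takes the URL apart again:

```python
import urllib.parse

base = 'https://api.github.com/search/repositories'
query = urllib.parse.urlencode({'q': 'language:python', 'per_page': 3})
url = base + '?' + query
print(url)

# Split the URL back into its components
parts = urllib.parse.urlparse(url)
print(parts.scheme, parts.netloc, parts.path)

# Decode the query string into a dict of lists
decoded = urllib.parse.parse_qs(parts.query)
print(decoded)

# quote() percent-encodes a single path or query component
print(urllib.parse.quote('a b/c', safe=''))
```

Note that parse_qs returns every value as a list of strings, since a key may appear more than once in a query string.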

glob

The glob module is used to find filenames matching a given pattern. You can use the * and ? wildcard characters as well as [] character ranges. The matching rules follow those of the Unix shell.

In [12]:
import glob

glob.glob('*.ipynb')
Out[12]:
['MA792-002-Python-2.ipynb',
 'MA792-002-Python-1.ipynb',
 'MA792-002-Python-6.ipynb',
 'MA792-002-Python-5.ipynb',
 'MA792-002-Python-3.ipynb',
 'MA792-002-Python-4.ipynb']
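
To see the ? and [] patterns in action without depending on the files in your current directory, this sketch first creates a few empty files in a temporary directory. The file names and the names helper are invented for the example:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create some empty files to match against
    for name in ('notes1.txt', 'notes2.txt', 'notes10.txt', 'data.csv'):
        open(os.path.join(tmp, name), 'w').close()

    def names(pattern):
        """Return the sorted base names matching pattern inside tmp."""
        # glob.escape protects any special characters in the directory path
        matches = glob.glob(os.path.join(glob.escape(tmp), pattern))
        return sorted(os.path.basename(m) for m in matches)

    star = names('*.txt')              # * matches any run of characters
    question = names('notes?.txt')     # ? matches exactly one character
    bracket = names('notes[2-9].txt')  # [] matches one character in a range

print(star)
print(question)
print(bracket)
```

The results are sorted here because glob.glob makes no promise about the order of the names it returns.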

re

Python's support for regular expressions is contained in the re module. Regular expressions (or regex) are used for matching character patterns in strings. While they are very useful and powerful, they can also get quite complicated.

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

—Jamie Zawinski

In [13]:
import re

simple_phone = re.compile(r'[1-9]\d{2}(-|\.|\s)\d{4}')
sample_text = '''
Selling all of my old textbooks. $20 or best offer.

Call 555-1234 or email books@example.com if interested.
'''
match = simple_phone.search(sample_text)
match.group(0)
Out[13]:
'555-1234'
In [14]:
match.start(), match.end()
Out[14]:
(59, 67)
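
Beyond search, compiled patterns offer finditer, findall, and sub, and groups can be named for readability. A sketch along those lines; the sample text, the group names, and the deliberately simplified email pattern are assumptions for illustration, and the email pattern is far from a complete validator:

```python
import re

text = 'Call 555-1234 or 555.9876, or email books@example.com if interested.'

# Named groups make the parts of a match easier to refer to
phone = re.compile(r'(?P<prefix>[1-9]\d{2})[-.\s](?P<line>\d{4})')

# finditer yields a match object for every non-overlapping match
numbers = [m.group(0) for m in phone.finditer(text)]
print(numbers)

first = phone.search(text)
print(first.group('prefix'), first.group('line'))

# sub replaces every match; \g<name> refers back to a named group
masked = phone.sub(r'\g<prefix>-XXXX', text)
print(masked)

# A simplified email pattern (illustrative only)
email = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+')
emails = email.findall(text)
print(emails)
```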

logging

print calls are great, but sometimes you need something more robust. The logging module supports logging from a program to the console, to files (including file rotation), to sockets, to email via SMTP, and over HTTP.

You can configure multiple loggers for a program, each with different formats, outputs, and logging levels.

In [16]:
!cat ../code/log.conf
[loggers]
keys=root

[handlers]
keys=consoleHandler,fileHandler

[formatters]
keys=simpleFormatter

[logger_root]
level=DEBUG
handlers=consoleHandler,fileHandler
propagate=0

[handler_consoleHandler]
class=StreamHandler
level=DEBUG
formatter=simpleFormatter
args=(sys.stdout,)

[handler_fileHandler]
class=FileHandler
level=INFO
formatter=simpleFormatter
args=('example.log',)

[formatter_simpleFormatter]
format="%(asctime)s - %(levelname)s - %(message)s"
In [18]:
import logging
import logging.config

logging.config.fileConfig("../code/log.conf")
logger = logging.getLogger() # root by default

logger.debug("debug message")
logger.info("info message")
logger.warning("warning message")
logger.error("error message")
logger.critical("critical message")
"2018-04-15 10:48:26,724 - DEBUG - debug message"
"2018-04-15 10:48:26,726 - INFO - info message"
"2018-04-15 10:48:26,728 - WARNING - warning message"
"2018-04-15 10:48:26,729 - ERROR - error message"
"2018-04-15 10:48:26,731 - CRITICAL - critical message"

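The same kind of setup can be built in code instead of a config file. This sketch attaches two handlers with different levels to a named logger. The logger name 'demo' and the in-memory io.StringIO stream standing in for a log file are choices made for this example:

```python
import io
import logging
import sys

logger = logging.getLogger('demo')
logger.setLevel(logging.DEBUG)
logger.propagate = False  # Don't pass records on to the root logger

formatter = logging.Formatter('%(levelname)s - %(message)s')

# Console handler: shows everything from DEBUG up
console = logging.StreamHandler(sys.stdout)
console.setLevel(logging.DEBUG)
console.setFormatter(formatter)
logger.addHandler(console)

# Second handler: keeps only WARNING and above
buffer = io.StringIO()
important = logging.StreamHandler(buffer)
important.setLevel(logging.WARNING)
important.setFormatter(formatter)
logger.addHandler(important)

logger.debug('debug message')
logger.warning('warning message')

# Only the warning reached the second handler
print(buffer.getvalue())
```

A record has to pass both the logger's level and the handler's level, which is why the debug message appears on the console but not in the buffer.
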
timeit/cProfile

timeit is a module for timing small pieces of Python code.

cProfile is a more robust module for profiling; it is combined with pstats to sort and format the profiler's output statistics. profile is a pure-Python module with the same API as cProfile, but it introduces additional overhead compared to cProfile, which is written as a C extension.

Note there is a bug/license issue which excludes pstats from the default Python install on Ubuntu. You'll need to install the python-profiler package.

See https://bugs.launchpad.net/ubuntu/+source/python-defaults/+bug/123755

timeit Example (timeme.py)
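
The contents of timeme.py are not reproduced here. As a stand-in, a minimal sketch of the timeit API comparing two ways of building a string; the statements being timed and the repetition counts are arbitrary choices:

```python
import timeit

# Time a statement given as a string; setup runs once and is not timed
join_time = timeit.timeit(
    "''.join(words)",
    setup="words = ['spam'] * 100",
    number=1000,
)


def concat():
    """Build the same string with repeated concatenation."""
    result = ''
    for word in ['spam'] * 100:
        result += word
    return result


# A callable can be timed directly; repeat returns one total per run
concat_times = timeit.repeat(concat, number=1000, repeat=3)

print('join:   %.4f s' % join_time)
print('concat: %.4f s' % min(concat_times))
```

Taking the minimum of the repeated runs is the usual way to read the result, since slower runs mostly reflect other activity on the machine. The module can also be run from a shell with python -m timeit.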

profile Example (profileme.py)
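
The contents of profileme.py are likewise not shown. A minimal sketch of combining cProfile with pstats, using made-up slow_sum and square functions as the code under test:

```python
import cProfile
import io
import pstats


def square(x):
    return x * x


def slow_sum(n):
    """Sum the squares below n, with one helper call per item."""
    return sum(square(i) for i in range(n))


# Profile a single call and keep its return value
profiler = cProfile.Profile()
total = profiler.runcall(slow_sum, 10000)

# pstats sorts and filters the collected statistics
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumulative').print_stats(5)  # Top 5 entries
report = stream.getvalue()
print(report)
```

Sorting by cumulative time puts the functions that account for the most total time, including the calls they make, at the top of the report.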

unittest

The unittest module is a testing framework based on Kent Beck's Smalltalk testing framework. This style of testing framework, called xUnit, has been implemented in almost every language.

Testing Example - Program (program.py)

Sample program with some known flaws

Testing Example (test_program.py)

Test suite to expose those flaws

There are More Modules

Be sure to check the documentation on the Python website at http://docs.python.org/library/

Up Next

Common third-party Python libraries