Third-Party Libraries

Python outside of the standard library

Finding Packages

You can find installable packages on the Python Package Index (PyPI): https://pypi.org/

Installing Packages

pip is the standard package installer for Python. It has been included with Python distributions since Python 3.4.

See https://docs.python.org/3/installing/ and https://pip.pypa.io/en/stable/quickstart/

Pip Examples

# Install the latest
$ pip install numpy
# Install a specific version
$ pip install numpy==1.14.2
# Install a range of versions
$ pip install "numpy>=1.14,<1.15"

More Pip Examples

# Show currently installed packages
$ pip freeze
# Show packages with newer versions
$ pip list --outdated
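Installed package versions can also be inspected from Python itself. The sketch below uses the standard library's `importlib.metadata` (Python 3.8+) to build a name-to-version mapping roughly like what `pip freeze` reports. It is an illustration, not a replacement for pip, and it assumes nothing beyond the standard library; `numpy` is just an example package name.

```python
from importlib.metadata import PackageNotFoundError, distributions, version

# Build a {name: version} mapping, roughly what `pip freeze` lists
installed = {dist.metadata["Name"]: dist.version for dist in distributions()}

# Look up a single package, handling the case where it is not installed
try:
    print("numpy", version("numpy"))
except PackageNotFoundError:
    print("numpy is not installed")
```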

Virtual Environments

The venv module provides support for creating lightweight “virtual environments” with their own site directories, optionally isolated from system site directories. Each virtual environment has its own Python binary (allowing creation of environments with various Python versions) and can have its own independent set of installed Python packages in its site directories.

https://docs.python.org/3/library/venv.html

$ python3.6 -m venv <name>
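The same module can be driven from Python code through `venv.create`. This is a minimal sketch: the temporary directory and the `demo-env` name are hypothetical, and `with_pip=False` skips bootstrapping pip so the environment is created quickly and offline.

```python
import os
import tempfile
import venv

# Hypothetical location: a throwaway directory for the example environment
env_dir = os.path.join(tempfile.mkdtemp(), "demo-env")

# Create the environment without installing pip into it
venv.create(env_dir, with_pip=False)

# Every virtual environment records its settings in pyvenv.cfg at its root
print(os.path.exists(os.path.join(env_dir, "pyvenv.cfg")))  # True
```

The environment's own interpreter lives in `bin/python` on Linux and macOS, or `Scripts\python.exe` on Windows.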

Common Libraries

  • requests
  • Pillow
  • NumPy
  • SciPy
  • Pandas

requests

Dubbed "HTTP for Humans", requests is one of the most highly praised libraries, both for its usefulness as an HTTP client and for its clean interface. It makes even advanced HTTP requests simpler to write than with the built-in urllib.

In [1]:
# %load ../code/newgithub.py
import requests

search_url = 'https://api.github.com/search/repositories'
params = {
    'q': 'language:python',
    'sort': 'stars',
    'order': 'desc',
    'per_page': 3,
}
try:
    response = requests.get(search_url, params=params, timeout=10)
    response.raise_for_status()
except requests.exceptions.HTTPError as e:
    print('HTTPError getting GitHub data: %s' % e)
except requests.exceptions.ConnectionError as e:
    print('ConnectionError getting GitHub data: %s' % e)
except requests.exceptions.Timeout as e:
    print('Timeout getting GitHub data: %s' % e)
else:
    data = response.json()
    for repository in data['items']:
        print('{full_name}: \u2605 {stargazers_count}'.format(**repository))
vinta/awesome-python: ★ 48628
rg3/youtube-dl: ★ 36058
toddmotto/public-apis: ★ 35695
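To see how requests turns the `params` dictionary into a query string without touching the network, a request can be built and prepared but never sent. This sketch reuses the search URL and parameters from the cell above; `requests.Request(...).prepare()` is part of the requests API.

```python
import requests

search_url = 'https://api.github.com/search/repositories'
params = {'q': 'language:python', 'sort': 'stars', 'order': 'desc', 'per_page': 3}

# Build and prepare the request, but do not send it
prepared = requests.Request('GET', search_url, params=params).prepare()
print(prepared.url)
# https://api.github.com/search/repositories?q=language%3Apython&sort=stars&order=desc&per_page=3
```

Note that the `:` in the query value is percent-encoded as `%3A`.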

Pillow

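Pillow, the actively maintained fork of the original PIL (Python Imaging Library), is the most widely used image manipulation library for Python. It is still imported under the name PIL.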

In [2]:
# %load ../code/mandel.py
"""
Mandelbrot fractal using PIL (Python)

http://code.activestate.com/recipes/577111-mandelbrot-fractal-using-pil/
"""

from PIL import Image
# drawing area (xa < xb and ya < yb)
xa, xb, ya, yb = -2.0, 1.0, -1.5, 1.5
maxIt = 256  # iterations
# image size
imgx, imgy = 512, 512
image = Image.new("RGB", (imgx, imgy))

for y in range(imgy):
    cy = y * (yb - ya) / (imgy - 1) + ya
    for x in range(imgx):
        cx = x * (xb - xa) / (imgx - 1) + xa
        c = complex(cx, cy)
        z = 0
        for i in range(maxIt):
            if abs(z) > 2.0:
                break
            z = z * z + c
        r = i % 4 * 64
        g = i % 8 * 32
        b = i % 16 * 16
        # RGB images take an (r, g, b) tuple per pixel
        image.putpixel((x, y), (r, g, b))

image.save("mandel.png", "PNG")
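Beyond drawing individual pixels, most everyday work with Pillow is whole-image operations. The sketch below starts from a generated solid red image (an assumption made so no input file is needed) and applies a few common `Image` methods and a built-in filter from `ImageFilter`.

```python
from PIL import Image, ImageFilter

# Generated 200x100 red image so the example needs no input file
img = Image.new("RGB", (200, 100), (255, 0, 0))

thumb = img.resize((50, 25))            # scale to an exact size
gray = img.convert("L")                 # convert to 8-bit grayscale
turned = img.rotate(90, expand=True)    # expand=True grows the canvas to fit
blurred = img.filter(ImageFilter.BLUR)  # apply a built-in filter

print(thumb.size, gray.mode, turned.size)  # (50, 25) L (100, 200)
```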

NumPy/SciPy

NumPy is a package for handling array-based data. Instead of looping over standard Python lists, it provides a fast n-dimensional array type with vectorized vector and matrix operations.
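A quick way to see the difference from plain lists: the same `* 2` means repetition for a list but elementwise multiplication for a NumPy array.

```python
import numpy

values = [1, 2, 3]
print(values * 2)               # [1, 2, 3, 1, 2, 3]: list repetition
print(numpy.array(values) * 2)  # [2 4 6]: elementwise multiplication
```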

SciPy is built on top of NumPy. It provides scientific computational tools for statistics, optimization, numerical integration, linear algebra, Fourier transforms, signal processing, image processing, ODE solving, and more.

NumPy for MATLAB Users

If you use MATLAB, I recommend reading this page comparing NumPy to MATLAB: http://www.scipy.org/NumPy_for_Matlab_Users

Creating NumPy Arrays

ndarray is the basic array type provided by NumPy. There are a number of ways to create one.

In [3]:
import numpy

a = numpy.array([1, 2, 3]) # 1-dimensional
b = numpy.array([[1, 2, 3], [4, 5, 6]]) # 2-dimensional
zero = numpy.zeros((3, 3))
one = numpy.ones((2, 3, 4))
even = numpy.arange(2, 20, 2)
unit = numpy.linspace(0, 1, 10)
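Every array carries its shape and element type, which is a useful first check after creating one. This sketch inspects a few of the arrays from the cell above; note that `arange` excludes its stop value while `linspace` includes both endpoints by default.

```python
import numpy

a = numpy.array([1, 2, 3])
b = numpy.array([[1, 2, 3], [4, 5, 6]])
zero = numpy.zeros((3, 3))
one = numpy.ones((2, 3, 4))
even = numpy.arange(2, 20, 2)    # stop value 20 is excluded
unit = numpy.linspace(0, 1, 10)  # 10 evenly spaced values, 0 and 1 included

print(a.shape, b.shape, zero.shape)  # (3,) (2, 3) (3, 3)
print(one.ndim, one.size)            # 3 24
print(zero.dtype)                    # float64
print(even)                          # [ 2  4  6  8 10 12 14 16 18]
print(unit[0], unit[-1])             # 0.0 1.0
```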

Using NumPy Arrays

Arithmetic operators on arrays apply elementwise and create a new array with the result.

In [4]:
x = numpy.arange(4)
y = numpy.arange(4)
x + y
Out[4]:
array([0, 2, 4, 6])
In [5]:
x * 2
Out[5]:
array([0, 2, 4, 6])
In [6]:
x ** 2
Out[6]:
array([0, 1, 4, 9])
In [7]:
x * y
Out[7]:
array([0, 1, 4, 9])
In [8]:
numpy.dot(x, y)
Out[8]:
14
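The dot product above is the sum of the elementwise products (0 + 1 + 4 + 9). Since Python 3.5, NumPy arrays also support the `@` operator for the same product. For 2-D arrays the distinction matters: `*` stays elementwise while `@` performs matrix multiplication, as this sketch shows.

```python
import numpy

x = numpy.arange(4)
y = numpy.arange(4)

# Three equivalent ways to compute the dot product
print(numpy.dot(x, y), (x * y).sum(), x @ y)  # 14 14 14

# For 2-D arrays, * is elementwise while @ is matrix multiplication
m = numpy.array([[1, 2], [3, 4]])
elementwise = m * m     # [[ 1,  4], [ 9, 16]]
matrix_product = m @ m  # [[ 7, 10], [15, 22]]
```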

Array Slicing

In [9]:
x = numpy.arange(20).reshape(4, 5)
x
Out[9]:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])
In [10]:
x[1, 1]
Out[10]:
6
In [11]:
x[:, 2]
Out[11]:
array([ 2,  7, 12, 17])
In [12]:
y = x.reshape(2, 2, 5)
y
Out[12]:
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9]],

       [[10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]]])
In [13]:
y[:, :, 1]
Out[13]:
array([[ 1,  6],
       [11, 16]])
In [14]:
y[..., 1]
Out[14]:
array([[ 1,  6],
       [11, 16]])

SciPy Modules

  • Clustering package (scipy.cluster)
  • Constants (scipy.constants)
  • Fourier transforms (scipy.fftpack; newer code should use scipy.fft)
  • Integration and ODEs (scipy.integrate)
  • Interpolation (scipy.interpolate)
  • Input and output (scipy.io)
  • Linear algebra (scipy.linalg)
  • Maximum entropy models (scipy.maxentropy, removed in later SciPy releases)
  • Miscellaneous routines (scipy.misc)
  • Multi-dimensional image processing (scipy.ndimage)
  • Orthogonal distance regression (scipy.odr)
  • Optimization and root finding (scipy.optimize)
  • Signal processing (scipy.signal)
  • Sparse matrices (scipy.sparse)
  • Sparse linear algebra (scipy.sparse.linalg)
  • Spatial algorithms and data structures (scipy.spatial)
  • Distance computations (scipy.spatial.distance)
  • Special functions (scipy.special)
  • Statistical functions (scipy.stats)
  • C/C++ integration (scipy.weave, removed in later SciPy releases)
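A taste of three of these modules follows. This is a minimal sketch assuming SciPy is installed; the integrand, linear system, and function being minimized are arbitrary illustrations chosen so the answers are easy to check by hand.

```python
import numpy
from scipy import integrate, linalg, optimize

# Integration: integral of sin(x) from 0 to pi (exact value is 2)
area, abs_error = integrate.quad(numpy.sin, 0, numpy.pi)

# Linear algebra: solve 3x + y = 9 and x + 2y = 8 (solution x = 2, y = 3)
A = numpy.array([[3.0, 1.0], [1.0, 2.0]])
rhs = numpy.array([9.0, 8.0])
solution = linalg.solve(A, rhs)

# Optimization: the minimum of (x - 2)**2 + 1 is at x = 2
result = optimize.minimize_scalar(lambda x: (x - 2) ** 2 + 1)

print(round(area, 6), solution, round(result.x, 4))
```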

Other Packages Built on SciPy

  • scikit-learn - "Machine Learning in Python"
  • scikit-image - "Image processing in Python"

And many more: http://scikits.appspot.com/scikits

Pandas

pandas has emerged as one of the premier libraries for data analysis in Python. It works with and extends NumPy data types and provides additional useful tools such as:

  • Series and DataFrame types
  • Aggregation and pivoting of data sets
  • Date range generation
  • Input/output utilities
  • Rolling statistics

http://pandas.pydata.org/pandas-docs/stable/index.html
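Several of the features listed above fit in a few lines. This sketch is illustrative only: it builds a hypothetical Series of four closing prices over business days starting 2018-04-13, then computes a 2-day rolling mean and a few aggregates.

```python
import pandas

# Date range generation: four consecutive business days
dates = pandas.bdate_range('2018-04-13', periods=4)

# A Series is a 1-dimensional labeled array
ibm = pandas.Series([100, 99, 101, 102], index=dates, name='IBM')

# Rolling statistics: 2-day moving average (the first value is NaN)
moving_avg = ibm.rolling(window=2).mean()

# Aggregation over the whole Series
stats = ibm.agg(['min', 'max', 'mean'])
print(stats)
```

Because 2018-04-13 is a Friday, `bdate_range` skips the weekend and continues on Monday 2018-04-16.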

DataFrames

A DataFrame is a 2-dimensional labeled data structure. Both the rows and the columns can be labeled, which lets you easily reference, manipulate, and aggregate the data.

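In [9]:
import pandas

# Label the rows with trading dates so they can be selected by date
dates = pandas.to_datetime(['2018-04-13', '2018-04-16', '2018-04-17', '2018-04-18'])
prices = pandas.DataFrame({'GOOG': [1000, 1010, 1015, 1012], 'IBM': [100, 99, 101, 102]},
                          index=dates)
prices
Out[9]:
            GOOG  IBM
2018-04-13  1000  100
2018-04-16  1010   99
2018-04-17  1015  101
2018-04-18  1012  102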
In [17]:
prices['IBM']
Out[17]:
2018-04-13    100
2018-04-16     99
2018-04-17    101
2018-04-18    102
Name: IBM, dtype: int64
In [18]:
prices[:'2018-04-13']
Out[18]:
            GOOG  IBM
2018-04-13  1000  100
In [19]:
prices / prices.shift(1) - 1
Out[19]:
                GOOG       IBM
2018-04-13       NaN       NaN
2018-04-16  0.010000 -0.010000
2018-04-17  0.004950  0.020202
2018-04-18 -0.002956  0.009901
In [20]:
# Percent change with 1 day look back
prices.pct_change(1)
Out[20]:
                GOOG       IBM
2018-04-13       NaN       NaN
2018-04-16  0.010000 -0.010000
2018-04-17  0.004950  0.020202
2018-04-18 -0.002956  0.009901
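The same date-indexed prices can also be referenced, manipulated, and aggregated with a few more common operations. This is a sketch rebuilding the frame from the cells above; the derived `RATIO` column is a hypothetical example, not part of the original data.

```python
import pandas

dates = pandas.to_datetime(['2018-04-13', '2018-04-16', '2018-04-17', '2018-04-18'])
prices = pandas.DataFrame({'GOOG': [1000, 1010, 1015, 1012],
                           'IBM': [100, 99, 101, 102]}, index=dates)

# Reference: select a date range and a single column by label with .loc
mid_week = prices.loc['2018-04-16':'2018-04-17', 'IBM']  # 99, 101

# Manipulate: derive a new column (hypothetical) from existing ones
prices['RATIO'] = prices['GOOG'] / prices['IBM']

# Aggregate: column-wise summary statistics
summary = prices[['GOOG', 'IBM']].agg(['min', 'max', 'mean'])
print(summary)
```

Label slices with `.loc` include both endpoints, unlike positional slicing.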

Next Up

More Python tools and libraries and how to continue learning