{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Software I : Anaconda, AstroPy, and libraries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this school, we are making extensive use of Python and various associated libraries and so the first thing we need to ensure is that we all have a common setup and are using the same software. The Python distribution that we have decided to use is Anaconda which can be downloaded from here (although we hope that you have already done this prior to the school). Make sure that you installed the Python 2.7 version for your operating system (there is nothing wrong with Python 3.x but it is slightly different syntactically and 2.7 is the currently approved version for LSST code development)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Installing packages" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One of the advantages of the Anaconda distribution is that it comes with many of the most commonly-used Python packages, such as numpy, scipy, and scikit-learn, preinstalled. However, if you do need to install a new package then it is very straightforward: you can either use the Anaconda installation tool conda or the generic Python tool pip (both use a different registry of available packages and sometimes a particular package will not available via one tool but will be via the other)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For example, irlbpy is a superfast algorithm for finding the largest eigenvalues (and corresponding eigenvectors) of very large matrices. We can try to install it first with conda:\n", "\n", "conda install irlbpy\n", "\n", "but this will not find it:\n", "\n", "Fetching package metadata: ....\n", "Error: No packages found in current osx-64 channels matching: irlbpy\n", "\n", "You can search for this package on Binstar with
\n", " binstar search -t conda irlbpy\n", "
\n", "\n", "so instead we try with pip:\n", "\n", "pip install irlbpy\n", "\n", "In the event that both fail, you always just download the package source code and then install it manually with:\n", "\n", "python install setup.py\n", "\n", "in the appropriate source directory.\n", "\n", "We'll now take a brief look at a few of the main Python packages. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## NumPy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "NumPy is the main Python package for working with N-dimensional arrays. Any list of numbers can be recast as a NumPy array:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([2, 5, 3, 9, 7])" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "x = np.array([2,5,3,9,7])\n", "x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Arrays have a number of useful methods associated with them:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 5 15 0 1\n" ] } ], "source": [ "print x.min(), x.max(), x.sum(), x.argmin(), x.argmax() " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and NumPy functions can act on arrays in an elementwise fashion: " ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0.01745241, 0.0348995 , 0.05233596, 0.06975647, 0.08715574])" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.sin(x * np.pi / 180.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ranges of values are easily produced:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1. , 1.25, 1.5 , 1.75, 2. , 2.25, 2.5 , 2.75, 3. ,\n", " 3.25, 3.5 , 3.75, 4. , 4.25, 4.5 , 4.75, 5. , 5.25,\n", " 5.5 , 5.75, 6. , 6.25, 6.5 , 6.75, 7. , 7.25, 7.5 ,\n", " 7.75, 8. , 8.25, 8.5 , 8.75, 9. , 9.25, 9.5 , 9.75])" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.arange(1, 10, 0.25)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1. , 5.75, 10.5 , 15.25, 20. ])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.linspace(1, 20, 5)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 10. , 31.6227766 , 100. , 316.22776602,\n", " 1000. 
])" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.logspace(1, 3, 5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Random numbers are also easily generated in the half-open interval [0, 1):" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 0.32236496, 0.21506812, 0.43010248, 0.00518381, 0.76868494,\n", " 0.40007316, 0.54393627, 0.47369813, 0.84379927, 0.64993354])" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.random.random(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "or from one of the large number of statistical distributions provided:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ -3.73267855, 1.37170819, 8.16304747, -0.27896174,\n", " 10.39564368, 0.45185063, 7.04073321, -1.39126892,\n", " 1.3358371 , 3.90961175, 2.30816605, 19.87333166,\n", " 5.41203659, 5.13839341, 5.06171327, 4.38265858,\n", " 3.78950268, 7.06296169, 4.80470557, 10.20694391])" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.random.normal(loc = 2.5, scale = 5, size = 20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another useful method is the where function for identifying elements that satisfy a particular condition: " ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 1.87105877e-01 -2.39603961e+00 1.44687770e+00 -2.16482071e-01\n", " 5.75713272e-01 -4.05465421e-01 3.24493064e-01 1.94923740e+00\n", " 1.82203983e-01 9.14587481e-01 -2.71385704e-02 -3.40197555e-01\n", " -1.23485896e+00 -8.59566120e-04 1.58623798e+00 -1.45167513e-02\n", " -1.04614749e+00 7.98648076e-01 2.14170370e+00 3.06397570e-01\n", " 7.26984985e-01 3.30964686e-01 -8.96114730e-01 2.16686828e+00\n", " 2.96145626e-01 -8.16399365e-01 -9.49566662e-01 4.22530260e-01\n", " -1.61935998e+00 -4.40557627e-01 -3.17059317e-01 1.19800654e+00\n", " -9.78872255e-02 -1.55824225e+00 -3.78734151e-01 1.94701640e-02\n", " -2.35352615e-01 -1.66271785e+00 -6.62442251e-01 8.82626351e-01\n", " 1.09057493e+00 -3.89036409e-01 8.15959802e-01 -6.51061933e-01\n", " 7.40576461e-01 7.21376501e-02 -2.10867165e+00 1.62439110e-01\n", " -3.96635524e-01 -6.51321684e-01 1.04665708e+00 3.81217863e-01\n", " 2.10488771e-01 -1.28420630e+00 1.64306857e+00 2.26955679e-01\n", " -5.90036730e-01 6.67998555e-01 -5.98224112e-01 -1.68637677e+00\n", " -1.33658092e+00 6.04846193e-01 7.52765792e-01 1.14791072e+00\n", " 1.56404954e+00 -4.61466287e-01 -9.80462468e-01 -3.04874485e-01\n", " -7.54546380e-01 -4.49164973e-01 1.10599794e-01 4.86689736e-01\n", " -5.38798609e-01 1.54066810e+00 -2.78327667e+00 1.01860836e+00\n", " -3.29753652e-01 -7.75479694e-01 -1.07988589e-01 -1.08019965e+00\n", " -8.61536357e-01 1.04061514e+00 -4.58156224e-01 -1.26734380e+00\n", " -3.27750799e-01 -1.89330699e-01 -2.20749803e+00 -9.86040664e-01\n", " 1.79276591e-01 -1.22244175e+00 -4.13797330e-01 -8.33445520e-01\n", " -5.70991723e-01 -2.69657181e-01 1.55987699e+00 -2.97847096e-01\n", " 3.18306178e-01 -3.56743483e-01 3.83583671e-01 -5.97183047e-01]\n" ] }, { "data": { "text/plain": [ "(array([18, 23]),)" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.random.normal(size = 100)\n", "print x\n", "np.where(x > 2.)" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "Of course, all of these work equally well with multidimensional arrays." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 0.84147098, 0.90929743, 0.14112001, -0.7568025 , -0.95892427],\n", " [-0.2794155 , 0.6569866 , 0.98935825, 0.41211849, -0.54402111]])" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])\n", "np.sin(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Data can also be automatically loaded from a file into a Numpy array via the loadtxt or genfromtxt methods:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = np.loadtxt(\"somefile.csv\", delimiter = \",\", skiprows = 3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## SciPy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "SciPy provides a number of subpackages that deal with common operations in scientific computing, such as numerical integration, optimization, interpolation, Fourier transforms and linear algebra." ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(1.6346035509274763, 1.1580617576001373e-08)\n", "(1.2743983238992254, 1.1301087878068563e-08)\n", "(1.4332524555959525, 5.534144099065977e-11)\n" ] } ], "source": [ "f = lambda x: np.cos(-x ** 2 / 9.)\n", "\n", "x = np.linspace(0, 10, 11)\n", "y = f(x)\n", "\n", "from scipy.interpolate import interp1d\n", "f1 = interp1d(x, y)\n", "f2 = interp1d(x, y, kind = 'cubic')\n", "\n", "from scipy.integrate import quad\n", "print quad(f1, 0, 10)\n", "print quad(f2, 0, 10)\n", "print quad(f, 0, 10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## scikit-learn" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "scikit-learn provides algorithms for machine learning tasks, such as classification, regression, and clustering, as well as associated operations, such as cross-validation and feature normalization. These topics will be covered in greater depth in Guillermo Cabrera's talks here. A related module is astroML which is a wrapper around a lot of the scikit-learn routines but also offers some additional functionality and faster/alternate implementations of some methods." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## pandas" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "pandas offers data structures, particularly data frames, and operations for manipulating numerical tables and time series, such as fancy indexing, reshaping and pivoting, and merging, as well as a number of analysis tools. Although similar functionality already exists in numpy, pandas is highly optimized for performance and large data sets. Some of these topics will be covered in greater depth in Mauricio San Martin's talk here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## AstroPy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "AstroPy aims to provide a core set of subpackages to specifically support astronomy. These include methods to work with image and table data formats, e.g., FITS, VOTable, etc., along with astronomical coordinate and unit systems, and cosmological calculations." 
] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "00h42m30s +41d12m00s\n", "\n" ] } ], "source": [ "from astropy import units as u\n", "from astropy.coordinates import SkyCoord\n", "\n", "c = SkyCoord(ra = 10.625 * u.degree, dec = 41.2 * u.degree, frame = 'icrs')\n", "print c.to_string('hmsdms')\n", "print c.galactic" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3944.5841858 Mpc 8875.31441806 Mpc\n" ] } ], "source": [ "from astropy.cosmology import WMAP9 as cosmo\n", "print cosmo.comoving_distance(1.25), cosmo.luminosity_distance(1.25)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from astropy.io import fits\n", "hdulist = fits.open('someimage.fits')\n", "hdulist.info()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from astropy.io.votable import parse\n", "votable = parse('sometable.xml')\n", "table = votable.get_first_table()\n", "data = table.array" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A useful affiliated package is Astroquery which provides tools for querying astronomical web forms and databases. This is not part of the regular AstroPy distribution and needs to be installed separately. Whereas many data archives have standardized VO interfaces to support data access (see Amelia Bayo's talk here), Astroquery mimics a web browser and provides access via an archive's form interface. This can be useful as not all provided information is necesarily available via the VO. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For example, the NASA Extragalactic Database is a very useful human-curated resource for extragalactic objects. However, a lot of the information that is available via the web pages is not available through an easy programmatic API. Let's say that we want to get the list of object types associated with a particulae source:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "No. Object Name ... Diameter Points Associations\n", " ... \n", "--- ---------------------------- ... --------------- ------------\n", " 1 GALEXASC J034513.29+382620.9 ... 0 0\n", " 2 2MASX J03451354+3825588 ... 2 0\n", " 3 6C B034157.1+381808 ... 0 1\n", " 4 4C +38.10 ... 0 1\n", " 5 GALEXASC J034517.39+382239.7 ... 0 0\n", " 6 GALEXASC J034517.60+382255.0 ... 0 0\n", " 7 GALEXASC J034518.64+382750.8 ... 0 0\n", " 8 GALEXASC J034522.97+382939.1 ... 0 0\n", " 9 GALEXASC J034526.22+382730.1 ... 0 0\n", " 10 GALEXASC J034529.64+382240.9 ... 0 0\n", " 11 GALEXASC J034530.57+382408.8 ... 0 0\n", " 12 GALEXASC J034531.71+382446.4 ... 0 0\n", " 13 GALEXASC J034532.20+382805.4 ... 0 0\n", " 14 2MASX J03453277+3824413 ... 2 0\n", " 15 GALEXASC J034536.14+382341.8 ... 0 0\n", " 16 GALEXASC J034540.66+382915.1 ... 0 0\n", " 17 GALEXASC J034540.66+382441.8 ... 0 0\n", " 18 GALEXASC J034544.51+382746.4 ... 0 0\n", " 19 GALEXASC J034544.63+382735.5 ... 0 0\n", " 20 GALEXASC J034545.40+382306.0 ... 0 0\n", " 21 GALEXASC J034550.35+382435.0 ... 
0 0\n" ] }, { "data": { "text/plain": [ "{'G', 'RadioS', 'UvS'}" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from astroquery.ned import Ned\n", "from astropy.coordinates import SkyCoord\n", "from astropy import units as u\n", "\n", "co = SkyCoord(ra = 56.38, dec = 38.43, unit = (u.deg, u.deg))\n", "result = Ned.query_region(co, radius = 0.07 * u.deg)\n", "print result\n", "set(result.columns['Type'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Other libraries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For some of the other lectures or projects this week, you might also need to install the following Python packages:\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Unrelated software use survey" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "https://goo.gl/W0jDMJ" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.13" } }, "nbformat": 4, "nbformat_minor": 1 }