{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# TUTORIAL 1: Introduction to Python\n", "\n", "\n", "## Google Colaboratory (Google CoLab)\n", "\n", "\n", "[Jupyter notebooks](https://jupyter.org/) are a standard tool for data scientists. They allow you to create and share documents that contain \"cells\" with runnable Python code as well as equations, visualizations, and text. [Google Colab](http://colab.research.google.com/) gives you the same ability online in a collaborative environment with all the resources of a powerful virtual machine underlying the notebook execution.\n", "\n", "Here is [an overview of Google Colaboratory (Google CoLab) features](https://colab.research.google.com/notebooks/basic_features_overview.ipynb) and a brief guide for [using BigQuery through Colaboratory](https://colab.research.google.com/notebooks/bigquery.ipynb). Before proceeding, make sure you have read and understood these support documents. To open a notebook file in [Colab](http://colab.research.google.com/), you can go to *File \\> Upload notebook* and choose the file either from your computer or from Google Drive. You can also make a copy of an existing Colab notebook by going to *File \\> Save a Copy in Drive ...* . Colab notebooks can be saved just like any other file to your own Google Drive account.\n", "\n", "\n", "# INTRODUCTION TO PYTHON\n", "\n", "\n", "Welcome to our introduction to the Python Programming Language using our first IPython Notebook on Google CoLab!\n", "\n", "## Functions\n", "\n", "`add_numbers` is a function that takes two numbers and adds them together." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def add_numbers(x, y):\n", "    return x + y\n", "\n", "add_numbers(1, 2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here `add_numbers` is updated to take an optional third parameter. Using `print` allows printing of multiple expressions within a single cell." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def add_numbers(x, y, z=None):\n", "    if z is None:\n", "        return x + y\n", "    else:\n", "        return x + y + z\n", "\n", "print(add_numbers(1, 2))\n", "print(add_numbers(1, 2, 3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here `add_numbers` is updated to take an optional flag parameter." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def add_numbers(x, y, z=None, flag=False):\n", "    if flag:\n", "        print('Flag is true!')\n", "    if z is None:\n", "        return x + y\n", "    else:\n", "        return x + y + z\n", "\n", "print(add_numbers(1, 2, flag=True))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Assign function `add_numbers` to variable `a`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def add_numbers(x, y):\n", "    return x + y\n", "\n", "a = add_numbers\n", "a(1, 2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Types and Sequences\n", "\n", "Use `type` to return the object's type." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "type('This is a string')\n", "type(None)\n", "type(1)\n", "type(1.0)\n", "type(add_numbers)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Tuples are an immutable data structure (cannot be altered)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = (1, 'a', 2, 'b')\n", "type(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lists are a mutable data structure." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = [1, 'a', 2, 'b']\n", "type(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use `append` to append an object to a list." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x.append(3.3)\n", "print(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is an example of how to loop through each item in the list." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for item in x:\n", "    print(item)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or using the indexing operator:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "i = 0\n", "while i != len(x):\n", "    print(x[i])\n", "    i = i + 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use `+` to concatenate lists." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "[1, 2] + [3, 4]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use `*` to repeat lists." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "[1] * 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use the `in` operator to check if something is inside a list." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "1 in [1, 2, 3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's look at strings. Use bracket notation to slice a string." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = 'This is a string'\n", "print(x[0]) # first character\n", "print(x[0:1]) # first character, but we have explicitly set the end character\n", "print(x[0:2]) # first two characters" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This will return the last element of the string." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x[-1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This will return the slice starting from the 4th element from the end and stopping before the 2nd element from the end." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x[-4:-2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a slice from the beginning of the string and stopping before the 3rd element." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x[:3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And this is a slice starting from the 4th element of the string and going all the way to the end." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x[3:]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "firstname = 'Gary'\n", "lastname = 'Kildall'\n", "\n", "print(firstname + ' ' + lastname)\n", "print(firstname*3)\n", "print('Gary' in firstname)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`split` returns a list of all the words in a string, or a list split on a specific character." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "firstname = 'Gary Arlen Kildall'.split(' ')[0] # [0] selects the first element of the list\n", "lastname = 'Gary Arlen Kildall'.split(' ')[-1] # [-1] selects the last element of the list\n", "print(firstname)\n", "print(lastname)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Make sure you convert objects to strings before concatenating." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "'Gary' + 2\n", "'Gary' + str(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Dictionaries associate keys with values." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = {'Gary Kildall': 'gkildall@digitalresearch.com', 'Bill Gates': 'billg@microsoft.com'}\n", "x['Gary Kildall'] # Retrieve a value by using the indexing operator" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x['Steve Jobs'] = None\n", "x['Steve Jobs']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Iterate over all of the keys:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for name in x:\n", "    print(x[name])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Iterate over all of the values:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for email in x.values():\n", "    print(email)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Iterate over all of the items (key-value pairs) in the dictionary:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for name, email in x.items():\n", "    print(name)\n", "    print(email)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can unpack a sequence into different variables:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = ('Gary', 'Kildall', 'gkildall@digitalresearch.com')\n", "fname, lname, email = x\n", "fname\n", "lname" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Make sure the number of values you are unpacking matches the number of variables being assigned." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = ('Gary', 'Kildall', 'gkildall@digitalresearch.com', 'Digital Research')\n", "fname, lname, email = x # raises ValueError: four values cannot be unpacked into three variables" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# More on Strings" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('Gary' + 2) # raises TypeError: a str and an int cannot be concatenated\n", "print('Gary' + str(2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python has a built-in method for convenient string formatting." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sales_record = {\n", "    'price': 3.24,\n", "    'num_items': 4,\n", "    'person': 'Gary'}\n", "\n", "sales_statement = '{} bought {} item(s) at a price of {} each for a total of {}'\n", "\n", "print(sales_statement.format(sales_record['person'],\n", "                             sales_record['num_items'],\n", "                             sales_record['price'],\n", "                             sales_record['num_items']*sales_record['price']))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Reading and Writing CSV files\n", "\n", "Let's import our datafile mpg.csv, which contains fuel economy data for 234 cars.\n", "\n", "* mpg : miles per gallon\n", "* class : car classification\n", "* cty : city mpg\n", "* cyl : # of cylinders\n", "* displ : engine displacement in liters\n", "* drv : f = front-wheel drive, r = rear-wheel drive, 4 = 4wd\n", "* fl : fuel (e = ethanol E85, d = diesel, r = regular, p = premium, c = CNG)\n", "* hwy : highway mpg\n", "* manufacturer : automobile manufacturer\n", "* model : model of car\n", "* trans : type of transmission\n", "* year : model year" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import csv\n", "\n", "%precision 2\n", "\n", "with open('mpg.csv') as csvfile:\n", "    mpg = list(csv.DictReader(csvfile))\n", "\n", "mpg[:3] # The first three dictionaries in our list." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`csv.DictReader` has read in each row of our csv file as a dictionary. `len` shows that our list contains 234 dictionaries." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "len(mpg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`keys` gives us the column names of our csv." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mpg[0].keys()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is how to find the average cty fuel economy across all cars. All values in the dictionaries are strings, so we need to convert to float." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sum(float(d['cty']) for d in mpg) / len(mpg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly, this is how to find the average hwy fuel economy across all cars." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sum(float(d['hwy']) for d in mpg) / len(mpg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use `set` to return the unique values for the number of cylinders the cars in our dataset have." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cylinders = set(d['cyl'] for d in mpg)\n", "cylinders" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's a more complex example where we are grouping the cars by number of cylinders, and finding the average cty mpg for each group." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "CtyMpgByCyl = []\n", "\n", "for c in cylinders: # iterate over all the cylinder levels\n", "    summpg = 0\n", "    cyltypecount = 0\n", "    for d in mpg: # iterate over all dictionaries\n", "        if d['cyl'] == c: # if the cylinder level type matches,\n", "            summpg += float(d['cty']) # add the cty mpg\n", "            cyltypecount += 1 # increment the count\n", "    CtyMpgByCyl.append((c, summpg / cyltypecount)) # append the tuple ('cylinder', 'avg mpg')\n", "\n", "CtyMpgByCyl.sort(key=lambda x: x[0])\n", "CtyMpgByCyl" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use `set` to return the unique values for the class types in our dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "vehicleclass = set(d['class'] for d in mpg) # what are the class types\n", "vehicleclass" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And here's an example of how to find the average hwy mpg for each class of vehicle in our dataset." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "HwyMpgByClass = []\n", "\n", "for t in vehicleclass: # iterate over all the vehicle classes\n", "    summpg = 0\n", "    vclasscount = 0\n", "    for d in mpg: # iterate over all dictionaries\n", "        if d['class'] == t: # if the class type matches,\n", "            summpg += float(d['hwy']) # add the hwy mpg\n", "            vclasscount += 1 # increment the count\n", "    HwyMpgByClass.append((t, summpg / vclasscount)) # append the tuple ('class', 'avg mpg')\n", "\n", "HwyMpgByClass.sort(key=lambda x: x[1])\n", "HwyMpgByClass" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Dates and Times" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import datetime as dt\n", "import time as tm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`time` returns the current time in seconds since the Epoch (January 1st, 1970)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tm.time()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Convert the timestamp to datetime." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dtnow = dt.datetime.fromtimestamp(tm.time())\n", "dtnow" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Handy datetime attributes:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dtnow.year, dtnow.month, dtnow.day, dtnow.hour, dtnow.minute, dtnow.second # get year, month, day, etc. from a datetime" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`timedelta` is a duration expressing the difference between two dates." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "delta = dt.timedelta(days=100) # create a timedelta of 100 days\n", "delta" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`date.today` returns the current local date." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "today = dt.date.today()\n", "today - delta # the date 100 days ago\n", "today > today - delta # compare dates" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Objects and map()\n", "\n", "An example of a class in Python:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class Person:\n", "    department = 'School of Information' # a class variable\n", "\n", "    def set_name(self, new_name): # a method\n", "        self.name = new_name\n", "    def set_location(self, new_location):\n", "        self.location = new_location" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "person = Person()\n", "person.set_name('Gary Kildall')\n", "person.set_location('Ann Arbor, MI, USA')\n", "print('{} lives in {} and works in the department {}'.format(person.name, person.location, person.department))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's an example of mapping the `min` function between two lists." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "store1 = [10.00, 11.00, 12.34, 2.34]\n", "store2 = [9.00, 11.10, 12.34, 2.01]\n", "cheapest = map(min, store1, store2)\n", "cheapest" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's iterate through the map object to see the values." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for item in cheapest:\n", "    print(item)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Lambda and List Comprehensions\n", "\n", "Here's an example of a lambda that takes in three parameters and adds the first two." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_function = lambda a, b, c: a + b\n", "my_function(1, 2, 3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's iterate from 0 to 999 and return the even numbers." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list = []\n", "for number in range(0, 1000):\n", "    if number % 2 == 0:\n", "        my_list.append(number)\n", "my_list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now the same thing but with a list comprehension." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_list = [number for number in range(0, 1000) if number % 2 == 0]\n", "my_list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Numerical Python (NumPy)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating Arrays\n", "\n", "Create a list and convert it to a NumPy array." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mylist = [1, 2, 3]\n", "x = np.array(mylist)\n", "x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or just pass in a list directly." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y = np.array([4, 5, 6])\n", "y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pass in a list of lists to create a multidimensional array." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "m = np.array([[7, 8, 9], [10, 11, 12]])\n", "m" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use the `shape` attribute to find the dimensions of the array (rows, columns)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "m.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`arange` returns evenly spaced values within a given interval." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n = np.arange(0, 30, 2) # start at 0, count up by 2, stop before 30\n", "n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`reshape` returns an array with the same data with a new shape." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n = n.reshape(3, 5) # reshape array to be 3x5\n", "n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`linspace` returns evenly spaced numbers over a specified interval." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "o = np.linspace(0, 4, 9) # return 9 evenly spaced values from 0 to 4\n", "o" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`resize` changes the shape and size of an array in place." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "o.resize(3, 3)\n", "o" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`ones` returns a new array of given shape and type, filled with ones." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.ones((3, 2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`zeros` returns a new array of given shape and type, filled with zeros." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.zeros((2, 3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`eye` returns a 2-D array with ones on the diagonal and zeros elsewhere." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.eye(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`diag` extracts a diagonal or constructs a diagonal array." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.diag(y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create an array using a repeating list (or see `np.tile`)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.array([1, 2, 3] * 3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Repeat elements of an array using `repeat`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.repeat([1, 2, 3], 3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Combining Arrays" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "p = np.ones([2, 3], int)\n", "p" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use `vstack` to stack arrays in sequence vertically (row-wise)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.vstack([p, 2*p])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use `hstack` to stack arrays in sequence horizontally (column-wise)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.hstack([p, 2*p])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Operations\n", "\n", "Use `+`, `-`, `*`, `/` and `**` to perform element-wise addition, subtraction,\n", "multiplication, division and power." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(x + y) # elementwise addition [1 2 3] + [4 5 6] = [5 7 9]\n", "print(x - y) # elementwise subtraction [1 2 3] - [4 5 6] = [-3 -3 -3]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(x * y) # elementwise multiplication [1 2 3] * [4 5 6] = [4 10 18]\n", "print(x / y) # elementwise division [1 2 3] / [4 5 6] = [0.25 0.4 0.5]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(x**2) # elementwise power [1 2 3] ^2 = [1 4 9]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Dot Product:**\n", "\n", "$ \\begin{bmatrix}x_1 \\ x_2 \\ x_3\\end{bmatrix}\n", "\\cdot\n", "\\begin{bmatrix}y_1 \\\\ y_2 \\\\ y_3\\end{bmatrix}\n", "= x_1 y_1 + x_2 y_2 + x_3 y_3$" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x.dot(y) # dot product 1*4 + 2*5 + 3*6" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "z = np.array([y, y**2])\n", "print(len(z)) # number of rows of array" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's look at transposing arrays. Transposing permutes the dimensions of the array." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "z = np.array([y, y**2])\n", "z" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The shape of array `z` is `(2,3)` before transposing." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "z.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use `.T` to get the transpose." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "z.T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The number of rows has swapped with the number of columns." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "z.T.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use `.dtype` to see the data type of the elements in the array." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "z.dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use `.astype` to cast to a specific type." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "z = z.astype('f')\n", "z.dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Math Functions\n", "\n", "NumPy has many built-in math functions that can be performed on arrays." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([-4, -2, 1, 3, 5])\n", "a.sum()\n", "a.max()\n", "a.min()\n", "a.mean()\n", "a.std()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`argmax` and `argmin` return the index of the maximum and minimum values in the array." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a.argmax()\n", "a.argmin()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Indexing / Slicing" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s = np.arange(13)**2\n", "s" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use bracket notation to get the value at a specific index. Remember that indexing starts at 0." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s[0], s[4], s[-1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use `:` to indicate a range. `array[start:stop]`\n", "\n", "\n", "Leaving `start` or `stop` empty will default to the beginning/end of the array." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s[1:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use negatives to count from the back." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s[-4:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A second `:` can be used to indicate step-size. `array[start:stop:stepsize]`\n", "\n", "Here we are starting at the 5th element from the end, and counting backwards by 2 until the beginning of the array is reached." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s[-5::-2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's look at a multidimensional array." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "r = np.arange(36)\n", "r.resize((6, 6))\n", "r" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use bracket notation to slice: `array[row, column]`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "r[2, 2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And use `:` to select a range of rows or columns." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "r[3, 3:6]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we are selecting all the rows up to (and not including) row 2, and all the columns up to (and not including) the last column." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "r[:2, :-1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a slice of the last row, and only every other element." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "r[-1, ::2]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also perform conditional indexing. Here we are selecting values from the array that are greater than 30. (Also see `np.where`)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "r[r > 30]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we are assigning all values in the array that are greater than 30 to the value of 30." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "r[r > 30] = 30\n", "r" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Copying Data\n", "\n", "Be careful with copying and modifying arrays in NumPy!\n", "\n", "\n", "`r2` is a slice of `r`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "r2 = r[:3,:3]\n", "r2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Set this slice's values to zero ([:] selects the entire array)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "r2[:] = 0\n", "r2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`r` has also been changed!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "r" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To avoid this, use `r.copy` to create a copy that will not affect the original array" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "r_copy = r.copy()\n", "r_copy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now when r_copy is modified, r will not be changed." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "r_copy[:] = 10\n", "print(r_copy, '\\n')\n", "print(r)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Iterating Over Arrays\n", "\n", "Let's create a new 4 by 3 array of random numbers 0-9." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "test = np.random.randint(0, 10, (4, 3))\n", "test" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Iterate by row:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for row in test:\n", "    print(row)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Iterate by index:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for i in range(len(test)):\n", "    print(test[i])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Iterate by row and index:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for i, row in enumerate(test):\n", "    print('row', i, 'is', row)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use `zip` to iterate over multiple iterables." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "test2 = test**2\n", "test2\n", "\n", "for i, j in zip(test, test2):\n", "    print(i, '+', j, '=', i + j)" ] } ], "metadata": {}, "nbformat": 4, "nbformat_minor": 4 }