{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# TUTORIAL 1: Introduction to Python\n",
    "\n",
    "\n",
    "## Google Colaboratory (Google CoLab)\n",
    "\n",
    "\n",
    "[Jupyter notebooks](https://jupyter.org/) are a standard tool for data scientists. They allow you to create and share documents that contain \"cells\" with runnable Python code as well as equations, visualizations, and text. [Google Colab](http://colab.research.google.com/) gives you the same ability online in a collaborative environment with all the resources of a powerful virtual machine underlying the notebook execution.\n",
    "\n",
    "Here is [an overview of Google Colaboratory (Google CoLab) features](https://colab.research.google.com/notebooks/basic_features_overview.ipynb) and a brief guide for [using BigQuery through Colaboratory](https://colab.research.google.com/notebooks/bigquery.ipynb). Before proceeding, make sure you have read and understood these support documents. To open a new notebook in [Colab](http://colab.research.google.com/), you can go to *File \\> Upload notebook* and choose the file either from your computer or from Google Drive. You can also make a copy of an existing Colab noteboook by going to *File \\> Save a Copy in Drive ...* . Colab notebooks can be saved just like any other file to your own Google Drive account.\n",
    "\n",
    "\n",
    "# INTRODUCTION TO PYTHON\n",
    "\n",
    "\n",
    "Welcome to our introduction to the Python Programming Language using our first iPython Notebook on Google CoLab!\n",
    "\n",
    "## Functions\n",
    "\n",
    "`add_numbers` is a function that takes two numbers and adds them together."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def add_numbers(x, y):\n",
    "    return x + y\n",
    "\n",
    "add_numbers(1, 2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`add_numbers` updated to take an optional 3rd parameter. Using `print` allows printing of multiple expressions within a single cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def add_numbers(x,y,z=None):\n",
    "    if (z==None):\n",
    "        return x+y\n",
    "    else:\n",
    "        return x+y+z\n",
    "\n",
    "print(add_numbers(1, 2))\n",
    "print(add_numbers(1, 2, 3))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`add_numbers` updated to take an optional flag parameter."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def add_numbers(x, y, z=None, flag=False):\n",
    "    if (flag):\n",
    "        print('Flag is true!')\n",
    "    if (z==None):\n",
    "        return x + y\n",
    "    else:\n",
    "        return x + y + z\n",
    "    \n",
    "print(add_numbers(1, 2, flag=True))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Assign function `add_numbers` to variable `a`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def add_numbers(x,y):\n",
    "    return x+y\n",
    "\n",
    "a = add_numbers\n",
    "a(1,2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Types and Sequences\n",
    "\n",
    "Use `type` to return the object's type."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "type('This is a string')\n",
    "type(None)\n",
    "type(1)\n",
    "type(1.0)\n",
    "type(add_numbers)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Tuples are an immutable data structure (cannot be altered)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = (1, 'a', 2, 'b')\n",
    "type(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lists are a mutable data structure."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = [1, 'a', 2, 'b']\n",
    "type(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use `append` to append an object to a list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x.append(3.3)\n",
    "print(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is an example of how to loop through each item in the list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for item in x:\n",
    "    print(item)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or using the indexing operator:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "i=0\n",
    "while( i != len(x) ):\n",
    "    print(x[i])\n",
    "    i = i + 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use `+` to concatenate lists."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "[1,2] + [3,4]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use `*` to repeat lists."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "[1]*3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use the `in` operator to check if something is inside a list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "1 in [1, 2, 3]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's look at strings. Use bracket notation to slice a string."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = 'This is a string'\n",
    "print(x[0]) #first character\n",
    "print(x[0:1]) #first character, but we have explicitly set the end character\n",
    "print(x[0:2]) #first two characters"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This will return the last element of the string."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x[-1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This will return the slice starting from the 4th element from the end and stopping before the 2nd element from the end."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x[-4:-2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is a slice from the beginning of the string and stopping before the 3rd element."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x[:3]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And this is a slice starting from the 4th element of the string and going all the way to the end."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x[3:]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "firstname = 'Gary'\n",
    "lastname = 'Kildall'\n",
    "\n",
    "print(firstname + ' ' + lastname)\n",
    "print(firstname*3)\n",
    "print('Gary' in firstname)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`split` returns a list of all the words in a string, or a list split on a specific character."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "firstname = 'Gary Arlen Kildall'.split(' ')[0] # [0] selects the first element of the list\n",
    "lastname = 'Gary Arlen Kildall'.split(' ')[-1] # [-1] selects the last element of the list\n",
    "print(firstname)\n",
    "print(lastname)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Make sure you convert objects to strings before concatenating."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "'Gary' + 2\n",
    "'Gary' + str(2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Dictionaries associate keys with values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = {'Gary Kildall': 'gkildall@digitalresearch.com', 'Bill Gates': 'billg@microsoft.com'}\n",
    "x['Gary Kildall'] # Retrieve a value by using the indexing operator"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x['Steve Jobs'] = None\n",
    "x['Steve Jobs']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Iterate over all of the keys:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for name in x:\n",
    "    print(x[name])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Iterate over all of the values:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for email in x.values():\n",
    "    print(email)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Iterate over all of the items in the list:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for name, email in x.items():\n",
    "    print(name)\n",
    "    print(email)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can unpack a sequence into different variables:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = ('Gary', 'Kildall', 'gkildall@digitalresearch.com')\n",
    "fname, lname, email = x\n",
    "fname\n",
    "lname"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Make sure the number of values you are unpacking matches the number of variables being assigned."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = ('Gary', 'Kildall', 'gkildall@digitalresearch.com', 'Digital Research')\n",
    "fname, lname, email = x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# More on Strings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print('Gary' + 2)\n",
    "print('Gary' + str(2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Python has a built in method for convenient string formatting."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sales_record = {\n",
    "'price': 3.24,\n",
    "'num_items': 4,\n",
    "'person': 'Gary'}\n",
    "\n",
    "sales_statement = '{} bought {} item(s) at a price of {} each for a total of {}'\n",
    "\n",
    "print(sales_statement.format(sales_record['person'],\n",
    "                             sales_record['num_items'],\n",
    "                             sales_record['price'],\n",
    "                             sales_record['num_items']*sales_record['price']))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Reading and Writing CSV files\n",
    "\n",
    "Let's import our datafile mpg.csv, which contains fuel economy data for 234 cars.\n",
    "\n",
    "* mpg : miles per gallon\n",
    "* class : car classification\n",
    "* cty : city mpg\n",
    "* cyl : # of cylinders\n",
    "* displ : engine displacement in liters\n",
    "* drv : f = front-wheel drive, r = rear wheel drive, 4 = 4wd\n",
    "* fl : fuel (e = ethanol E85, d = diesel, r = regular, p = premium, c = CNG)\n",
    "* hwy : highway mpg\n",
    "* manufacturer : automobile manufacturer\n",
    "* model : model of car\n",
    "* trans : type of transmission\n",
    "* year : model year"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import csv\n",
    "\n",
    "%precision 2\n",
    "\n",
    "with open('mpg.csv') as csvfile:\n",
    "    mpg = list(csv.DictReader(csvfile))\n",
    "    \n",
    "mpg[:3] # The first three dictionaries in our list."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`csv.Dictreader` has read in each row of our csv file as a dictionary. `len` shows that our list is comprised of 234 dictionaries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "len(mpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`keys` gives us the column names of our csv."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mpg[0].keys()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is how to find the average cty fuel economy across all cars. All values in the dictionaries are strings, so we need to convert to float."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sum(float(d['cty']) for d in mpg) / len(mpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Similarly this is how to find the average hwy fuel economy across all cars."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sum(float(d['hwy']) for d in mpg) / len(mpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use `set` to return the unique values for the number of cylinders the cars in our dataset have."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cylinders = set(d['cyl'] for d in mpg)\n",
    "cylinders"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's a more complex example where we are grouping the cars by number of cylinder, and finding the average cty mpg for each group."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "CtyMpgByCyl = []\n",
    "\n",
    "for c in cylinders: # iterate over all the cylinder levels\n",
    "    summpg = 0\n",
    "    cyltypecount = 0\n",
    "    for d in mpg: # iterate over all dictionaries\n",
    "        if d['cyl'] == c: # if the cylinder level type matches,\n",
    "            summpg += float(d['cty']) # add the cty mpg\n",
    "            cyltypecount += 1 # increment the count\n",
    "    CtyMpgByCyl.append((c, summpg / cyltypecount)) # append the tuple ('cylinder', 'avg mpg')\n",
    "\n",
    "CtyMpgByCyl.sort(key=lambda x: x[0])\n",
    "CtyMpgByCyl"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use `set` to return the unique values for the class types in our dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "vehicleclass = set(d['class'] for d in mpg) # what are the class types\n",
    "vehicleclass"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And here's an example of how to find the average hwy mpg for each class of vehicle in our dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "HwyMpgByClass = []\n",
    "\n",
    "for t in vehicleclass: # iterate over all the vehicle classes\n",
    "    summpg = 0\n",
    "    vclasscount = 0\n",
    "    for d in mpg: # iterate over all dictionaries\n",
    "        if d['class'] == t: # if the class type matches,\n",
    "            summpg += float(d['hwy']) # add the hwy mpg\n",
    "            vclasscount += 1 # increment the count\n",
    "    HwyMpgByClass.append((t, summpg / vclasscount)) # append the tuple ('class', 'avg mpg')\n",
    "\n",
    "HwyMpgByClass.sort(key=lambda x: x[1])\n",
    "HwyMpgByClass"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Dates and Times"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import datetime as dt\n",
    "import time as tm"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`time` returns the current time in seconds since the Epoch. (January 1st, 1970)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tm.time()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Convert the timestamp to datetime."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dtnow = dt.datetime.fromtimestamp(tm.time())\n",
    "dtnow"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Handy datetime attributes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dtnow.year, dtnow.month, dtnow.day, dtnow.hour, dtnow.minute, dtnow.second # get year, month, day, etc.from a datetime"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`timedelta` is a duration expressing the difference between two dates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "delta = dt.timedelta(days = 100) # create a timedelta of 100 days\n",
    "delta"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`date.today` returns the current local date."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "today = dt.date.today()\n",
    "today - delta # the date 100 days ago\n",
    "today > today-delta # compare dates"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Objects and map()\n",
    "\n",
    "An example of a class in python:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Person:\n",
    "    department = 'School of Information' #a class variable\n",
    "\n",
    "    def set_name(self, new_name): #a method\n",
    "        self.name = new_name\n",
    "    def set_location(self, new_location):\n",
    "        self.location = new_location"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "person = Person()\n",
    "person.set_name('Gary Kildall')\n",
    "person.set_location('Ann Arbor, MI, USA')\n",
    "print('{} live in {} and works in the department {}'.format(person.name, person.location, person.department))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's an example of mapping the `min` function between two lists."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "store1 = [10.00, 11.00, 12.34, 2.34]\n",
    "store2 = [9.00, 11.10, 12.34, 2.01]\n",
    "cheapest = map(min, store1, store2)\n",
    "cheapest"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's iterate through the map object to see the values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for item in cheapest:\n",
    "    print(item)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Lambda and List Comprehensions\n",
    "\n",
    "Here's an example of lambda that takes in three parameters and adds the first two."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "my_function = lambda a, b, c : a + b\n",
    "my_function(1, 2, 3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's iterate from 0 to 999 and return the even numbers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "my_list = []\n",
    "for number in range(0, 1000):\n",
    "    if number % 2 == 0:\n",
    "        my_list.append(number)\n",
    "my_list"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now the same thing but with list comprehension."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "my_list = [number for number in range(0,1000) if number % 2 == 0]\n",
    "my_list"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Numerical Python (NumPy)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Creating Arrays\n",
    "\n",
    "Create a list and convert it to a numpy array"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mylist = [1, 2, 3]\n",
    "x = np.array(mylist)\n",
    "x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or just pass in a list directly"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y = np.array([4, 5, 6])\n",
    "y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pass in a list of lists to create a multidimensional array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m = np.array([[7, 8, 9], [10, 11, 12]])\n",
    "m"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use the shape method to find the dimensions of the array. (rows, columns)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "m.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`arange` returns evenly spaced values within a given interval."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "n = np.arange(0, 30, 2) # start at 0 count up by 2, stop before 30\n",
    "n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`reshape` returns an array with the same data with a new shape."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "n = n.reshape(3, 5) # reshape array to be 3x5\n",
    "n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`linspace` returns evenly spaced numbers over a specified interval."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "o = np.linspace(0, 4, 9) # return 9 evenly spaced values from 0 to 4\n",
    "o"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`resize` changes the shape and size of array in-place."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "o.resize(3, 3)\n",
    "o"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`ones` returns a new array of given shape and type, filled with ones."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.ones((3, 2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`zeros` returns a new array of given shape and type, filled with zeros."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.zeros((2, 3))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`eye` returns a 2-D array with ones on the diagonal and zeros elsewhere."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.eye(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`diag` extracts a diagonal or constructs a diagonal array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.diag(y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create an array using repeating list (or see `np.tile`)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.array([1, 2, 3] * 3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Repeat elements of an array using `repeat`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.repeat([1, 2, 3], 3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Combining Arrays"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "p = np.ones([2, 3], int)\n",
    "p"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use `vstack` to stack arrays in sequence vertically (row wise)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.vstack([p, 2*p])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use `hstack` to stack arrays in sequence horizontally (column wise)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.hstack([p, 2*p])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Operations\n",
    "\n",
    "Use `+`, `-`, `*`, `/` and `**` to perform element wise addition, subtraction,\n",
    "multiplication, division and power."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(x + y) # elementwise addition     [1 2 3] + [4 5 6] = [5  7  9]\n",
    "print(x - y) # elementwise subtraction  [1 2 3] - [4 5 6] = [-3 -3 -3]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(x * y) # elementwise multiplication  [1 2 3] * [4 5 6] = [4  10  18]\n",
    "print(x / y) # elementwise divison         [1 2 3] / [4 5 6] = [0.25  0.4  0.5]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(x**2) # elementwise power  [1 2 3] ^2 =  [1 4 9]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Dot Product:**\n",
    "\n",
    "$ \\begin{bmatrix}x_1 \\ x_2 \\ x_3\\end{bmatrix}\n",
    "\\cdot\n",
    "\\begin{bmatrix}y_1 \\\\ y_2 \\\\ y_3\\end{bmatrix}\n",
    "= x_1 y_1 + x_2 y_2 + x_3 y_3$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x.dot(y) # dot product  1*4 + 2*5 + 3*6"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "z = np.array([y, y**2])\n",
    "print(len(z)) # number of rows of array"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's look at transposing arrays. Transposing permutes the dimensions of the array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "z = np.array([y, y**2])\n",
    "z"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The shape of array `z` is `(2,3)` before transposing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "z.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use `.T` to get the transpose."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "z.T"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The number of rows has swapped with the number of columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "z.T.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use `.dtype` to see the data type of the elements in the array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "z.dtype"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use `.astype` to cast to a specific type."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "z = z.astype('f')\n",
    "z.dtype"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Math Functions\n",
    "\n",
    "Numpy has many built in math functions that can be performed on arrays."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a = np.array([-4, -2, 1, 3, 5])\n",
    "a.sum()\n",
    "a.max()\n",
    "a.min()\n",
    "a.mean()\n",
    "a.std()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`argmax` and `argmin` return the index of the maximum and minimum values in the array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a.argmax()\n",
    "a.argmin()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Indexing / Slicing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "s = np.arange(13)**2\n",
    "s"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use bracket notation to get the value at a specific index. Remember that indexing starts at 0."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "s[0], s[4], s[-1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use `:` to indicate a range. `array[start:stop]`\n",
    "\n",
    "\n",
    "Leaving `start` or `stop` empty will default to the beginning/end of the array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "s[1:5]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use negatives to count from the back."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "s[-4:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A second `:` can be used to indicate step-size. `array[start:stop:stepsize]`\n",
    "\n",
    "Here we are starting 5th element from the end, and counting backwards by 2 until the beginning of the array is reached."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "s[-5::-2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's look at a multidimensional array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "r = np.arange(36)\n",
    "r.resize((6, 6))\n",
    "r"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use bracket notation to slice: `array[row, column]`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "r[2, 2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And use : to select a range of rows or columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "r[3, 3:6]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here we are selecting all the rows up to (and not including) row 2, and all the columns up to (and not including) the last column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "r[:2, :-1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is a slice of the last row, and only every other element."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "r[-1, ::2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also perform conditional indexing. Here we are selecting values from the array that are greater than 30. (Also see `np.where`)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "r[r > 30]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here we are assigning all values in the array that are greater than 30 to the value of 30."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "r[r > 30] = 30\n",
    "r"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Copying Data\n",
    "\n",
    "Be careful with copying and modifying arrays in NumPy!\n",
    "\n",
    "\n",
    "`r2` is a slice of `r`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "r2 = r[:3,:3]\n",
    "r2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Set this slice's values to zero ([:] selects the entire array)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "r2[:] = 0\n",
    "r2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`r` has also been changed!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "r"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To avoid this, use `r.copy` to create a copy that will not affect the original array"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "r_copy = r.copy()\n",
    "r_copy"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now when r_copy is modified, r will not be changed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "r_copy[:] = 10\n",
    "print(r_copy, '\\n')\n",
    "print(r)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Iterating Over Arrays\n",
    "\n",
    "Let's create a new 4 by 3 array of random numbers 0-9."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test = np.random.randint(0, 10, (4,3))\n",
    "test"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Iterate by row:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for row in test:\n",
    "    print(row)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Iterate by index:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for i in range(len(test)):\n",
    "    print(test[i])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Iterate by row and index:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for i, row in enumerate(test):\n",
    "    print('row', i, 'is', row)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use `zip` to iterate over multiple iterables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test2 = test**2\n",
    "test2\n",
    "\n",
    "for i, j in zip(test, test2):\n",
    "    print(i,'+',j,'=',i+j)"
   ]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 4
}