Introduction to Python
for data science

Lecture 1 of the Python for data science course

All computing is about
ones and zeroes


>>> "{0:b}".format(42)
'101010'
        

Meaning:
42 = 1 × 2⁵ + 0 × 2⁴ + 1 × 2³ + 0 × 2² + 1 × 2¹ + 0 × 2⁰
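
The expansion can be checked in a few lines. This is only a sketch; the variable names are illustrative.

```python
# Rebuild a number from its binary digits, one power of two at a time.
bits = "{0:b}".format(42)          # '101010'

total = 0
for position, bit in enumerate(reversed(bits)):
    # position 0 is the rightmost digit, worth 2**0
    total += int(bit) * 2 ** position

print(bits, total)                 # 101010 42
print(int(bits, 2))                # 42, the built-in way to parse binary
```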

All computing is about
ones and zeroes


>>> list(map(ord, "I🧡∞"))
[73, 129505, 8734]
        

Meaning:
letter "I" is encoded by number 73
emoji "🧡" is encoded by number 129505
symbol "∞" is encoded by number 8734
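
A small sketch of the round trip: ord goes from character to number, chr goes back, and encode shows how many bytes the character takes when stored as UTF-8. UTF-8 is an assumption here; it is the most common encoding, but not the only one.

```python
text = "I🧡∞"

for ch in text:
    code = ord(ch)                 # character -> number (Unicode code point)
    raw = ch.encode("utf-8")       # character -> bytes, as stored in a file
    print(ch, code, chr(code) == ch, len(raw), "byte(s)")
```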

Python takes care of memory

  • memory is sequential, i.e. address #2 follows address #1
  • everything is of known length
  • if you have some data, it's somewhere in the memory

Python takes care of memory


>>> a = [1,2,3]
>>> b = [1,2,3]
>>> c = a
>>> object.__repr__(a)
'<list object at 0x100c6bf08>'
>>> object.__repr__(b)
'<list object at 0x100cad6c8>'
>>> object.__repr__(c)
'<list object at 0x100c6bf08>'
        

Meaning:
object "a" is stored at address 0x100c6bf08
object "b" is stored at address 0x100cad6c8
object "c" is stored at the same address as "a"
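
The practical consequence, as a short sketch: == compares contents, is compares addresses, and a change made through one name is visible through every other name of the same object.

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a

print(a == b)   # True: same contents
print(a is b)   # False: two separate objects in memory
print(a is c)   # True: two names for one object

c.append(4)     # changing the object through "c" ...
print(a)        # [1, 2, 3, 4]  ... is visible through "a"
print(b)        # [1, 2, 3]     "b" is unaffected
```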

Python: programming language and a tool

What you downloaded is the Python program (the interpreter); it runs code written in the Python language

A program operates in the context of:

  • input (variable data)
  • output (data, usually depends on the input)
  • state (everything else, e.g. file system)

Python as a language

  • reads from top to bottom, from left to right
  • has sentences, called statements
  • has parts of speech
  • is hierarchical
  • can express the same thing in many different ways
max_i, max_el = 0, array[0]
for i, el in enumerate(array):
    if el > max_el:
        max_i, max_el = i, el

Coding is a dialogue

Abstract syntax tree (AST)

1 + 2 * 3 != (1 + 2) * 3
max_i, max_el = 0, array[0]
for i, el in enumerate(array):
    if el > max_el:
        max_i, max_el = i, el

Abstract syntax tree (AST)

Each node does something to the state, and has an output.
Python computes bottom-up by nodes, replacing each node with its output.
So:

x = (7 + 2) * 5
is equivalent to:
x = (9) * 5

Python parts of speech: simple

1 reads “number 1” or “integer 1”
1.5 reads “number 1.5” or “float 1.5”
"1" or '1' reads “string "1"”
"""1""" reads “string "1"”
True reads “boolean true”
False reads “boolean false”
None reads just “none”

Python parts of speech: complex

(1,2) reads “tuple of numbers 1 and 2”
(1,) reads “tuple of number 1”
[1,2] reads “list of numbers 1 and 2”
{1,2} reads “set of numbers 1 and 2”
{1:2} reads “dictionary with key number 1 to value number 2”
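
When unsure which part of speech a value is, type will tell. A minimal sketch covering the simple and complex values above:

```python
values = [1, 1.5, "1", True, None, (1, 2), (1,), [1, 2], {1, 2}, {1: 2}]

for value in values:
    # type(...).__name__ is the short name of the value's type
    print(repr(value), "is a", type(value).__name__)

# Parentheses alone do not make a tuple; the comma does.
print(type((1)).__name__, type((1,)).__name__)   # int tuple
```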

Python expressions: references

x reads “value of object x”
x.y reads “value of attribute y of object x”
x(1,2) reads “call function x with parameters of numbers 1 and 2”
x.y() reads “call method y of object x with no parameters”

Python expressions: subscripts

x[1] reads “subscript object x with number 1” or more commonly “value of second element of list/tuple x”
x["a"] reads “subscript object x with string "a"” or more commonly “value by key "a" from dictionary x”
x[1:3] reads “subscript object x with slice from 1 to 3” or more commonly “take the 2nd and 3rd elements of list/tuple x” (the end index 3 is excluded)
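
The three kinds of subscript, tried on small made-up data (the names letters and ages are illustrative only):

```python
letters = ["a", "b", "c", "d", "e"]
ages = {"a": 31, "b": 25}          # hypothetical data

print(letters[1])      # b, indexing starts at 0
print(letters[1:3])    # ['b', 'c'], the end index 3 is excluded
print(letters[-1])     # e, negative indexes count from the end
print(ages["a"])       # 31
```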

Python expressions: operators

Mathematical:
x + y, x - y, x / y, x * y
x ** y for power, x // y for integer division
x % y for the remainder of division
Comparisons:
x > y, x >= y, x == y
Logical:
x and y, x or y, not x
Membership:
x in y
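
The operators above on concrete numbers, as a quick sketch:

```python
x, y = 7, 2

print(x / y)    # 3.5, true division always gives a float
print(x // y)   # 3, integer (floor) division
print(x % y)    # 1, the remainder
print(x ** y)   # 49, power

print(x > y and y > 0)   # True
print(not x == y)        # True
print(y in [1, 2, 3])    # True, membership test
```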

Python sentence types, aka statements

Assignment:
x = 1
Conditions or if-statement:
if x > y:
    print("Greater")
elif x < y:
    print("Less")
else:
    print("Equal?")

Python sentence types, loops

For-loop:
for item in collection:
    print(item)
else:
    print("the end")

While loop:
while x > 0:
    x = x - 1
else:
    print("the end")
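
One detail about the else branch of a loop: it runs only when the loop ends without hitting break. A minimal sketch, with a hypothetical helper find_first_negative:

```python
def find_first_negative(numbers):
    for n in numbers:
        if n < 0:
            result = n
            break              # leaving early skips the else branch
    else:
        result = None          # runs only if the loop never hit break
    return result

print(find_first_negative([3, -1, 4]))   # -1
print(find_first_negative([3, 1, 4]))    # None
```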

Python sentence types, misc.

Expecting errors:
try:
    maybe_works()
except Exception as e:
    print("well, it didn't")
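
Catching Exception hides every kind of error. It is usually safer to name the error you expect. A sketch with a hypothetical safe_divide function; returning None on failure is an assumption made for the example:

```python
def safe_divide(a, b):
    try:
        return a / b
    except ZeroDivisionError as e:
        # "e" holds the error object; printing it shows the message
        print("well, it didn't:", e)
        return None            # assumption: failure is reported as None

print(safe_divide(6, 3))   # 2.0
print(safe_divide(6, 0))   # prints the message, then None
```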

Imports (using code from other files):
import antigravity
import pandas as pd
from matplotlib import pyplot as plt

Python sentence types, functions

You can define your own function:
def index_of_max(array):
    max_i, max_el = 0, array[0]
    for i, el in enumerate(array):
        if el > max_el:
            max_i, max_el = i, el
    return max_i
And then call it:
print(index_of_max([1,2,3]))
print(index_of_max([]))  # raises IndexError: an empty list has no element 0
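
The same thing can be expressed in a different way, using built-ins. This is a sketch, not a replacement for the function above: the name index_of_max_builtin is made up, and returning None for an empty list is an assumption about what the caller wants.

```python
def index_of_max_builtin(array):
    if not array:
        return None            # assumption: "no maximum" is reported as None
    # pick the index whose element is largest; ties go to the first one
    return max(range(len(array)), key=lambda i: array[i])

print(index_of_max_builtin([1, 2, 3]))   # 2
print(index_of_max_builtin([3, 1, 2]))   # 0
print(index_of_max_builtin([]))          # None
```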

Errors

EVERYBODY MAKES THEM

Switch to Errors notebook

Data types

Lists: an ordered collection of objects, possibly of different types

a = []
a.append(1)     # a = [1]
a.extend([2,3]) # a = [1,2,3]
a.reverse()     # a = [3,2,1]
a.index(1)      # 2 -------^
a.pop()         # 1; a = [3,2]
del a[0]        # a = [2]

Data types

Dictionaries: a mapping between keys and values

a = {1:1, 2:4, 3:9, 4:16}
a[5] = 25              # a = {1:1, 2:4, 3:9, 4:16, 5:25}
a.update({0:0, 1:-1})  # a = {1:-1, 2:4, 3:9, 4:16, 5:25, 0:0}
del a[4]               # a = {1:-1, 2:4, 3:9, 5:25, 0:0}
a.get(1)               # -1
a.pop(2)               # 4; a = {1:-1, 3:9, 5:25, 0:0}
list(a.keys())         # [1, 3, 5, 0]
list(a.values())       # [-1, 9, 25, 0]
list(a.items())        # [(1,-1),(3,9),(5,25),(0,0)]
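
Two things you will do with dictionaries all the time: loop over the pairs, and look up a key that may be missing. A short sketch with an illustrative dictionary squares:

```python
squares = {1: 1, 2: 4, 3: 9}

for key, value in squares.items():
    print(key, "->", value)

print(squares.get(10))        # None, no error for a missing key
print(squares.get(10, 0))     # 0, a default of your choice
print(10 in squares)          # False, "in" checks the keys
```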

Other people's code

  • distributed as packages (or libraries)
  • package has a name and a version
  • versions change all the time
  • manage packages with pip (or conda)
  • work in projects and manage each project's packages independently with pip-tools
    (or conda)

Most common packages:

  • numpy: for efficient numerical computation and linear algebra
  • pandas: for working with data in tables
  • matplotlib: for plotting data
  • scikit-learn: a collection of machine learning algorithms
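
To see which version of a package is installed, the standard importlib.metadata module (Python 3.8+) can be asked from inside Python. A sketch with a hypothetical helper installed_version; what it prints depends on your environment, and the last name in the list is deliberately one that should not exist.

```python
from importlib import metadata

def installed_version(package_name):
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None            # the package is not installed here

for name in ["numpy", "pandas", "no-such-package-example"]:
    print(name, installed_version(name))
```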

GitHub

  • essential for collaboration
  • backup of your project
  • sharing your project
  • still very useful for personal projects