Introduction to pandas

Lecture 2 of the Python for Data Science course

Agenda

  • Python recap
  • Python topics omitted so far
    • classes
    • list comprehensions
    • string literals
    • other operators
    • other statements
    • builtins
  • pandas introduction
    • DataFrame, Series, Index
    • selecting data

Python recap

Python as a language:
  • has parts of speech (numbers, strings, booleans…)
  • has hierarchical expressions
  • has sentence types (assignment, if, for, def…)

Important expressions

  • attribute reference: var.attr
  • indexing a variable with something:
    • e.g. the number 3: var[3]
    • or the string "Aa": var["Aa"]
  • calling a function: func(1, "Bb")
  • a function that is an attribute is a method:
    var.func(False)
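
A minimal sketch putting these expressions side by side. The variable names and values are made up for illustration only:

```python
# Hypothetical values, chosen only to illustrate each kind of expression.
numbers = [10, 20, 30, 40]
ages = {"Aa": 31}

fourth = numbers[3]          # indexing with a number
age = ages["Aa"]             # indexing with a string
biggest = max(1, 7)          # calling a function
real_part = (2 + 3j).real    # attribute reference
shout = "bb".upper()         # calling a method (a function that is an attribute)
```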

Data types

Lists: an ordered collection of objects, possibly of different types

a = []
a.append(1)     # a = [1]
a.extend([2,3]) # a = [1,2,3]
a.reverse()     # a = [3,2,1]
a.index(1)      # 2 -------^
a.pop()         # a = [3,2]
del a[0]        # a = [2]

E.g. storing read books or published papers
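
A short sketch of the reading-list idea, assuming the books are stored as plain title strings (the titles are illustrative):

```python
# Hypothetical reading list; each element is just a title string.
read_books = []
read_books.append("Robinson Crusoe")
read_books.append("Moby-Dick")

how_many = len(read_books)                # number of books read so far
already_read = "Moby-Dick" in read_books  # membership test
latest = read_books[-1]                   # last element = most recently added
```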

Data types

Dictionaries: a mapping between keys and values

a = {1:1, 2:4, 3:9, 4:16}
a[5] = 25              # a = {1:1, 2:4, 3:9, 4:16, 5:25}
a.update({0:0, 1:-1})  # a = {1:-1, 2:4, 3:9, 4:16, 5:25, 0:0}
del a[4]               # a = {1:-1, 2:4, 3:9, 5:25, 0:0}
a.get(1)               # -1
a.pop(2)               # 4; a = {1:-1, 3:9, 5:25, 0:0}
list(a.keys())         # [1, 3, 5, 0]
list(a.values())       # [-1, 9, 25, 0]
list(a.items())        # [(1,-1),(3,9),(5,25),(0,0)]

E.g. storing a phonebook
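
A small phonebook sketch, assuming names as keys and numbers stored as strings (all entries are invented):

```python
# Hypothetical phonebook: name -> phone number.
phonebook = {"Alice": "555-0100", "Bob": "555-0101"}
phonebook["Carol"] = "555-0102"             # add a new entry

alice_number = phonebook["Alice"]           # look up by key
missing = phonebook.get("Dave", "unknown")  # get() with a default avoids a KeyError
has_bob = "Bob" in phonebook                # membership test checks the keys
```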

Lists vs Dictionaries

Lists:
  • sequential: always have order of elements
  • can have duplicates
Dictionaries:
  • elements are looked up by key, not by position (insertion order is kept since Python 3.7, but you rarely rely on it)
  • cannot have duplicate keys: new value for a key will overwrite the previous
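
The difference in how duplicates are handled can be sketched like this (names and values are illustrative):

```python
# A list keeps every element, including repeats.
visits = []
visits.append("Alice")
visits.append("Alice")

# A dictionary keeps one value per key; the second assignment overwrites the first.
last_seen = {}
last_seen["Alice"] = "Monday"
last_seen["Alice"] = "Friday"
```

After running this, visits holds two elements, while last_seen holds a single key whose value is "Friday".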

Classes

a = []
a.append(1)     # a = [1]
a.extend([2,3]) # a = [1,2,3]

What is the relationship between the variable a and the list class?

What is append or extend?

Classes: data and behaviour together

Author defines how data is stored and how it should be used: an interface

Example: checking out a book at a library

  • each book has an author and a title
  • you can check out a book only by providing your library membership number

Class instances: data and behaviour together

User's data that follows the class author's standard

mybook = pick_random_book()
mybook.title           # "Robinson Crusoe"
mybook.author          # "Daniel Defoe"
mybook.checkout()      # Error: no account specified
mybook.checkout(1234)  # "Thank you for using our library"
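
One way such a class might be written is sketched below. The Book class, its borrower attribute, and the return message are assumptions made for illustration; the lecture does not show how the library's class is actually defined:

```python
class Book:
    """Hypothetical class sketching the library example."""

    def __init__(self, title, author):
        # Data: every book has a title and an author.
        self.title = title
        self.author = author
        self.borrower = None

    def checkout(self, account):
        # Behaviour: checking out requires a membership number.
        self.borrower = account
        return "Thank you for using our library"


mybook = Book("Robinson Crusoe", "Daniel Defoe")
message = mybook.checkout(1234)

# Calling checkout() without an account fails, because the argument is required.
try:
    mybook.checkout()
    error_seen = False
except TypeError:
    error_seen = True
```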

Classes: data and behaviour together

Lists and dictionaries are built-in classes. Strings as well, and many others

To check which class (or type) a variable belongs to:

>>> a = "hello"
>>> type(a)
<class 'str'>
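
The same check works for lists and dictionaries. A brief sketch, with illustrative values:

```python
a = []
a.append(1)                       # append is a method defined by the list class

list_type = type(a)               # the class that a is an instance of
dict_type = type({1: 1})
same_class = type("hello") is str
is_list = isinstance(a, list)     # the usual way to test for a class
```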

Switch to pandas notebook