Created by Nathan Kelber and Zhuo Chen under Creative Commons CC BY License
For questions/comments/improvements, email nathan.kelber@ithaka.org or zhuo.chen@ithaka.org.


Python Intermediate 1#

Description: This notebook describes:

  • What a list comprehension is

  • What a dictionary comprehension is

  • What a set comprehension is

Use Case: For Learners (Detailed explanation, not ideal for researchers)

Difficulty: Intermediate

Completion Time: 90 minutes

Knowledge Required:

Knowledge Recommended: None

Data Format: None

Libraries Used: NLTK

Research Pipeline: None


What is a Python Comprehension?#

Comprehensions in Python are constructs that allow us to build new collections (such as lists, sets, and dictionaries) from collections that are already defined. Python supports four types of comprehensions:

  • List comprehensions

  • Dictionary comprehensions

  • Set comprehensions

  • Generator comprehensions (to be presented in Python Intermediate 5)
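
As a quick orientation, the sketch below shows the general shape of each kind of comprehension. The data and the variable names (values, squares_list, and so on) are made up for this illustration and are not part of the lesson's examples.

```python
# Hypothetical sample data, used only for this illustration
values = [1, 2, 2, 3]

squares_list = [v * v for v in values]     # list comprehension: square brackets
squares_dict = {v: v * v for v in values}  # dictionary comprehension: curly braces with key: value
squares_set = {v * v for v in values}      # set comprehension: curly braces, no colon
squares_gen = (v * v for v in values)      # generator comprehension: parentheses

print(squares_list)
print(squares_dict)
print(squares_set)
print(list(squares_gen))  # a generator produces its items only when it is consumed
```

Notice that the only visible differences are the enclosing brackets and, for a dictionary, the colon between key and value.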

List comprehensions#

List comprehensions (Integers)#

Suppose you are a parent of a 4-month-old. You are keeping a record of your baby’s daily milk intake in ounces. Here is your record from last week.

# record of daily milk intake
milk = [24, 38, 40, 36, 42, 38, 36]

Your partner is French and is used to the metric system. Your partner wants to convert the numbers into milliliters.

1 ounce \(\approx\) 30 milliliters

Operator   Operation        Example   Evaluation
*          Multiplication   7 * 8     56

# Create a new list using a for loop

new_list = [] # An empty list we will add to

for number in milk:
    new_list.append(number * 30)

print(new_list)
[720, 1140, 1200, 1080, 1260, 1140, 1080]

Take a look again at the for loop.

for number in milk:
    new_list.append(number * 30)

We can read this as: for number in milk, append number multiplied by 30 to new_list.

Using a for loop to create a new list is familiar to us. We learned it in the Python Basics series.

Alternatively, we can also use a list comprehension to create a new list. Let’s rearrange the above for loop slightly to write a list comprehension.

# Create a new list of milk intake in milliliters

new_list = [number * 30 for number in milk] ## The brackets [] indicate we are creating a list

print(new_list)
[720, 1140, 1200, 1080, 1260, 1140, 1080]

We read this as: append number multiplied by 30, for number in milk.

If the order of the comprehension is confusing, it may help to skip the part before the for loop and start with for number in milk, then return to the beginning of the comprehension to see what will be appended: number * 30.

Up to this point, we have used two different ways to create a new list based on an old list:

  • using a regular for loop

  • using a list comprehension

Comparing the two side by side, the benefit of using a list comprehension is clear: its syntax is more concise.
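
To make the comparison concrete, here is a minimal sketch that places both versions next to each other, reusing the milk list from above. The names loop_result and comprehension_result are introduced here only for illustration.

```python
milk = [24, 38, 40, 36, 42, 38, 36]

# Version 1: a regular for loop (three lines)
loop_result = []
for number in milk:
    loop_result.append(number * 30)

# Version 2: a list comprehension (one line)
comprehension_result = [number * 30 for number in milk]

# Both approaches build exactly the same list
print(loop_result == comprehension_result)  # True
```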


Add a Filtering Condition to Select Items to be Appended#

If we wanted, we could also add a filtering condition to the list comprehension.

Suppose the daily intake of milk for a 4-month-old as recommended by the AAP is 25 ounces at minimum.

# Create a list of the intake numbers that fall short of the recommended minimum quantity

new_list = [number for number in milk if number < 25]
print(new_list)
[24]
# Use len() to count the days on which the baby had less than the recommended minimum quantity

len(new_list)
1
# Write the list comprehension back to a for loop

new_list = [] 
for number in milk:
    if number < 25:
        new_list.append(number)
        
print(len(new_list))
1
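
A filtering condition can also be combined with a transformation. As a small sketch (the name low_days_ml is hypothetical), the following keeps only the days below the minimum and converts those values to milliliters, using the same approximate factor of 30.

```python
milk = [24, 38, 40, 36, 42, 38, 36]

# Keep only the days below 25 ounces, and convert those values to milliliters
low_days_ml = [number * 30 for number in milk if number < 25]

print(low_days_ml)  # [720]
```

The expression before the for (number * 30) decides what is appended; the condition after the if decides which items are considered at all.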

List Comprehensions constructed from other iterables#

List comprehensions are used to create a list. We have seen examples where we use list comprehensions to create a new list based on an old list. In fact, list comprehensions can be used to create new lists based on any kind of iterable.

In Python, iterables are the objects whose members can be iterated over in a for loop. Objects like lists, tuples, sets, dictionaries, strings, etc. are called iterables.
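
As a brief sketch of this idea, each of the following list comprehensions draws from a different kind of iterable. The sample values and variable names are made up for illustration.

```python
# Each of these objects is iterable, so each can feed a list comprehension
from_tuple = [item * 2 for item in (1, 2, 3)]
from_string = [char.upper() for char in "abc"]
from_range = [n for n in range(3)]
from_dict = [key for key in {"a": 1, "b": 2}]  # iterating over a dictionary yields its keys

print(from_tuple)   # [2, 4, 6]
print(from_string)  # ['A', 'B', 'C']
print(from_range)   # [0, 1, 2]
print(from_dict)    # ['a', 'b']
```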

Suppose you are practicing your basketball shooting skills.

# Create a new list containing your basketball shooting results 

shots = '10010' 

new_list = [int(item) for item in shots] ## string is an iterable

print(new_list)
[1, 0, 0, 1, 0]
# Use sum() to get how many times you have scored

sum(new_list)
2

Integers, however, are not iterable.

# Creating a new list based on an object that is not an iterable
# results in an error 

num = 12345

digits = [digit for digit in num if digit > 3] ## integer is not an iterable
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[10], line 6
      1 # Creating a new list based on an object that is not an iterable
      2 # results in an error 
      4 num = 12345
----> 6 digits = [digit for digit in num if digit > 3] ## integer is not an iterable

TypeError: 'int' object is not iterable
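
One possible workaround, sketched below, is to convert the integer to a string first, since a string is iterable. Each character then has to be converted back to an integer with int() before it can be compared with 3. This is only one way to approach it.

```python
num = 12345

# str(num) is '12345', which can be iterated over character by character
digits = [int(char) for char in str(num) if int(char) > 3]

print(digits)  # [4, 5]
```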

Coding Challenge! < / >

numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Create a list odd_num which contains all the odd numbers from the list numbers. To find odd numbers, you can use the modulus % to see if there is a remainder of 1 after dividing a number by 2. If there is a remainder of 1, the number is odd.

Operator   Operation   Example   Evaluation
%          Modulus     5 % 2     1

5 % 2
1

See if you can write a list comprehension that creates a new list odd_num which only contains the odd numbers from numbers. The next code cell demonstrates how it could be done with a for loop.

# Creating a new list odd_num of numbers
odd_num = [] 
for number in numbers:
    if number % 2 == 1:
        odd_num.append(number) 

print(odd_num)
[1, 3, 5, 7, 9]
# Creating a new list odd_num from numbers
# Using a list comprehension

List Comprehensions (Strings)#

A list comprehension also works on a list containing other data types, such as strings.

# Create a list of people

people = ['Aaron Aston',
         'Brianna Barton',
         'Carla Cameron',
         'Delia Darcy',
         'Evelyn Elgin',
         'Frederick Federov',
         'Gaston Garbo']

Suppose you want to create a new list that only contains the first names.

# Create a new list that only includes first names
# Using a for loop

friends = [] 

for name in people:
    first_name = name.split()[0] # Split the name on whitespace, then grab the first name/item
    friends.append(first_name)
    
print(friends)
['Aaron', 'Brianna', 'Carla', 'Delia', 'Evelyn', 'Frederick', 'Gaston']

In this example, we split each name string on whitespace using the .split() method. This creates a list of strings from a string.

# Split a string on white space

"John Doe".split()
['John', 'Doe']
# Split a string on white space
# Then return only the first item in the list

"John Doe".split()[0]
'John'

Coding Challenge! < / >

Use a list comprehension to create a list called friends that contains only first names.

# Create a new list that only includes first names
# Using a list comprehension

List Comprehensions (Multiple Lists)#

We can also create a list comprehension that pulls from multiple lists by using two for loops within a single list comprehension.

Scenario: Suppose you are running a restaurant. For the lunch special, you provide different varieties of rice and different protein choices that go with the rice.

# Define two lists: rices and proteins

rices = ["white rice", "brown rice", "yellow rice"]

proteins = ["beef", "pork", "chicken", "shrimp", "lamb", "tofu"] 
# A Nested For Loop Example
all_lunch_special_choices = []

for rice in rices:
    for protein in proteins:
        all_lunch_special_choices.append(rice + " with " + protein)

from pprint import pprint 
pprint(all_lunch_special_choices) # use pprint to print the output in a pretty format
['white rice with beef',
 'white rice with pork',
 'white rice with chicken',
 'white rice with shrimp',
 'white rice with lamb',
 'white rice with tofu',
 'brown rice with beef',
 'brown rice with pork',
 'brown rice with chicken',
 'brown rice with shrimp',
 'brown rice with lamb',
 'brown rice with tofu',
 'yellow rice with beef',
 'yellow rice with pork',
 'yellow rice with chicken',
 'yellow rice with shrimp',
 'yellow rice with lamb',
 'yellow rice with tofu']
# Using a list comprehension on two lists
# Create a list of all possible combinations of rice and protein

all_lunch_special_choices = [rice + " with " + protein for rice in rices for protein in proteins]

pprint(all_lunch_special_choices)
['white rice with beef',
 'white rice with pork',
 'white rice with chicken',
 'white rice with shrimp',
 'white rice with lamb',
 'white rice with tofu',
 'brown rice with beef',
 'brown rice with pork',
 'brown rice with chicken',
 'brown rice with shrimp',
 'brown rice with lamb',
 'brown rice with tofu',
 'yellow rice with beef',
 'yellow rice with pork',
 'yellow rice with chicken',
 'yellow rice with shrimp',
 'yellow rice with lamb',
 'yellow rice with tofu']

The two lists we pull from are independent of each other. You can see that even if we switch the two for loops, the result is still a valid list comprehension. Only the order of the items in the resulting list changes.

# Using a list comprehension on two lists
# Create a list of all possible combinations of protein and rice

all_lunch_special_choices = [rice + " with " + protein for protein in proteins for rice in rices]

pprint(all_lunch_special_choices)
['white rice with beef',
 'brown rice with beef',
 'yellow rice with beef',
 'white rice with pork',
 'brown rice with pork',
 'yellow rice with pork',
 'white rice with chicken',
 'brown rice with chicken',
 'yellow rice with chicken',
 'white rice with shrimp',
 'brown rice with shrimp',
 'yellow rice with shrimp',
 'white rice with lamb',
 'brown rice with lamb',
 'yellow rice with lamb',
 'white rice with tofu',
 'brown rice with tofu',
 'yellow rice with tofu']

What if the lists we pull from are not independent of one another? What if one is nested in another? Can we switch the two for loops?

# Create a list of all names from nested lists

names = [
    ['Abby', 'Bella', 'Cecilia'],
    ['Alex', 'Beatrice', 'Cynthia', 'David']
]

all_names = [name for sub_list in names for name in sub_list]
print(all_names)
['Abby', 'Bella', 'Cecilia', 'Alex', 'Beatrice', 'Cynthia', 'David']
# Switch the two for loops and see what happens 

names = [
    ['Abby', 'Bella','Cecilia'],
    ['Alex','Beatrice','Cynthia','David']
]

all_names = [name for name in sublist for sublist in names]
print(all_names)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[26], line 8
      1 # Switch the two for loops and see what happens 
      3 names = [
      4     ['Abby', 'Bella','Cecilia'],
      5     ['Alex','Beatrice','Cynthia','David']
      6 ]
----> 8 all_names = [name for name in sublist for sublist in names]
      9 print(all_names)

NameError: name 'sublist' is not defined
# Convert the list comprehension back to a for loop to get a clearer view

names = [
    ['Abby', 'Bella','Cecilia'],
    ['Alex','Beatrice','Cynthia','David']
]

all_names = []
for name in sublist:
    for sublist in names:
        all_names.append(name)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[33], line 9
      3 names = [
      4     ['Abby', 'Bella','Cecilia'],
      5     ['Alex','Beatrice','Cynthia','David']
      6 ]
      8 all_names = []
----> 9 for name in sublist:
     10     for sublist in names:
     11         all_names.append(name)

NameError: name 'sublist' is not defined
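
The for loop version makes the problem visible: the outer loop tries to use sublist before the inner loop has defined it. In a list comprehension, the for clauses are evaluated from left to right, in the same order as the equivalent nested for loops, so the clause that defines a variable has to come before the clause that uses it. A minimal sketch of the working order (the name all_names_loop is introduced only for this comparison):

```python
names = [
    ['Abby', 'Bella', 'Cecilia'],
    ['Alex', 'Beatrice', 'Cynthia', 'David']
]

# Nested for loops: the outer loop defines sublist, the inner loop uses it
all_names_loop = []
for sublist in names:
    for name in sublist:
        all_names_loop.append(name)

# The comprehension lists its for clauses in the same outer-to-inner order
all_names = [name for sublist in names for name in sublist]

print(all_names == all_names_loop)  # True
```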

Coding Challenge! < / >

Process the names list using a for loop, and store the names that start with the letter ‘A’ in a new list a_name.

Then do the same thing using a list comprehension.

# Create a list that only contains the names that start with the letter 'A'
# Using a for loop
names = [
    ['Abby', 'Bella','Cecilia'],
    ['Alex','Beatrice','Cynthia','David']
]
# Create a list that only contains the names that start with the letter 'A'
# Using a list comprehension

Dictionary Comprehension#

The form of a dictionary comprehension is similar to that of a list comprehension, except that it is written with curly braces {} and produces key: value pairs. Since a dictionary comprehension may deal with keys, values, or both, we need to be prepared to use .keys(), .values(), or .items() (for both).

# Create a dictionary of contacts and occupations

contacts = {
 'Amanda Bennett': 'Engineer, electrical',
 'Bryan Miller': 'Radiation protection practitioner',
 'Christopher Garrison': 'Planning and development surveyor',
 'Debra Allen': 'Intelligence analyst',
 'Donna Decker': 'Architect',
 'Heather Bullock': 'Media planner',
 'Jason Brown': 'Energy manager',
 'Jason Soto': 'Lighting technician, broadcasting/film/video',
 'Marissa Munoz': 'Further education lecturer',
 'Matthew Mccall': 'Chief Technology Officer',
 'Michael Norman': 'Translator',
 'Nicole Leblanc': 'Financial controller',
 'Noah Delgado': 'Engineer, civil',
 'Rachel Charles': 'Physicist, medical',
 'Stephanie Petty': 'Architect'}

When we loop over a dictionary, we will only loop over the keys of the dictionary.

# Looping over a dictionary only loops the keys
for contact in contacts:
    print(contact)
Amanda Bennett
Bryan Miller
Christopher Garrison
Debra Allen
Donna Decker
Heather Bullock
Jason Brown
Jason Soto
Marissa Munoz
Matthew Mccall
Michael Norman
Nicole Leblanc
Noah Delgado
Rachel Charles
Stephanie Petty
# Looping over a dictionary by specifying .keys()
for key in contacts.keys():
    print(key)
Amanda Bennett
Bryan Miller
Christopher Garrison
Debra Allen
Donna Decker
Heather Bullock
Jason Brown
Jason Soto
Marissa Munoz
Matthew Mccall
Michael Norman
Nicole Leblanc
Noah Delgado
Rachel Charles
Stephanie Petty

To loop over both the keys and the values, we will need to use dict.items().

# Looping over a dictionary by specifying .items()
for item in contacts.items():
    print(item)
('Amanda Bennett', 'Engineer, electrical')
('Bryan Miller', 'Radiation protection practitioner')
('Christopher Garrison', 'Planning and development surveyor')
('Debra Allen', 'Intelligence analyst')
('Donna Decker', 'Architect')
('Heather Bullock', 'Media planner')
('Jason Brown', 'Energy manager')
('Jason Soto', 'Lighting technician, broadcasting/film/video')
('Marissa Munoz', 'Further education lecturer')
('Matthew Mccall', 'Chief Technology Officer')
('Michael Norman', 'Translator')
('Nicole Leblanc', 'Financial controller')
('Noah Delgado', 'Engineer, civil')
('Rachel Charles', 'Physicist, medical')
('Stephanie Petty', 'Architect')

Note that each key/value pair is returned as a tuple. A tuple is very similar to a Python list; the difference is that a tuple cannot be modified. The technical term in Python is immutable.

  • A list is mutable (can be changed)

  • A tuple is immutable (cannot be changed)

We can further distinguish between them by the fact that:

  • A list uses square brackets []

  • A tuple uses parentheses ().
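
The difference in mutability can be seen in a short sketch. The names contact_list and contact_tuple are hypothetical and used only here.

```python
contact_list = ["John Doe", "Engineer, mechanical"]
contact_tuple = ("John Doe", "Engineer, mechanical")

# A list can be changed in place
contact_list[1] = "Architect"
print(contact_list)

# Trying the same assignment on a tuple raises a TypeError
try:
    contact_tuple[1] = "Architect"
except TypeError as error:
    print("TypeError:", error)

print(contact_tuple)  # the tuple is unchanged
```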

We can create a new dictionary from the original dictionary using a for loop to iterate through the key/value pairs. The for loop format is similar to the one we used for a list, except that we need to use an index to refer to the key (index 0) or the value (index 1) of each tuple.

# Create a new dictionary that only contains the engineers 
# Using a for loop

engineer_contacts = {}

for contact in contacts.items():
    if 'Engineer' in contact[1]:
        engineer_contacts[contact[0]] = contact[1]
    
pprint(engineer_contacts)
{'Amanda Bennett': 'Engineer, electrical', 'Noah Delgado': 'Engineer, civil'}
# Use index to access the elements in a tuple

person = ("John Doe", "Engineer, mechanical")
person[1]
'Engineer, mechanical'
# A quick reminder of how to add key/value pairs to a dictionary

grades = {'John': 90, 'Mary': 95}
grades['Sue'] = 98
print(grades)
{'John': 90, 'Mary': 95, 'Sue': 98}
# Use a dictionary comprehension to iterate through the (key, value) tuples of the items in a dictionary
# Add each key:value pair to a new dictionary engineer_contacts

engineer_contacts = {contact[0]:contact[1] for contact in contacts.items() if 'Engineer' in contact[1]}
pprint(engineer_contacts)
{'Amanda Bennett': 'Engineer, electrical', 'Noah Delgado': 'Engineer, civil'}

Instead of using indices with each tuple, we can also give variable names to the keys and values respectively.

# Use key/value variable names for each tuple
# For loop example

engineer_contacts = {} 

for (name, occupation) in contacts.items():
    if 'Engineer' in occupation:
        engineer_contacts[name] = occupation

pprint(engineer_contacts)      
{'Amanda Bennett': 'Engineer, electrical', 'Noah Delgado': 'Engineer, civil'}

Note that when we assign the keys and values to the two variables name and occupation, we use parentheses to indicate that each pair is a tuple. However, the parentheses are not obligatory. You can remove them and the code will still work.

# Using key/value variable names for each tuple
# Dictionary comprehension example

engineer_contacts = {name : occupation for (name, occupation) in contacts.items() if 'Engineer' in occupation}
pprint(engineer_contacts)
{'Amanda Bennett': 'Engineer, electrical', 'Noah Delgado': 'Engineer, civil'}
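
The parentheses around name, occupation are optional in a dictionary comprehension as well. The sketch below uses a shortened, hypothetical dictionary called sample_contacts to show that both spellings give the same result.

```python
# A shortened, hypothetical version of the contacts dictionary
sample_contacts = {
    'Amanda Bennett': 'Engineer, electrical',
    'Donna Decker': 'Architect',
    'Noah Delgado': 'Engineer, civil',
}

# With parentheses around the tuple being unpacked
with_parens = {name: occupation
               for (name, occupation) in sample_contacts.items()
               if 'Engineer' in occupation}

# Without parentheses: the same tuple unpacking
without_parens = {name: occupation
                  for name, occupation in sample_contacts.items()
                  if 'Engineer' in occupation}

print(with_parens == without_parens)  # True
```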

In the section on list comprehensions, we saw that we can use list comprehensions to create a list from any kind of iterable. The same is true for dictionary comprehensions. We can use dictionary comprehensions to create a new dictionary based on any kind of iterable, not necessarily an old dictionary.

# Create a dictionary based on a list of word strings
# where keys are the words and values are the lengths of the words
# for loop example

words = ['more', 'is', 'said', 'than', 'done']

word_length = {}

for word in words:
    word_length[word] = len(word)
    
print(word_length)  
{'more': 4, 'is': 2, 'said': 4, 'than': 4, 'done': 4}
# Create a dictionary of word/word length pairs based on a list of word strings
# using a dictionary comprehension

word_length = {word : len(word) for word in words}
print(word_length)
{'more': 4, 'is': 2, 'said': 4, 'than': 4, 'done': 4}

Coding Challenge! < / >

Suppose you are a grocery store owner. Due to inflation, you have to raise prices by 15%. The dictionary store_prices contains the items and their original prices. Use a dictionary comprehension to create a new dictionary with the new prices.

Hint: You can round a number to two decimal places using the round() function. The first argument is the number to be rounded; the second argument is the level of precision. In this case, two decimal places.

round(3.1415926, 2)
3.14
# Create a new dictionary where all prices are 15% higher
store_prices = {
    "milk": 3.49,
    "egg": 5.29,
    "bread": 2.99,
    "spinach": 1.99,
    "lettuce": 2.35,
    "banana": 0.99
}

Set comprehension#

Sets in Python are written with curly braces. Curly braces {} are used for both dictionaries and sets in Python. Which one is created depends on whether each entry is a key: value pair (a dictionary) or a single value (a set). We can use the type() function to discover what kind of object a variable is.

# Demonstrating a set
# One data entry per comma in curly braces

test_set = {1, 2, 3}
type(test_set)
set
# Demonstrating a dictionary
# Two data entries separated by a colon per each comma in curly braces

test_dict = {1 : 'apple', 2 : 'banana', 3 : 'cherry'}
type(test_dict)
dict

To create an empty set, we use the set() function. By default, empty curly braces will create an empty dictionary.

# Demonstrating creation of empty dict vs empty set

test_set = set()
test_dict = {}

print(f'test_set is a {type(test_set)}')
print(f'test_dict is a {type(test_dict)}')
test_set is a <class 'set'>
test_dict is a <class 'dict'>
# Using a for loop with a set

set1 = {5, 6, 7, 8, 9}
set2 = set() ## note how we initialize an empty set
for num in set1:
    if num > 5:
        set2.add(num) # note how we add a new element to a set
print(set2)
{8, 9, 6, 7}
# Using a set comprehension

set2 = {num for num in set1 if num > 5}
print(set2) 
{8, 9, 6, 7}

A set is an unordered collection of distinct objects. If you change the order of the elements or list an element more than once, that does not change the set.

# Using a comparison operator on two sets
# Same elements in different order

{1,2} == {2,1}
True
# Using a comparison operator on two sets
# Repeated elements in a set

{1,1,2} == {1,2}
True
# Printing a set with duplicates
# Duplicates are removed automatically

print({1, 1, 2})
{1, 2}
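
Because duplicates are dropped automatically, a set comprehension is a convenient way to collect the distinct values produced from an iterable. The sketch below uses a shortened, hypothetical dictionary called sample_contacts in which two people share an occupation.

```python
# A shortened, hypothetical version of the contacts dictionary
sample_contacts = {
    'Donna Decker': 'Architect',
    'Michael Norman': 'Translator',
    'Stephanie Petty': 'Architect',
}

# Three contacts, but only two distinct occupations
occupations = {occupation for occupation in sample_contacts.values()}

print(len(sample_contacts))  # 3
print(len(occupations))      # 2
print(occupations == {'Architect', 'Translator'})  # True
```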

Again, we can use set comprehensions to create a new set based on any kind of iterable that has been defined.

# Create a new set containing only the names from the dictionary of contacts

names = {name for name in contacts}

pprint(names)
{'Amanda Bennett',
 'Bryan Miller',
 'Christopher Garrison',
 'Debra Allen',
 'Donna Decker',
 'Heather Bullock',
 'Jason Brown',
 'Jason Soto',
 'Marissa Munoz',
 'Matthew Mccall',
 'Michael Norman',
 'Nicole Leblanc',
 'Noah Delgado',
 'Rachel Charles',
 'Stephanie Petty'}

Coding Challenge! < / >

[Figure: An illustration of a Wordle game]

In the game Wordle, players must guess a five-letter word in six guesses or fewer. A player is told if the letters in their guess are found in the word and if they are in the correct spot for the answer.

On their first guess, a player discovers that the 3rd letter is “I” and the 4th letter is “S”. If they have the set of all possible words, they could narrow down their guesses.

Assume the words set contains all possible words. Can you write a set comprehension that will generate a set of all possible solutions?

As an extra challenge, write additional set comprehensions to eliminate answers that contain the letters “P”, “R”, or “M”.

# Write a set comprehension that creates a set of potential answers

words = {'carbon',
         'monkey',
         'rabbit',
         'theory',
         'grist',
         'farmer',
         'pillow',
         'exist',
         'frisk',
         'harbor',
         'prism'
        }

Lesson Complete#

Congratulations! You have completed Python Intermediate 1.

Start Next Lesson: Python Intermediate 2#

Exercise Solutions#

Here are a few solutions for exercises in this lesson.

# Creating a new list odd_num from odd numbers
# Using a list comprehension

numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

odd_num = [number for number in numbers if number % 2 == 1]
print(odd_num)
[1, 3, 5, 7, 9]
# Create a new list that only includes first names
# Using a list comprehension

people = ['Aaron Aston',
         'Brianna Barton',
         'Carla Cameron',
         'Delia Darcy',
         'Evelyn Elgin',
         'Frederick Federov',
         'Gaston Garbo']

friends = [name.split()[0] for name in people]

print(friends)
['Aaron', 'Brianna', 'Carla', 'Delia', 'Evelyn', 'Frederick', 'Gaston']
# Create a list that only contains the names that start with the letter 'A'
# Using a for loop
names = [
    ['Abby', 'Bella','Cecilia'],
    ['Alex','Beatrice','Cynthia','David']
]

a_name = []
for sublist in names:
    for name in sublist:
        if name[0] == 'A':
            a_name.append(name)
print(a_name)
['Abby', 'Alex']
# Create a list that only contains the names that start with the letter 'A'
# Using a list comprehension

a_name = [name for sublist in names for name in sublist if name[0] == 'A']
print(a_name)
['Abby', 'Alex']
# Create a new dictionary where all prices are 15% higher
store_prices = {
    "milk": 3.49,
    "egg": 5.29,
    "bread": 2.99,
    "spinach": 1.99,
    "lettuce": 2.35,
    "banana": 0.99
}

new_prices = {item : round(price * 1.15, 2) for (item, price) in store_prices.items()}
pprint(new_prices)
{'banana': 1.14,
 'bread': 3.44,
 'egg': 6.08,
 'lettuce': 2.7,
 'milk': 4.01,
 'spinach': 2.29}
# Write a set comprehension that creates a set of potential answers

words = {'carbon',
         'monkey',
         'rabbit',
         'theory',
         'grist',
         'farmer',
         'pillow',
         'exist',
         'frisk',
         'harbor',
         'prism'
        }

answers = {word for word in words if word[2] == 'i' and word[3] == 's'}         
print(answers)
{'exist', 'grist', 'frisk', 'prism'}
# Write additional set comprehensions to eliminate answers that contain the letters "P", "R", or "M"

answers = {word for word in answers if 'p' not in word and 'r' not in word and 'm' not in word}
print(answers)
{'exist'}