Created by Nathan Kelber and Zhuo Chen under Creative Commons CC BY License
For questions/comments/improvements, email nathan.kelber@ithaka.org or zhuo.chen@ithaka.org.
Python Intermediate 1#
Description: This notebook describes:
What a list comprehension is
What a dictionary comprehension is
What a set comprehension is
Use Case: For Learners (Detailed explanation, not ideal for researchers)
Difficulty: Intermediate
Completion Time: 90 minutes
Knowledge Required:
Python Basics Series (Start Python Basics 1)
Knowledge Recommended: None
Data Format: None
Libraries Used: NLTK
Research Pipeline: None
What is a Python Comprehension?#
Comprehensions in Python are constructs that allow us to build new collections (such as lists, sets, and dictionaries) from iterables that are already defined. Python supports four types of comprehensions:
List comprehensions
Dictionary comprehensions
Set comprehensions
Generator comprehensions (to be presented in Python intermediate 5)
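As a quick preview, the four forms can be sketched side by side. This is only an illustration: the list values and the squares_* variable names are made up for this example and are not part of the lesson's data.

```python
# A small, made-up list of numbers used only for this illustration
values = [1, 2, 2, 3]

squares_list = [v * v for v in values]     # list comprehension: keeps order and duplicates
squares_dict = {v: v * v for v in values}  # dictionary comprehension: key: value pairs
squares_set = {v * v for v in values}      # set comprehension: duplicates are dropped
squares_gen = (v * v for v in values)      # generator: values are produced on demand

print(squares_list)               # [1, 4, 4, 9]
print(squares_dict)               # {1: 1, 2: 4, 3: 9}
print(squares_set == {1, 4, 9})   # True
print(sum(squares_gen))           # 18
```

The only visible difference between the forms is the kind of bracket and, for dictionaries, the colon between key and value.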
List comprehensions#
List comprehensions (Integers)#
Suppose you are a parent of a 4-month-old. You are taking record of your baby’s daily milk intake in ounces. Here is your record from last week.
# record of daily milk intake
milk = [24, 38, 40, 36, 42, 38, 36]
Your partner is French and is used to the metric system. Your partner wants to convert the numbers into milliliters.
1 ounce \(\approx\) 30 milliliters
Operator | Operation | Example | Evaluation
---|---|---|---
* | Multiplication | 7 * 8 | 56
# Create a new list using a for loop
new_list = [] # An empty list we will add to
for number in milk:
    new_list.append(number * 30)
print(new_list)
[720, 1140, 1200, 1080, 1260, 1140, 1080]
Take a look again at the for loop.
for number in milk:
    new_list.append(number * 30)
We can read this as: for number in milk, append number multiplied by 30 to new_list.
Using a for loop to create a new list is familiar to us; we learned it in the Python Basics series.
Alternatively, we can also use a list comprehension to create a new list. Let’s rearrange the above for loop slightly to write a list comprehension.
# Create a new list of milk intake in milliliters
new_list = [number * 30 for number in milk] ## The brackets [] indicate we are creating a list
print(new_list)
[720, 1140, 1200, 1080, 1260, 1140, 1080]
We read this as: append number multiplied by 30, for number in milk.
If the order of the comprehension is confusing, it may help to skip the part before the for loop and start with:
for number in milk
then return to the beginning of the comprehension to see what will be appended: number * 30.
Up to this point, we have used two different ways to create a new list based on an old list:
using a regular for loop
using a list comprehension
The benefit of using a list comprehension is its concise syntax: the empty-list setup, the loop, and the append collapse into a single line.
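The two approaches can be compared directly in one cell. This is a minimal sketch that reuses the milk list from above; the names loop_result and comprehension_result are introduced here only for illustration.

```python
# Record of daily milk intake in ounces (same data as above)
milk = [24, 38, 40, 36, 42, 38, 36]

# Version 1: a regular for loop (three lines)
loop_result = []
for number in milk:
    loop_result.append(number * 30)

# Version 2: a list comprehension (one line)
comprehension_result = [number * 30 for number in milk]

# Both approaches build exactly the same list
print(loop_result == comprehension_result)  # True
```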


Add a Filtering Condition to Select Items to be Appended#
If we wanted, we could also add a filtering condition after the for clause, so that only some items are appended.
Suppose the daily intake of milk for a 4-month-old as recommended by the AAP is 25 ounces at minimum.
# Create a list of the intake numbers that fall short of the recommended minimum quantity
new_list = [number for number in milk if number < 25]
print(new_list)
[24]
# Use len() to get how many days the baby had less than the recommended minimum quantity
len(new_list)
1
# Write the list comprehension back to a for loop
new_list = []
for number in milk:
    if number < 25:
        new_list.append(number)
print(len(new_list))
1
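A filter and a transformation can also be combined in the same comprehension. The sketch below assumes we want the milliliter values only for the days that met the 25-ounce minimum; the name enough_ml is made up for this example.

```python
# Record of daily milk intake in ounces (same data as above)
milk = [24, 38, 40, 36, 42, 38, 36]

# Keep only days at or above the 25-ounce minimum, then convert to milliliters
enough_ml = [number * 30 for number in milk if number >= 25]
print(enough_ml)  # [1140, 1200, 1080, 1260, 1140, 1080]
```

The expression before for decides what is appended; the condition after if decides which items are considered at all.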
List Comprehensions constructed from other iterables#
List comprehensions create lists. So far, we have built each new list from an existing list, but a list comprehension can draw from any kind of iterable.
In Python, an iterable is an object whose members can be looped over with a for loop. Lists, tuples, sets, dictionaries, and strings are all iterables.
Suppose you are practicing your basketball shooting skills.
# Create a new list containing your basketball shooting results
shots = '10010'
new_list = [int(item) for item in shots] ## string is an iterable
print(new_list)
[1, 0, 0, 1, 0]
# Use sum() to get how many times you have scored
sum(new_list)
2
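The same pattern should work for other iterables too. The sketch below uses a tuple, a range, and a dictionary; all of the data and variable names here are invented for illustration.

```python
# Illustrative data (not from the lesson)
scores_tuple = (3, 2, 0, 3)
ages = {'Ann': 31, 'Bo': 27}

doubled = [score * 2 for score in scores_tuple]  # built from a tuple
evens = [n for n in range(10) if n % 2 == 0]     # built from a range
names_upper = [name.upper() for name in ages]    # built from a dictionary (its keys)

print(doubled)      # [6, 4, 0, 6]
print(evens)        # [0, 2, 4, 6, 8]
print(names_upper)  # ['ANN', 'BO']
```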
Integers, however, are not iterable.
# Creating a new list based on an object that is not an iterable
# results in an error
num = 12345
digits = [digit for digit in num if digit > 3] ## integer is not an iterable
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[10], line 6
1 # Creating a new list based on an object that is not an iterable
2 # results in an error
4 num = 12345
----> 6 digits = [digit for digit in num if digit > 3] ## integer is not an iterable
TypeError: 'int' object is not iterable
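One possible workaround, assuming the goal is to collect the digits greater than 3, is to convert the integer to a string first. A string is iterable, so each character can be turned back into an integer inside the comprehension.

```python
num = 12345

# str(num) is the string '12345', which can be looped over one character at a time
digits = [int(char) for char in str(num) if int(char) > 3]
print(digits)  # [4, 5]
```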
Coding Challenge! < / >
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Create a list odd_num which contains all the odd numbers from the list numbers. To find odd numbers, you can use the modulus operator % to see if there is a remainder of 1 after dividing a number by 2. If there is a remainder of 1, the number is odd.
Operator | Operation | Example | Evaluation
---|---|---|---
% | Modulus | 5 % 2 | 1
5 % 2
1
See if you can write a list comprehension that creates a new list odd_num which only contains the odd numbers from numbers. The next code cell demonstrates how it could be done with a for loop.
# Creating a new list odd_num of numbers
odd_num = []
for number in numbers:
    if number % 2 == 1:
        odd_num.append(number)
print(odd_num)
[1, 3, 5, 7, 9]
# Creating a new list odd_num from numbers
# Using a list comprehension
List Comprehensions (Strings)#
A list comprehension also works on a list containing other data types, such as strings.
# Create a list of people
people = ['Aaron Aston',
'Brianna Barton',
'Carla Cameron',
'Delia Darcy',
'Evelyn Elgin',
'Frederick Federov',
'Gaston Garbo']
Suppose you want to create a new list that only contains the first names.
# Create a new list that only includes first names
# Using a for loop
friends = []
for name in people:
    first_name = name.split()[0] # Split the name on whitespace, then grab the first name/item
    friends.append(first_name)
print(friends)
['Aaron', 'Brianna', 'Carla', 'Delia', 'Evelyn', 'Frederick', 'Gaston']
In this example, we split each name string on whitespace using the .split() method. This creates a list of strings from a string.
# Split a string on white space
"John Doe".split()
['John', 'Doe']
# Split a string on white space
# Then return only the first item in the list
"John Doe".split()[0]
'John'
Coding Challenge! < / >
Use a list comprehension to create a list called friends that contains only first names.
# Create a new list that only includes first names
# Using a list comprehension
List Comprehensions (Multiple Lists)#
We can also create a list comprehension that pulls from multiple lists by using two for loops within a single list comprehension.
Scenario: Suppose you are running a restaurant. For the lunch special, you provide different varieties of rice and different protein choices that go with the rice.
# Define two lists: rices and proteins
rices = ["white rice", "brown rice", "yellow rice"]
proteins = ["beef", "pork", "chicken", "shrimp", "lamb", "tofu"]
# A Nested For Loop Example
all_lunch_special_choices = []
for rice in rices:
    for protein in proteins:
        all_lunch_special_choices.append(rice + " with " + protein)
from pprint import pprint
pprint(all_lunch_special_choices) # use pprint to print the output in a pretty format
['white rice with beef',
'white rice with pork',
'white rice with chicken',
'white rice with shrimp',
'white rice with lamb',
'white rice with tofu',
'brown rice with beef',
'brown rice with pork',
'brown rice with chicken',
'brown rice with shrimp',
'brown rice with lamb',
'brown rice with tofu',
'yellow rice with beef',
'yellow rice with pork',
'yellow rice with chicken',
'yellow rice with shrimp',
'yellow rice with lamb',
'yellow rice with tofu']
# Using a list comprehension on two lists
# Create a list of all possible combinations of rice and protein
all_lunch_special_choices = [rice + " with " + protein for rice in rices for protein in proteins]
pprint(all_lunch_special_choices)
['white rice with beef',
'white rice with pork',
'white rice with chicken',
'white rice with shrimp',
'white rice with lamb',
'white rice with tofu',
'brown rice with beef',
'brown rice with pork',
'brown rice with chicken',
'brown rice with shrimp',
'brown rice with lamb',
'brown rice with tofu',
'yellow rice with beef',
'yellow rice with pork',
'yellow rice with chicken',
'yellow rice with shrimp',
'yellow rice with lamb',
'yellow rice with tofu']
The two lists we pull from are independent of each other, so switching the two for loops still gives a valid list comprehension. Only the order of the combinations in the result changes.
# Using a list comprehension on two lists
# Create a list of all possible combinations of protein and rice
all_lunch_special_choices = [rice + " with " + protein for protein in proteins for rice in rices]
pprint(all_lunch_special_choices)
['white rice with beef',
'brown rice with beef',
'yellow rice with beef',
'white rice with pork',
'brown rice with pork',
'yellow rice with pork',
'white rice with chicken',
'brown rice with chicken',
'yellow rice with chicken',
'white rice with shrimp',
'brown rice with shrimp',
'yellow rice with shrimp',
'white rice with lamb',
'brown rice with lamb',
'yellow rice with lamb',
'white rice with tofu',
'brown rice with tofu',
'yellow rice with tofu']
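To check that the two versions contain the same combinations in a different order, we can compare them directly. This is a small sketch; the names rice_first and protein_first are introduced only for this comparison.

```python
rices = ["white rice", "brown rice", "yellow rice"]
proteins = ["beef", "pork", "chicken", "shrimp", "lamb", "tofu"]

# Outer loop over rices, inner loop over proteins
rice_first = [rice + " with " + protein for rice in rices for protein in proteins]
# Outer loop over proteins, inner loop over rices
protein_first = [rice + " with " + protein for protein in proteins for rice in rices]

print(rice_first == protein_first)                  # False: the order differs
print(sorted(rice_first) == sorted(protein_first))  # True: same combinations
print(len(rice_first))                              # 18
```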
What if the lists we pull from are not independent of one another? What if one is nested in another? Can we switch the two for loops?
# Create a list of all names from nested lists
names = [
['Abby', 'Bella', 'Cecilia'],
['Alex', 'Beatrice', 'Cynthia', 'David']
]
all_names = [name for sub_list in names for name in sub_list]
print(all_names)
['Abby', 'Bella', 'Cecilia', 'Alex', 'Beatrice', 'Cynthia', 'David']
# Switch the two for loops and see what happens
names = [
['Abby', 'Bella','Cecilia'],
['Alex','Beatrice','Cynthia','David']
]
all_names = [name for name in sublist for sublist in names]
print(all_names)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[26], line 8
1 # Switch the two for loops and see what happens
3 names = [
4 ['Abby', 'Bella','Cecilia'],
5 ['Alex','Beatrice','Cynthia','David']
6 ]
----> 8 all_names = [name for name in sublist for sublist in names]
9 print(all_names)
NameError: name 'sublist' is not defined
# Convert the list comprehension back to a for loop to get a clearer view
names = [
['Abby', 'Bella','Cecilia'],
['Alex','Beatrice','Cynthia','David']
]
all_names = []
for name in sublist:
    for sublist in names:
        all_names.append(name)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[33], line 9
3 names = [
4 ['Abby', 'Bella','Cecilia'],
5 ['Alex','Beatrice','Cynthia','David']
6 ]
8 all_names = []
----> 9 for name in sublist:
10 for sublist in names:
11 all_names.append(name)
NameError: name 'sublist' is not defined
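The error arises because the for clauses in a comprehension run from left to right, in the same order as nested for loops. The outer loop, which defines the sub-list, has to come first. The sketch below shows the working order in both forms; flat_loop and flat_comp are names chosen for this illustration.

```python
names = [
    ['Abby', 'Bella', 'Cecilia'],
    ['Alex', 'Beatrice', 'Cynthia', 'David']
]

# Nested for loops: the outer loop comes first, the inner loop second
flat_loop = []
for sub_list in names:
    for name in sub_list:
        flat_loop.append(name)

# Equivalent comprehension: the for clauses appear in the same order
flat_comp = [name for sub_list in names for name in sub_list]

print(flat_loop == flat_comp)  # True
```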
Coding Challenge! < / >
Process the names list using a for loop, and store the names that start with the letter ‘A’ in a new list a_name.
Then do the same thing using a list comprehension.
# Create a list that only contains the names that start with the letter 'A'
# Using a for loop
names = [
['Abby', 'Bella','Cecilia'],
['Alex','Beatrice','Cynthia','David']
]
# Create a list that only contains the names that start with the letter 'A'
# Using a list comprehension
Dictionary Comprehension#
A dictionary comprehension has the same form as a list comprehension, except that it uses curly braces and produces key: value pairs. Since a dictionary comprehension may deal with keys, values, or both, we need to be prepared to use .keys(), .values(), or .items() (for both).
# Create a dictionary of contacts and occupations
contacts = {
'Amanda Bennett': 'Engineer, electrical',
'Bryan Miller': 'Radiation protection practitioner',
'Christopher Garrison': 'Planning and development surveyor',
'Debra Allen': 'Intelligence analyst',
'Donna Decker': 'Architect',
'Heather Bullock': 'Media planner',
'Jason Brown': 'Energy manager',
'Jason Soto': 'Lighting technician, broadcasting/film/video',
'Marissa Munoz': 'Further education lecturer',
'Matthew Mccall': 'Chief Technology Officer',
'Michael Norman': 'Translator',
'Nicole Leblanc': 'Financial controller',
'Noah Delgado': 'Engineer, civil',
'Rachel Charles': 'Physicist, medical',
'Stephanie Petty': 'Architect'}
When we loop over a dictionary, we will only loop over the keys of the dictionary.
# Looping over a dictionary only loops the keys
for contact in contacts:
    print(contact)
Amanda Bennett
Bryan Miller
Christopher Garrison
Debra Allen
Donna Decker
Heather Bullock
Jason Brown
Jason Soto
Marissa Munoz
Matthew Mccall
Michael Norman
Nicole Leblanc
Noah Delgado
Rachel Charles
Stephanie Petty
# Looping over a dictionary by specifying .keys()
for key in contacts.keys():
    print(key)
Amanda Bennett
Bryan Miller
Christopher Garrison
Debra Allen
Donna Decker
Heather Bullock
Jason Brown
Jason Soto
Marissa Munoz
Matthew Mccall
Michael Norman
Nicole Leblanc
Noah Delgado
Rachel Charles
Stephanie Petty
To loop over both the keys and the values, we will need to use dict.items().
# Looping over a dictionary by specifying .items()
for item in contacts.items():
    print(item)
('Amanda Bennett', 'Engineer, electrical')
('Bryan Miller', 'Radiation protection practitioner')
('Christopher Garrison', 'Planning and development surveyor')
('Debra Allen', 'Intelligence analyst')
('Donna Decker', 'Architect')
('Heather Bullock', 'Media planner')
('Jason Brown', 'Energy manager')
('Jason Soto', 'Lighting technician, broadcasting/film/video')
('Marissa Munoz', 'Further education lecturer')
('Matthew Mccall', 'Chief Technology Officer')
('Michael Norman', 'Translator')
('Nicole Leblanc', 'Financial controller')
('Noah Delgado', 'Engineer, civil')
('Rachel Charles', 'Physicist, medical')
('Stephanie Petty', 'Architect')
Note that each key/value pair is returned as a tuple. A tuple is very similar to a Python list; the difference is that a tuple cannot be modified. The technical term in Python is immutable.
A list is mutable (can be changed)
A tuple is immutable (cannot be changed)
We can further distinguish between them by the fact that:
A list uses hard brackets []
A tuple uses parentheses ()
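The difference between mutable and immutable can be seen by trying to change an item in each. This is a short sketch with made-up data; pair_list and pair_tuple are illustrative names.

```python
pair_list = ['John Doe', 'Engineer, mechanical']
pair_tuple = ('John Doe', 'Engineer, mechanical')

# Changing an item in a list works
pair_list[1] = 'Architect'

# Changing an item in a tuple raises a TypeError
try:
    pair_tuple[1] = 'Architect'
except TypeError as error:
    print(f'TypeError: {error}')

print(pair_list)   # ['John Doe', 'Architect']
print(pair_tuple)  # ('John Doe', 'Engineer, mechanical')
```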
We can create a new dictionary from the original dictionary using a for loop to iterate through the key/value pairs. The for loop format is similar to a list except we need to use an index to refer to the key or value of the tuple.
# Create a new dictionary that only contains the engineers
# Using a for loop
engineer_contacts = {}
for contact in contacts.items():
    if 'Engineer' in contact[1]:
        engineer_contacts[contact[0]] = contact[1]
pprint(engineer_contacts)
{'Amanda Bennett': 'Engineer, electrical', 'Noah Delgado': 'Engineer, civil'}
# Use index to access the elements in a tuple
person = ("John Doe", "Engineer, mechanical")
person[1]
'Engineer, mechanical'
# A quick reminder of how to add key/value pairs to a dictionary
grades = {'John': 90, 'Mary': 95}
grades['Sue'] = 98
print(grades)
{'John': 90, 'Mary': 95, 'Sue': 98}
# Use a dictionary comprehension to iterate through the (key, value) tuples of the items in a dictionary
# Add each key:value pair to a new dictionary engineer_contacts
engineer_contacts = {contact[0]:contact[1] for contact in contacts.items() if 'Engineer' in contact[1]}
pprint(engineer_contacts)
{'Amanda Bennett': 'Engineer, electrical', 'Noah Delgado': 'Engineer, civil'}
Instead of using indices with each tuple, we can also give variable names to the keys and values respectively.
# Use key/value variable names for each tuple
# For loop example
engineer_contacts = {}
for (name, occupation) in contacts.items():
    if 'Engineer' in occupation:
        engineer_contacts[name] = occupation
pprint(engineer_contacts)
{'Amanda Bennett': 'Engineer, electrical', 'Noah Delgado': 'Engineer, civil'}
Note that when we assign the keys and values to the two variables name and occupation, we use parentheses to indicate that each pair is a tuple. However, the parentheses are not obligatory. You can remove them and the code will still work.
# Using key/value variable names for each tuple
# Dictionary comprehension example
engineer_contacts = {name : occupation for (name, occupation) in contacts.items() if 'Engineer' in occupation}
pprint(engineer_contacts)
{'Amanda Bennett': 'Engineer, electrical', 'Noah Delgado': 'Engineer, civil'}
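To confirm that the parentheses around the two variable names are optional, the sketch below builds the dictionary both ways and compares the results. It uses a small three-person subset of the contacts, stored under the hypothetical name sample_contacts, so that the cell runs on its own.

```python
# A small subset of the contacts, so that this cell is self-contained
sample_contacts = {
    'Amanda Bennett': 'Engineer, electrical',
    'Donna Decker': 'Architect',
    'Noah Delgado': 'Engineer, civil',
}

# With parentheses around the pair of variable names
with_parens = {name: occupation for (name, occupation) in sample_contacts.items() if 'Engineer' in occupation}

# Without parentheses: the tuple is unpacked in exactly the same way
without_parens = {name: occupation for name, occupation in sample_contacts.items() if 'Engineer' in occupation}

print(with_parens == without_parens)  # True
```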
In the section on list comprehensions, we saw that a list can be created from any kind of iterable. The same is true for dictionary comprehensions: the new dictionary does not have to be based on an existing dictionary.
# Create a dictionary based on a list of word strings
# where keys are the words and values are the lengths of the words
# for loop example
words = ['more', 'is', 'said', 'than', 'done']
word_length = {}
for word in words:
    word_length[word] = len(word)
print(word_length)
{'more': 4, 'is': 2, 'said': 4, 'than': 4, 'done': 4}
# Create a dictionary of word/word length pairs based on a list of word strings
# using a dictionary comprehension
word_length = {word : len(word) for word in words}
print(word_length)
{'more': 4, 'is': 2, 'said': 4, 'than': 4, 'done': 4}
Coding Challenge! < / >
Suppose you are a grocery store owner. Due to inflation, you have to raise prices by 15%. The store_prices dictionary contains the items and their original prices. Use a dictionary comprehension to create a new dictionary with the new prices.
Hint: You can round a number to two decimal places using the round() function. The first argument is the number to be rounded; the second argument is the level of precision. In this case, two decimal places.
round(3.1415926, 2)
3.14
# Create a new dictionary where all prices are 15% higher
store_prices = {
"milk": 3.49,
"egg": 5.29,
"bread": 2.99,
"spinach": 1.99,
"lettuce": 2.35,
"banana": 0.99
}
Set comprehension#
Sets in Python are written with curly braces. Curly braces {} are used for both dictionaries and sets in Python. Which one is created depends on whether each entry is a key: value pair (a dictionary) or a single value (a set). We can use the type() function to discover what kind of object a variable is.
# Demonstrating a set
# One data entry per comma in curly braces
test_set = {1, 2, 3}
type(test_set)
set
# Demonstrating a dictionary
# Two data entries separated by a colon per each comma in curly braces
test_dict = {1 : 'apple', 2 : 'banana', 3 : 'cherry'}
type(test_dict)
dict
To create an empty set, we use the set() function. By default, empty curly braces will create an empty dictionary.
# Demonstrating creation of empty dict vs empty set
test_set = set()
test_dict = {}
print(f'test_set is a {type(test_set)}')
print(f'test_dict is a {type(test_dict)}')
test_set is a <class 'set'>
test_dict is a <class 'dict'>
# Using a for loop with a set
set1 = {5, 6, 7, 8, 9}
set2 = set() ## note how we initialize an empty set
for num in set1:
    if num > 5:
        set2.add(num) # note how we add a new element to a set
print(set2)
{8, 9, 6, 7}
# Using a set comprehension
set2 = {num for num in set1 if num > 5}
print(set2)
{8, 9, 6, 7}
A set is an unordered collection of distinct objects. If you change the order of the elements or list an element more than once, that does not change the set.
# Using a comparison operator on two sets
# Same elements in different order
{1,2} == {2,1}
True
# Using a comparison operator on two sets
# Repeated elements in a set
{1,1,2} == {1,2}
True
# Printing a set with duplicates
# Duplicates are removed automatically
print({1, 1, 2})
{1, 2}
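Because duplicates are dropped, a set comprehension is a convenient way to collect only the distinct results of an expression. The sketch below reuses the word list from the dictionary section under the illustrative name sample_words and compares a list comprehension with a set comprehension.

```python
sample_words = ['more', 'is', 'said', 'than', 'done']

# A list comprehension keeps every result, including repeats
lengths_list = [len(word) for word in sample_words]

# A set comprehension keeps each distinct result only once
lengths_set = {len(word) for word in sample_words}

print(lengths_list)           # [4, 2, 4, 4, 4]
print(lengths_set == {2, 4})  # True
```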
Again, we can use set comprehensions to create a new set based on any kind of iterable that has been defined.
# Create a new set containing only the names from the dictionary of contacts
names = {name for name in contacts}
pprint(names)
{'Amanda Bennett',
'Bryan Miller',
'Christopher Garrison',
'Debra Allen',
'Donna Decker',
'Heather Bullock',
'Jason Brown',
'Jason Soto',
'Marissa Munoz',
'Matthew Mccall',
'Michael Norman',
'Nicole Leblanc',
'Noah Delgado',
'Rachel Charles',
'Stephanie Petty'}
Coding Challenge! < / >
In the game Wordle, players must guess a five-letter word in six guesses or fewer. A player is told if the letters in their guess are found in the word and if they are in the correct spot for the answer.
On their first guess, a player discovers that the 3rd letter is “I” and the 4th letter is “S”. If they have the set of all possible words, they could narrow down their guesses.
Assume the words set contains all possible words. Can you write a set comprehension that will generate a set of all possible solutions?
As an extra challenge, write additional set comprehensions to eliminate answers that contain the letters “P”, “R”, or “M”.
# Write a set comprehension that creates a set of potential answers
words = {'carbon',
'monkey',
'rabbit',
'theory',
'grist',
'farmer',
'pillow',
'exist',
'frisk',
'harbor',
'prism'
}
Lesson Complete#
Congratulations! You have completed Python Intermediate 1.
Start Next Lesson: Python Intermediate 2#
Exercise Solutions#
Here are a few solutions for exercises in this lesson.
# Creating a new list odd_num from odd numbers
# Using a list comprehension
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
odd_num = [number for number in numbers if number % 2 == 1]
print(odd_num)
[1, 3, 5, 7, 9]
# Create a new list that only includes first names
# Using a list comprehension
people = ['Aaron Aston',
'Brianna Barton',
'Carla Cameron',
'Delia Darcy',
'Evelyn Elgin',
'Frederick Federov',
'Gaston Garbo']
friends = [name.split()[0] for name in people]
print(friends)
['Aaron', 'Brianna', 'Carla', 'Delia', 'Evelyn', 'Frederick', 'Gaston']
# Create a list that only contains the names that start with the letter 'A'
# Using a for loop
names = [
['Abby', 'Bella','Cecilia'],
['Alex','Beatrice','Cynthia','David']
]
a_name = []
for sublist in names:
    for name in sublist:
        if name[0] == 'A':
            a_name.append(name)
print(a_name)
['Abby', 'Alex']
# Create a list that only contains the names that start with the letter 'A'
# Using a list comprehension
a_name = [name for sublist in names for name in sublist if name[0] == 'A']
print(a_name)
['Abby', 'Alex']
# Create a new dictionary where all prices are 15% higher
store_prices = {
"milk": 3.49,
"egg": 5.29,
"bread": 2.99,
"spinach": 1.99,
"lettuce": 2.35,
"banana": 0.99
}
new_prices = {item : round(price * 1.15, 2) for (item, price) in store_prices.items()}
pprint(new_prices)
{'banana': 1.14,
'bread': 3.44,
'egg': 6.08,
'lettuce': 2.7,
'milk': 4.01,
'spinach': 2.29}
# Write a set comprehension that creates a set of potential answers
words = {'carbon',
'monkey',
'rabbit',
'theory',
'grist',
'farmer',
'pillow',
'exist',
'frisk',
'harbor',
'prism'
}
answers = {word for word in words if word[2] == 'i' and word[3] == 's'}
print(answers)
{'exist', 'grist', 'frisk', 'prism'}
# Write additional set comprehensions to eliminate answers that contain the letters "P", "R", or "M"
answers = {word for word in answers if 'p' not in word and 'r' not in word and 'm' not in word}
print(answers)
{'exist'}