Python Foundation

Data Types in Python

List

Ordered
Mutable
Allows duplicates
Stores references (pointers) to objects in memory, not the objects themselves
Heterogeneous (can store a string, an integer, and another list together)
.append(x)
.extend(iterable)
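.insert(i, x) inserts x at position i; i = 0 inserts at the beginning
.remove(x) removes the first occurrence of x; raises ValueError if x is not present
.pop(i) removes and returns the item at position i
.pop() removes and returns the last item
Using a list as a stack (efficient): append() and pop() work at the end in O(1)
Using a list as a queue (highly inefficient): pop(0) and insert(0, x) shift every remaining item, O(n)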

list_from_string = list("hello")
# ['h', 'e', 'l', 'l', 'o']
 
stack = []
stack.append(4)
last_item = stack.pop()
 
from collections import deque

queue = deque()  # double-ended queue: O(1) appends and pops at both ends
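A minimal sketch of the list methods listed above, followed by the deque alternative for queue-style use. The values and variable names are made up for illustration.

```python
from collections import deque

items = [1, 2]
items.append(3)        # add one item at the end
items.extend([4, 5])   # add every item of an iterable at the end
items.insert(0, 0)     # insert at the beginning (shifts the rest, O(n))
items.remove(3)        # remove the first occurrence of the value 3
last = items.pop()     # remove and return the last item, O(1)
first = items.pop(0)   # remove and return the first item, O(n)

# Queue-style use: deque pops from the left in O(1)
queue = deque(["a", "b"])
queue.append("c")
head = queue.popleft()
```

After these calls, items holds [1, 2, 4], last is 5, first is 0, and head is "a".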

Dictionary

Organizer | Collection of key-value pairs
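Used for database records, configurations, JSON from APIs
Insertion ordered (guaranteed since Python 3.7)
No duplicate keys (assigning to an existing key overwrites the old value)
Keys must be hashable, which in practice means immutable types: strings, numbers, booleans, and tuples whose items are themselves hashable. Lists, sets, and dicts cannot be keys.
Hash table (hash map): O(1) lookup time on average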

empty_dict = {}
 
user = {
	"username": "alex",
	"id": 103,
	"is_active": True
}
 
# dict() Constructor
user = dict(username="alex", id=103, is_active=True)
 
# zip() trick
keys = ["fruit", "vegetable", "grain"] 
values = ["apple", "broccoli", "rice"]
 
food_map = dict(zip(keys, values))
# zip() returns a lazy zip object (an iterator of (key, value) pairs); dict() consumes it
 
#################################
 
# Accessing
d[key] 
# raises KeyError if key does not exist

d.get(key, default)  # safer way to access a potentially missing key

# EXAMPLE
config = {"retries": 3}
# config["timeout"]  # would raise KeyError: 'timeout'

config.get("timeout", 30)  # returns the default (30); config itself is not modified
 
#################################
 
.keys()    # dict_keys view object
.values()  # dict_values view object
.items()   # dict_items view of (key, value) tuples
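A short sketch of the key rules and view objects described above. The dictionaries and names here (inventory, grid) are hypothetical examples.

```python
inventory = {"apple": 3}
inventory["apple"] = 5  # same key: the old value is overwritten

# A tuple of hashable items is a valid key
grid = {(0, 0): "start", (1, 2): "goal"}

# A list is mutable, hence unhashable, so it cannot be a key
try:
    bad = {[0, 0]: "start"}
    unhashable_error = None
except TypeError as exc:
    unhashable_error = str(exc)

# Views are live: they reflect later changes to the dictionary
keys_view = inventory.keys()
inventory["pear"] = 2
live_keys = list(keys_view)

# .items() yields (key, value) tuples, convenient for unpacking in a loop
pairs = [(name, count) for name, count in inventory.items()]
```

Because views are live, live_keys contains both "apple" and "pear" even though keys_view was created before "pear" was added.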

Set

Unordered, unique elements; elements must be hashable
Deduplication: converting a list to a set and back to a list is the simplest way to remove duplicates, and it runs in O(n) on average. It does not preserve the original order. This is a fundamental pattern in data cleaning.
Membership testing is faster in a set than in a list: O(1) on average vs O(n)

empty_set = set()
 
vowels = {'a', 'e', 'i', 'o', 'u'}
 
user_ids = [101, 102, 101, 103]  # placeholder values for illustration
unique_user_ids = list(set(user_ids)) # order not guaranteed
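Since the set round trip loses the original order, here is a sketch of two common alternatives, plus a membership test. The IDs are invented sample values; dict.fromkeys relies on dictionaries keeping insertion order.

```python
raw_ids = [103, 101, 103, 102, 101]

# Deduplicate and sort: fine when sorted order is what you want
unique_sorted = sorted(set(raw_ids))

# Deduplicate while keeping first-seen order
unique_in_order = list(dict.fromkeys(raw_ids))

# Membership test against a set is O(1) on average
known_ids = set(raw_ids)
is_known = 102 in known_ids
is_missing = 999 not in known_ids
```

Here unique_sorted is [101, 102, 103] and unique_in_order is [103, 101, 102].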

Data Manipulation Operations

A list of dictionaries is a very common data structure, often resulting from reading a CSV file or parsing a JSON API response. Filtering this structure is a daily task for data analysts and engineers. Python offers several ways to accomplish this, with list comprehensions being the most idiomatic.

List Comprehension (Preferred Method)

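# A list of dictionaries (placeholder records for illustration)
employees = [
    {'name': 'A', 'role': 'Engineer', 'salary': 120000},
    {'name': 'B', 'role': 'Analyst', 'salary': 80000},
]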
 
engineers = [emp for emp in employees if emp['role'] == "Engineer"]
 
high_earning_engineers = [emp for emp in employees
	if emp['role'] == "Engineer" and emp['salary'] > 100000
]
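For comparison, a sketch of two of the other ways to filter the same structure: the built-in filter() and an explicit loop. The sample_employees records are hypothetical.

```python
sample_employees = [
    {"name": "Ana", "role": "Engineer", "salary": 120000},
    {"name": "Ben", "role": "Designer", "salary": 90000},
    {"name": "Cy", "role": "Engineer", "salary": 95000},
]

# filter() returns a lazy iterator, so wrap it in list() to materialize it
engineers_via_filter = list(
    filter(lambda emp: emp["role"] == "Engineer", sample_employees)
)

# Explicit loop: same result, more lines
engineers_via_loop = []
for emp in sample_employees:
    if emp["role"] == "Engineer":
        engineers_via_loop.append(emp)

# A comprehension can filter and transform in one expression
engineer_names = [
    emp["name"] for emp in sample_employees if emp["role"] == "Engineer"
]
```

All three approaches select the same records; the comprehension is usually the shortest to read, especially when a transformation is needed as well.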

Filtering a Dictionary

grades = {'John': 85, 'Mary': 92, 'Matt': 78, 'Michael': 95, 'Laura': 88}
 
# get students with >= 90 score
top_performers = {name: score for name, score in grades.items() if score >= 90}
# pattern: new_dict = {key: value for key, value in old_dict.items() if <condition>}
 
m_students = {name: score for name, score in grades.items() if name.startswith('M')}

Sorting

list.sort(): sorts in place; returns None
sorted(iterable): accepts any iterable and returns a new sorted list
Sorting a list of dictionaries

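# A list of dictionaries (placeholder records for illustration)
employees = [
    {'name': 'A', 'role': 'Engineer', 'salary': 120000},
    {'name': 'B', 'role': 'Analyst', 'salary': 80000},
]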
 
sorted_by_salary_asc = sorted(employees, key=lambda emp: emp['salary'])
 
sorted_by_salary_desc = sorted(employees, key=lambda emp: emp['salary'], reverse=True)
 
# Complex Sort, Tie-breaking
sorted_complex = sorted(employees, key=lambda emp: (emp['role'], -emp['salary']))
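The negation trick in the tie-breaking sort only works for numeric fields. A sketch of the usual workaround when a string field must be descending: sort in two passes and rely on Python's sort being stable. The sample_employees records are hypothetical; operator.itemgetter is a standard-library alternative to the lambda.

```python
from operator import itemgetter

sample_employees = [
    {"name": "Ana", "role": "Engineer", "salary": 120000},
    {"name": "Ben", "role": "Designer", "salary": 90000},
    {"name": "Cy", "role": "Engineer", "salary": 95000},
    {"name": "Di", "role": "Designer", "salary": 99000},
]

# list.sort() works in place and returns None
salaries = [emp["salary"] for emp in sample_employees]
result = salaries.sort()

# Goal: role descending, then salary ascending.
# Sort by the secondary key first, then by the primary key;
# the stable sort keeps the salary order within each role.
by_salary = sorted(sample_employees, key=itemgetter("salary"))
by_role_desc = sorted(by_salary, key=itemgetter("role"), reverse=True)
names = [emp["name"] for emp in by_role_desc]
```

With this data, names comes out as ["Cy", "Ana", "Ben", "Di"]: engineers first, each role ordered by ascending salary.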

Aggregating, Grouping (collections.defaultdict)

Grouping is a cornerstone of data aggregation and analysis. It is the process of taking a flat list of items and restructuring it into a nested data structure—typically a dictionary of lists—where items are categorized based on a common property or key.

transactions = [
    {'id': 't1', 'category': 'books', 'amount': 25},
    {'id': 't2', 'category': 'electronics', 'amount': 120},
    {'id': 't3', 'category': 'books', 'amount': 15},
    {'id': 't4', 'category': 'clothing', 'amount': 50},
    {'id': 't5', 'category': 'electronics', 'amount': 85},
]
 
 
# input: a list of dictionaries
# returns: a dictionary with key=category and value = list of transactions in the specific category
 
 
from collections import defaultdict
 
grouped_transactions = defaultdict(list)
for transaction in transactions:
	cat = transaction['category']
	grouped_transactions[cat].append(transaction)
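Grouping is often followed by an aggregation step. A sketch, reusing the same transaction records, of summing amounts per category with defaultdict(int), and of grouping with plain dict.setdefault() when no import is wanted. The names totals and ids_by_category are made up here.

```python
from collections import defaultdict

transactions = [
    {"id": "t1", "category": "books", "amount": 25},
    {"id": "t2", "category": "electronics", "amount": 120},
    {"id": "t3", "category": "books", "amount": 15},
    {"id": "t4", "category": "clothing", "amount": 50},
    {"id": "t5", "category": "electronics", "amount": 85},
]

# Aggregate: total amount per category (missing keys start at 0)
totals = defaultdict(int)
for transaction in transactions:
    totals[transaction["category"]] += transaction["amount"]

# Group without defaultdict: setdefault creates the list on first sight
ids_by_category = {}
for transaction in transactions:
    ids_by_category.setdefault(transaction["category"], []).append(transaction["id"])
```

The totals are 40 for books, 205 for electronics, and 50 for clothing.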

Reminder: sorted(iterable, key=None, reverse=False); key and reverse must be passed as keyword arguments.

Taming Nested Data: From APIs and JSON to Python Objects

Deserialization: Converting JSON to Python Object
Serialization: Converting a Python Object to JSON

json Module

json.loads(json_string): JSON-formatted string to Python object; deserialization
json.load(file_object): Reads from a file-like object (e.g., a file opened in read mode) containing JSON data and returns the corresponding Python object.
json.dumps(python_object, indent=None): Python object to JSON-formatted string; serialization
json.dump(python_object, file_object): Takes a Python object and writes it to a file-like object in JSON format.
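A sketch of the string-based pair, json.loads and json.dumps, as a round trip. The JSON text is an invented example; note how JSON true and null map to Python True and None.

```python
import json

raw = '{"user": "alex", "tags": ["a", "b"], "active": true, "score": null}'

# Deserialize: JSON text -> Python objects (dict, list, str, bool, None, ...)
parsed = json.loads(raw)

# Serialize: Python objects -> JSON text, pretty-printed with sorted keys
text = json.dumps(parsed, indent=2, sort_keys=True)

# Parsing the serialized text gives back an equal object
round_tripped = json.loads(text)
```

indent and sort_keys only affect the formatting of the output string, not the data.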

import json 
 
# load from file
with open("./data.json", mode='r') as file:
    read_as_dict = json.load(file)
 
# write to file (here the object loaded above; any JSON-serializable object works)
with open("./data.json", mode='w') as file:
    json.dump(read_as_dict, file, indent=2)

API Response

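import requests

url = "something"  # placeholder: replace with a real endpoint
try:
    response = requests.get(url, timeout=10)  # timeout value is an arbitrary choice
    response.raise_for_status()  # raises an HTTPError for 4xx/5xx responses
    data = response.json()  # parsed JSON: a dict or a list, depending on the payload

except requests.exceptions.RequestException as e:
    print(e)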

Miscellaneous

 
from collections import Counter
 
sentence = "the quick brown fox jumps over the lazy dog"
words = sentence.split()
word_counts = Counter(words)
most_common = word_counts.most_common(3)
 
 
from collections import deque  # O(1) appends and pops at either end
task_queue = deque()
 
task_queue.append("Task 1")
task_queue.append("Task 2")
 
next_task = task_queue.popleft() # 'Task 1'

Think in Patterns: Recognize that most data manipulation tasks are variations of a few fundamental patterns: filtering, sorting, grouping, and transforming. By identifying the pattern, one can apply the appropriate and most Pythonic tool for the job.
Choose the Right Tool for the Job: Do not default to a list. Before writing a line of code, consider the access patterns the data requires. Does it need positional access? Fast key-based lookups? Uniqueness and set logic? A conscious choice between a list, dict, and set is the first step toward writing efficient and clean code.
Embrace Comprehensions: Make list, dictionary, and set comprehensions the default tool for creating new collections from existing iterables. They are more than just syntactic sugar; they are a core part of the Pythonic idiom, leading to more concise, readable, and often more performant code.
Master Nested Navigation: In an API-driven world, data is rarely flat. Practice safe and efficient navigation of nested dictionaries and lists. Make robust patterns like the .get() method and try-except blocks second nature to handle the inevitable inconsistencies of real-world data.
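A sketch of the nested-navigation advice, using a hypothetical API payload. Chained .get() calls handle missing keys, while try-except covers list indexing, which .get() cannot do.

```python
payload = {
    "user": {
        "profile": {"name": "alex"},
        "orders": [{"id": 1, "total": 25}],
    }
}

# Chained .get(): each step falls back to an empty dict if the key is missing
name = payload.get("user", {}).get("profile", {}).get("name", "unknown")
city = payload.get("user", {}).get("address", {}).get("city", "unknown")

# try-except: covers missing keys, out-of-range indexes, and wrong types
try:
    second_total = payload["user"]["orders"][1]["total"]
except (KeyError, IndexError, TypeError):
    second_total = None
```

One caveat: chained .get() still fails with AttributeError if an intermediate value is present but None, so try-except is the safer choice when the payload may contain explicit nulls.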
