How to Compare Strings in Python
Python string comparison: equality, ordering, case-insensitive matching, fuzzy similarity, and difflib for line-by-line diffs. With examples.
- python
- strings
- comparison
- difflib
- text processing
Python provides several ways to compare strings, from simple equality checks to fuzzy matching and line-by-line diffs. Here’s the full picture.
Equality and inequality
The == operator checks if two strings have the same characters in the same order:
a = "hello"
b = "hello"
c = "world"
a == b # True
a == c # False
a != c # True
For strings, == always compares by value: two str objects are equal when they contain the same sequence of code points. The is operator checks identity (whether both names refer to the same object) and should not be used for string comparison:
# Don't use `is` for string comparison
a = "hello"
b = "hel" + "lo"
a == b # True (correct)
a is b # May be True or False depending on string interning — unreliable
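To make the difference concrete, the sketch below builds an equal string at runtime with str.join, so the two names usually refer to different objects. Whether `is` returns True here is an implementation detail of the interpreter, so the example relies only on `==` and on `sys.intern`, which returns one shared object per distinct string value.

```python
import sys

a = "hello"
# Built at runtime, so it is usually a separate object from the literal above
b = "".join(["hel", "lo"])

print(a == b)   # True: same characters
print(a is b)   # implementation-dependent; do not rely on it

# sys.intern returns one shared object per distinct string value
print(sys.intern(a) is sys.intern(b))  # True
```

Interning is an optimization for identity-based lookups, not a replacement for `==` in ordinary comparisons.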
Lexicographic ordering
Python compares strings lexicographically (dictionary order) using Unicode code points:
"apple" < "banana" # True
"apple" < "Apple" # False — lowercase 'a' (97) > uppercase 'A' (65)
"abc" < "abcd" # True — shorter string comes first if prefix matches
"z" > "a" # True
This matters for sorting:
words = ["banana", "apple", "Cherry"]
sorted(words)
# ['Cherry', 'apple', 'banana'] ← uppercase before lowercase
sorted(words, key=str.lower)
# ['apple', 'banana', 'Cherry'] ← case-insensitive sort
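The ordering rule can be spelled out by hand. The function below, `lex_less`, is a hypothetical helper written only for illustration; it is a sketch of the comparison rule described above, not how the interpreter implements it.

```python
def lex_less(a: str, b: str) -> bool:
    """Return True if a sorts before b, mimicking the built-in < for str."""
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            # The first differing position decides, by Unicode code point
            return ord(ch_a) < ord(ch_b)
    # One string is a prefix of the other: the shorter one sorts first
    return len(a) < len(b)

pairs = [("apple", "banana"), ("apple", "Apple"), ("abc", "abcd"), ("abc", "abc")]
for a, b in pairs:
    assert lex_less(a, b) == (a < b)

print(ord("a"), ord("A"))  # 97 65
```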
Case-insensitive comparison
Use .lower() or .casefold():
a = "Hello"
b = "hello"
a.lower() == b.lower() # True
a.casefold() == b.casefold() # True
.casefold() is the more aggressive normalization: it also maps characters that .lower() leaves unchanged, such as the German “ß”, which casefolds to “ss”. Prefer .casefold() for internationalized text.
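A short example where the two methods disagree, using the German word for "street" written with and without “ß”:

```python
a = "Straße"
b = "STRASSE"

print(a.lower() == b.lower())        # False: "straße" vs "strasse"
print(a.casefold() == b.casefold())  # True: both become "strasse"
```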
Comparing substrings
text = "The quick brown fox"
# Contains
"quick" in text # True
"slow" in text # False
# Starts/ends with
text.startswith("The") # True
text.endswith("fox") # True
# Find position
text.find("brown") # 10 (index of first match, -1 if not found)
text.index("brown") # 10 (raises ValueError if not found)
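The difference between find and index matters most when the substring is missing. The sketch below shows both behaviors, plus two small variations (case-insensitive containment and a tuple of suffixes) that use only built-in str methods.

```python
text = "The quick brown fox"

# find signals a miss with -1, which is easy to misuse as an index
pos = text.find("slow")
print(pos)  # -1

# index raises instead, so a miss cannot be silently ignored
try:
    text.index("slow")
except ValueError:
    print("not found")

# Case-insensitive containment: casefold both sides first
print("QUICK".casefold() in text.casefold())  # True

# startswith and endswith accept a tuple of candidates
print(text.endswith(("fox", "dog")))  # True
```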
Normalized comparison
Before comparing strings from different sources, normalize whitespace and Unicode form:
import unicodedata
def normalize(s: str) -> str:
    # Normalize the Unicode form (NFC composes characters, NFD decomposes them)
    s = unicodedata.normalize("NFC", s)
    # Trim the ends and collapse runs of whitespace to a single space
    return " ".join(s.split())
normalize("hello   world") == normalize("hello world") # True
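Unicode normalization matters because the same visible text can be stored as different code point sequences. A minimal illustration with "é" in its composed and decomposed forms:

```python
import unicodedata

composed = "caf\u00e9"      # "é" as one code point (U+00E9)
decomposed = "cafe\u0301"   # "e" followed by a combining acute accent (U+0301)

print(composed == decomposed)          # False: different code points
print(len(composed), len(decomposed))  # 4 5

nfc_a = unicodedata.normalize("NFC", composed)
nfc_b = unicodedata.normalize("NFC", decomposed)
print(nfc_a == nfc_b)                  # True
```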
Fuzzy / similarity comparison with difflib
Python’s standard library includes difflib, whose SequenceMatcher class computes a similarity ratio between two sequences:
from difflib import SequenceMatcher
def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()
similarity("hello world", "hello earth") # 0.636...
similarity("hello", "hello") # 1.0
similarity("hello", "world") # 0.2
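The ratio ranges from 0.0 (nothing in common) to 1.0 (identical). Under the hood, SequenceMatcher uses an algorithm similar to Ratcliff/Obershelp “gestalt pattern matching”: it finds the longest contiguous matching block, then recurses on the pieces to its left and right.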
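The ratio is documented as 2.0 * M / T, where M is the number of matched elements and T is the combined length of both sequences. The sketch below recomputes it from get_matching_blocks() for a small pair of strings, which can help when a score looks surprising.

```python
from difflib import SequenceMatcher

a, b = "abcd", "bcde"
matcher = SequenceMatcher(None, a, b)

# Each block is (start in a, start in b, size); the last one is a size-0 sentinel
blocks = matcher.get_matching_blocks()
matched = sum(block.size for block in blocks)

# Recompute the ratio by hand: 2.0 * M / T
manual_ratio = 2.0 * matched / (len(a) + len(b))
print(matched, manual_ratio, matcher.ratio())  # 3 0.75 0.75
```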
get_close_matches is useful for fuzzy search:
from difflib import get_close_matches
words = ["color", "colour", "colors", "flavour", "flavor"]
get_close_matches("colour", words, n=3, cutoff=0.6)
# ['colour', 'color', 'colors']
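A common use is a "did you mean" suggestion for mistyped input. The command list and the `suggest` helper below are hypothetical names chosen for illustration; only get_close_matches comes from difflib.

```python
from difflib import get_close_matches

COMMANDS = ["status", "start", "stop"]  # hypothetical command list

def suggest(word: str, candidates: list[str]) -> list[str]:
    """Return up to three candidates that resemble word, best match first."""
    return get_close_matches(word, candidates, n=3, cutoff=0.6)

print(suggest("stauts", COMMANDS))  # "status" ranks first
print(suggest("xyz", COMMANDS))     # [] because nothing clears the cutoff
```

Raising cutoff makes suggestions stricter; lowering it returns more, but weaker, candidates.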
Line-by-line diff with difflib
To get a diff between two multiline strings, difflib provides several formatters:
import difflib
text1 = """line one
line two
old line three
line four
""".splitlines(keepends=True)
text2 = """line one
line two
new line three
line five
""".splitlines(keepends=True)
# Unified diff
diff = difflib.unified_diff(text1, text2, fromfile="original", tofile="modified")
print("".join(diff))
Output:
--- original
+++ modified
@@ -1,4 +1,4 @@
 line one
 line two
-old line three
-line four
+new line three
+line five
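When the inputs are plain strings rather than lists of lines, the splitting and joining can be wrapped in a small function. `diff_text` below is a hypothetical helper, sketched under the assumption that both inputs end with a newline; the n parameter of unified_diff controls how many context lines surround each change.

```python
import difflib

def diff_text(old: str, new: str) -> str:
    """Return a unified diff of two multiline strings (hypothetical helper)."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile="original",
            tofile="modified",
            n=1,  # lines of context around each change (the default is 3)
        )
    )

old = "a\nb\nc\nd\ne\n"
new = "a\nb\nC\nd\ne\n"
result = diff_text(old, new)
print(result)

# Identical inputs produce an empty string
print(diff_text(old, old) == "")  # True
```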
For an HTML-formatted diff:
html = difflib.HtmlDiff().make_file(text1, text2, "original", "modified")
with open("diff.html", "w", encoding="utf-8") as f:
    f.write(html)
Regular expressions for pattern matching
For pattern-based comparison, use re:
import re
# Check if string matches a pattern
re.match(r"^\d{4}-\d{2}-\d{2}$", "2026-04-25") # matches (ISO date)
re.match(r"^\d{4}-\d{2}-\d{2}$", "not-a-date") # None
# Case-insensitive match
re.match(r"hello", "Hello World", re.IGNORECASE) # matches
# Find all occurrences
re.findall(r"\bword\b", "word wordsmith sword word", re.IGNORECASE)
# ['word', 'word']
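Two details of re.match are easy to miss: "$" also matches just before a trailing newline, and match only looks at the start of the string. The sketch below contrasts it with re.fullmatch and re.search; the `DATE` constant is a name introduced here for illustration.

```python
import re

DATE = r"\d{4}-\d{2}-\d{2}"

# With match, "$" also matches just before a trailing newline
print(bool(re.match(rf"^{DATE}$", "2026-04-25\n")))  # True
# fullmatch requires the entire string to match, newline included
print(bool(re.fullmatch(DATE, "2026-04-25\n")))      # False
print(bool(re.fullmatch(DATE, "2026-04-25")))        # True

# match only anchors at the start; search scans the whole string
print(re.match(r"world", "Hello World", re.IGNORECASE))           # None
print(re.search(r"world", "Hello World", re.IGNORECASE).group())  # World
```

For validating a whole string against a pattern, re.fullmatch is usually the safer choice.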
Quick summary
| Goal | Approach |
|---|---|
| Exact equality | a == b |
| Case-insensitive | a.casefold() == b.casefold() |
| Lexicographic order | <, >, <=, >= |
| Substring check | "sub" in s |
| Fuzzy similarity score | SequenceMatcher(None, a, b).ratio() |
| Close matches | difflib.get_close_matches() |
| Line-by-line diff | difflib.unified_diff() |
| Pattern match | re.match() / re.fullmatch() |
Online text diff
For comparing longer texts — documents, config files, code blocks — paste them into textdiff.pro for a visual diff with added and removed lines highlighted.