TextDiff

How to Compare Strings in Python

Python string comparison: equality, ordering, case-insensitive matching, fuzzy similarity, and difflib for line-by-line diffs. With examples.

By Editorial Team Updated
  • python
  • strings
  • comparison
  • difflib
  • text processing

Python provides several ways to compare strings, from simple equality checks to fuzzy matching and line-by-line diffs. Here’s the full picture.

Equality and inequality

The == operator checks if two strings have the same characters in the same order:

a = "hello"
b = "hello"
c = "world"

a == b    # True
a == c    # False
a != c    # True

The == operator compares strings by value. The is operator checks object identity (whether two names refer to the same object in memory) and should not be used for string comparison:

# Don't use `is` for string comparison
a = "hello"
b = "hel" + "lo"
a == b    # True (correct)
a is b    # May be True or False depending on string interning — unreliable
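To see why is is the wrong tool, compare a literal with an equal string built at runtime. This is a minimal sketch. Whether two equal strings share one object is a CPython implementation detail, so the only identity guarantee relied on below comes from sys.intern, which returns one shared object for equal strings.

```python
import sys

literal = "hello"
# Build an equal string at runtime so the compiler cannot fold it into the literal
pieces = ["hel", "lo"]
built = "".join(pieces)

# == compares the characters, so this is always True
print(literal == built)

# `is` compares object identity; the result is an implementation detail
print(literal is built)

# sys.intern returns one shared object per distinct value, so identity now tracks equality
print(sys.intern(literal) is sys.intern(built))
```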

Lexicographic ordering

Python compares strings lexicographically (dictionary order) using Unicode code points:

"apple" < "banana"   # True
"apple" < "Apple"    # False — lowercase 'a' (97) > uppercase 'A' (65)
"abc" < "abcd"       # True — shorter string comes first if prefix matches
"z" > "a"            # True
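The rule can be spelled out by hand. The helper below (lex_less, a hypothetical name used only for illustration) walks both strings in step and decides at the first differing character by code point; if one string runs out first, the shorter one is smaller. It is a sketch of the semantics, not how CPython implements <.

```python
def lex_less(a: str, b: str) -> bool:
    """Return True if a sorts before b, mirroring the rule behind a < b for str."""
    for ca, cb in zip(a, b):
        if ca != cb:
            # First mismatch decides: compare Unicode code points
            return ord(ca) < ord(cb)
    # No mismatch in the shared prefix: the shorter string sorts first
    return len(a) < len(b)


pairs = [("apple", "banana"), ("apple", "Apple"), ("abc", "abcd"), ("z", "a"), ("same", "same")]
for x, y in pairs:
    # The hand-written rule agrees with Python's built-in operator
    assert lex_less(x, y) == (x < y)
    print(f"{x!r} < {y!r}: {lex_less(x, y)}")
```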

This matters for sorting:

words = ["banana", "apple", "Cherry"]
sorted(words)
# ['Cherry', 'apple', 'banana']  ← uppercase before lowercase

sorted(words, key=str.lower)
# ['apple', 'banana', 'Cherry']  ← case-insensitive sort
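With key=str.lower, "apple" and "Apple" compare as equal, so their relative order simply follows the input (Python's sort is stable). If you want a fully deterministic order, one option is a tuple key that sorts case-insensitively first and falls back to the exact string to break ties. This is a sketch; .casefold() is used for the reasons covered in the next section.

```python
words = ["apple", "banana", "Apple", "Cherry"]

# Case-insensitive only: ties keep their input order because sorted() is stable
print(sorted(words, key=str.lower))
# ['apple', 'Apple', 'banana', 'Cherry']

# Case-insensitive first, then exact code-point order to break ties
print(sorted(words, key=lambda s: (s.casefold(), s)))
# ['Apple', 'apple', 'banana', 'Cherry']
```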

Case-insensitive comparison

Use .lower() or .casefold():

a = "Hello"
b = "hello"

a.lower() == b.lower()    # True
a.casefold() == b.casefold()  # True

.casefold() is a more aggressive form of lowercasing intended for caseless matching: it also folds characters that .lower() leaves unchanged, such as the German “ß”, which becomes “ss”. Prefer .casefold() for internationalized text.
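The German example makes the difference concrete: .lower() keeps “ß” as it is, while .casefold() turns it into “ss”.

```python
a = "Straße"
b = "STRASSE"

# lower() leaves "ß" unchanged, so the strings still differ
print(a.lower(), b.lower(), a.lower() == b.lower())
# straße strasse False

# casefold() maps "ß" to "ss", so the comparison succeeds
print(a.casefold(), b.casefold(), a.casefold() == b.casefold())
# strasse strasse True
```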

Comparing substrings

text = "The quick brown fox"

# Contains
"quick" in text       # True
"slow" in text        # False

# Starts/ends with
text.startswith("The")   # True
text.endswith("fox")     # True

# Find position
text.find("brown")    # 10 (index of first match, -1 if not found)
text.index("brown")   # 10 (raises ValueError if not found)
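find() and index() only report the first match. To list every position, one approach is to call str.find repeatedly with its optional start argument. find_all is a hypothetical helper name; this version also reports overlapping matches, unlike str.count, which counts non-overlapping ones.

```python
def find_all(text: str, sub: str) -> list[int]:
    """Return the start index of every (possibly overlapping) occurrence of sub."""
    if not sub:
        raise ValueError("sub must be non-empty")
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume one character later so overlapping matches are found too
        start = text.find(sub, start + 1)
    return positions


print(find_all("The quick brown fox", "o"))   # [12, 17]
print(find_all("banana", "ana"))              # [1, 3]  (overlapping)
print("banana".count("ana"))                  # 1       (non-overlapping)
```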

Normalized comparison

Before comparing strings from different sources, normalize their Unicode form and whitespace:

import unicodedata

def normalize(s: str) -> str:
    # NFC composes characters (e.g. "e" + combining accent becomes "é"); NFD decomposes them
    s = unicodedata.normalize("NFC", s)
    # Collapse runs of whitespace (spaces, tabs, newlines) into single spaces and trim the ends
    return " ".join(s.split())

normalize("hello   world") == normalize("hello world")  # True
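Whitespace is only half of it. The same visible text can be stored as different code points: “é” may be one precomposed character (U+00E9) or “e” followed by a combining acute accent (U+0301). A quick check with unicodedata.normalize shows why NFC normalization belongs before ==:

```python
import unicodedata

precomposed = "caf\u00e9"    # "café" with é as one code point (U+00E9)
decomposed = "cafe\u0301"    # "café" as "e" + COMBINING ACUTE ACCENT (U+0301)

# They render the same but differ in code points and length
print(precomposed == decomposed)            # False
print(len(precomposed), len(decomposed))    # 4 5

# After NFC normalization both use the precomposed form
nfc_a = unicodedata.normalize("NFC", precomposed)
nfc_b = unicodedata.normalize("NFC", decomposed)
print(nfc_a == nfc_b)                       # True

# NFD goes the other way and decomposes both
print(len(unicodedata.normalize("NFD", precomposed)))  # 5
```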

Fuzzy / similarity comparison with difflib

Python’s standard library includes difflib, whose SequenceMatcher class scores how similar two sequences (such as strings) are:

from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()

similarity("hello world", "hello earth")  # 0.636...
similarity("hello", "hello")              # 1.0
similarity("hello", "world")              # 0.2

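The ratio ranges from 0.0 (no match) to 1.0 (identical). Under the hood, SequenceMatcher uses an algorithm in the spirit of Ratcliff and Obershelp’s “gestalt pattern matching”: it finds the longest contiguous matching block, then recursively does the same on the pieces to either side of it.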
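Per the difflib documentation, ratio() equals 2.0*M/T, where M is the number of matched characters and T is the combined length of both strings. The sketch below recomputes it from get_matching_blocks(), which returns the runs the algorithm paired up, ending with a size-0 sentinel.

```python
import math
from difflib import SequenceMatcher

a, b = "hello world", "hello earth"
matcher = SequenceMatcher(None, a, b)

# Each block is a Match(a, b, size) triple; the last one is a size-0 sentinel
blocks = matcher.get_matching_blocks()
for block in blocks:
    print(block, repr(a[block.a:block.a + block.size]))

# M = matched characters, T = total characters in both strings
matched = sum(block.size for block in blocks)
total = len(a) + len(b)
print(matched, total)                   # 7 22
print(2.0 * matched / total)            # about 0.636
assert math.isclose(2.0 * matched / total, matcher.ratio())
```

Here the matched characters are "hello " plus the shared "r", which is why the score is well below 1.0 despite the common prefix.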

get_close_matches is useful for fuzzy search:

from difflib import get_close_matches

words = ["color", "colour", "colors", "flavour", "flavor"]
get_close_matches("colour", words, n=3, cutoff=0.6)
# ['colour', 'color', 'colors']
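get_close_matches returns only the words. To see why they rank the way they do, a small helper (close_matches_with_scores, a hypothetical name) can score each candidate with SequenceMatcher.ratio() and keep the top n at or above cutoff. It approximates the library function rather than reproducing it exactly (the real one also runs cheaper upper-bound checks before computing ratio()).

```python
import heapq
from difflib import SequenceMatcher, get_close_matches


def close_matches_with_scores(word, possibilities, n=3, cutoff=0.6):
    """Hypothetical helper: like get_close_matches, but keeps each score."""
    scored = []
    for candidate in possibilities:
        score = SequenceMatcher(None, candidate, word).ratio()
        if score >= cutoff:
            scored.append((score, candidate))
    # Highest scores first; equal scores fall back to comparing the strings
    return heapq.nlargest(n, scored)


words = ["color", "colour", "colors", "flavour", "flavor"]
for score, candidate in close_matches_with_scores("colour", words):
    print(f"{candidate}: {score:.3f}")
# colour: 1.000
# color: 0.909
# colors: 0.833

# For this input the words and their order match the library function
assert [c for _, c in close_matches_with_scores("colour", words)] == get_close_matches(
    "colour", words, n=3, cutoff=0.6
)
```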

Line-by-line diff with difflib

To get a diff between two multiline strings, difflib provides several formatters:

import difflib

text1 = """line one
line two
old line three
line four
""".splitlines(keepends=True)

text2 = """line one
line two
new line three
line five
""".splitlines(keepends=True)

# Unified diff
diff = difflib.unified_diff(text1, text2, fromfile="original", tofile="modified")
print("".join(diff))

Output:

--- original
+++ modified
@@ -1,4 +1,4 @@
 line one
 line two
-old line three
-line four
+new line three
+line five
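unified_diff is one of several formatters. difflib.ndiff produces a more human-oriented diff: every line gets a two-character prefix ("  " unchanged, "- " removed, "+ " added), and for pairs of similar lines it may add "? " guide lines marking the changed characters. A sketch with the same two texts:

```python
import difflib

text1 = "line one\nline two\nold line three\nline four\n".splitlines(keepends=True)
text2 = "line one\nline two\nnew line three\nline five\n".splitlines(keepends=True)

# ndiff yields every line, prefixed with "  ", "- ", "+ " or "? "
delta = list(difflib.ndiff(text1, text2))
print("".join(delta), end="")

# restore() rebuilds either input from the delta (1 = first text, 2 = second)
assert list(difflib.restore(delta, 1)) == text1
assert list(difflib.restore(delta, 2)) == text2
```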

For an HTML-formatted diff:

html = difflib.HtmlDiff().make_file(text1, text2, "original", "modified")
with open("diff.html", "w", encoding="utf-8") as f:  # make_file declares charset=utf-8 by default
    f.write(html)
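To embed the diff in an existing page instead of writing a standalone file, HtmlDiff also provides make_table, which takes the same line and description arguments and returns only the table markup. The table uses CSS classes that make_file normally supplies in its page header, so this sketch assumes the surrounding page provides its own styling.

```python
import difflib

text1 = ["line one\n", "line two\n", "old line three\n", "line four\n"]
text2 = ["line one\n", "line two\n", "new line three\n", "line five\n"]

# make_table returns just the <table> markup, not a full HTML page.
# context=True shows only changed lines plus numlines lines of surrounding context.
table = difflib.HtmlDiff().make_table(
    text1, text2, "original", "modified", context=True, numlines=1
)
print(table)
```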

Regular expressions for pattern matching

For pattern-based comparison, use re:

import re

# Check if string matches a pattern
re.match(r"^\d{4}-\d{2}-\d{2}$", "2026-04-25")  # matches (ISO date shape; doesn't check the date is valid)
re.match(r"^\d{4}-\d{2}-\d{2}$", "not-a-date")   # None

# Case-insensitive match
re.match(r"hello", "Hello World", re.IGNORECASE)  # matches

# Find all occurrences
re.findall(r"\bword\b", "word wordsmith sword word", re.IGNORECASE)
# ['word', 'word']
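One subtlety: $ matches at the end of the string or just before a trailing newline, so a ^...$ pattern used with re.match also accepts "2026-04-25\n". re.fullmatch requires the entire string to match and needs no anchors; compiling the pattern once with re.compile is convenient when it is reused. A short check:

```python
import re

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

# re.match with ^...$ accepts a trailing newline, because $ also matches just before it
print(re.match(r"^\d{4}-\d{2}-\d{2}$", "2026-04-25\n") is not None)   # True

# fullmatch requires the whole string to match, so the newline makes it fail
print(ISO_DATE.fullmatch("2026-04-25\n") is not None)                  # False
print(ISO_DATE.fullmatch("2026-04-25") is not None)                    # True
```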

Quick summary

Goal                    Approach
Exact equality          a == b
Case-insensitive        a.casefold() == b.casefold()
Lexicographic order     <, >, <=, >=
Substring check         "sub" in s
Fuzzy similarity score  SequenceMatcher(None, a, b).ratio()
Close matches           difflib.get_close_matches()
Line-by-line diff       difflib.unified_diff()
Pattern match           re.match() / re.fullmatch()

Online text diff

For comparing longer texts — documents, config files, code blocks — paste them into textdiff.pro for a visual diff with added and removed lines highlighted.