Encoding vs Encryption vs Hashing: A Security Practitioner's Guide
These three concepts are constantly confused—even by developers who should know better. This guide breaks down what each one does, when to use it, and the security implications of choosing wrong.
In security assessments, we routinely find critical vulnerabilities caused by this confusion: passwords stored with Base64 "encryption," sensitive data "protected" with MD5 hashing, JWTs accepted with the algorithm set to "none." These aren't obscure edge cases; they're common mistakes born from a fundamental misunderstanding.
Let's fix that. Here's the definitive breakdown of encoding, encryption, and hashing—what they are, how they work, and when to use each.
The TL;DR Comparison
| Property | Encoding | Encryption | Hashing |
|---|---|---|---|
| Purpose | Data format transformation | Confidentiality (hiding data) | Integrity verification |
| Reversible? | Yes, by anyone | Yes, with the key | No (one-way) |
| Uses a key? | No | Yes | No (but can be keyed: HMAC) |
| Security goal? | None | Confidentiality | Integrity, password storage |
| Same input = same output? | Yes | Depends (IV/nonce) | Yes |
Encoding: Format Transformation, Not Security
Encoding in One Sentence
Encoding transforms data into a different format for compatibility or transmission—it provides zero security because anyone can decode it without any secret.
Encoding exists to solve format problems, not security problems. When you need to transmit binary data through a text-only channel (like email or URLs), you encode it. When you need to represent special characters safely in HTML, you encode them. When you need to store binary in JSON, you encode it.
Common Encoding Schemes
Base64 — Converts binary to ASCII text using 64 characters (A-Z, a-z, 0-9, +, /). Output is ~33% larger than input. Used for embedding images in HTML, email attachments (MIME), and data URLs.
# Encoding
$ echo -n "Hello, World!" | base64
SGVsbG8sIFdvcmxkIQ==
# Decoding
$ echo "SGVsbG8sIFdvcmxkIQ==" | base64 -d
Hello, World!
# Python
import base64
encoded = base64.b64encode(b"Hello, World!") # b'SGVsbG8sIFdvcmxkIQ=='
decoded = base64.b64decode(encoded) # b'Hello, World!'
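The ~33% overhead follows from Base64 mapping every 3 input bytes to 4 output characters. A minimal sketch using only the standard library; the helper name `b64_len` is ours, introduced for illustration:

```python
import base64
import math
import os

def b64_len(n_bytes: int) -> int:
    """Length of padded Base64 output: 4 characters per 3-byte group (hypothetical helper)."""
    return 4 * math.ceil(n_bytes / 3)

raw = os.urandom(300)                     # 300 arbitrary bytes
encoded = base64.b64encode(raw)           # 400 ASCII characters
overhead = len(encoded) / len(raw) - 1    # fractional size increase, about one third

# The URL-safe alphabet replaces '+' and '/' with '-' and '_'
standard = base64.b64encode(b"\xfb\xff")          # b'+/8='
urlsafe = base64.urlsafe_b64encode(b"\xfb\xff")   # b'-_8='
```

The URL-safe variant matters when the encoded value ends up in a URL or filename, where `+` and `/` have their own meaning.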
URL Encoding (Percent Encoding) — Represents unsafe characters in URLs as %XX hexadecimal. Spaces become %20 or +. Required for query parameters containing special characters.
# Original
Hello World & Goodbye
# URL Encoded
Hello%20World%20%26%20Goodbye
# Python
from urllib.parse import quote, unquote
encoded = quote("Hello World & Goodbye") # 'Hello%20World%20%26%20Goodbye'
decoded = unquote(encoded) # 'Hello World & Goodbye'
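Whether a space becomes `%20` or `+` depends on which standard-library function you call. The sketch below compares `quote` with `quote_plus` (the form-encoding style used in query strings) and shows `urlencode` building a full query string. The parameter names `q` and `page` are made up for the example.

```python
from urllib.parse import quote, quote_plus, unquote_plus, urlencode

value = "Hello World & Goodbye"

path_style = quote(value)        # 'Hello%20World%20%26%20Goodbye'
form_style = quote_plus(value)   # 'Hello+World+%26+Goodbye'

# urlencode escapes each value (using quote_plus by default) and joins the pairs with '&'
query = urlencode({"q": value, "page": 2})   # 'q=Hello+World+%26+Goodbye&page=2'

# unquote_plus reverses the '+' convention; plain unquote would leave the '+' signs in place
round_trip = unquote_plus(form_style)
```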
HTML Encoding — Represents characters that have special meaning in HTML. Prevents the browser from interpreting user input as markup.
# Original (XSS payload)
<script>alert('xss')</script>
# HTML Encoded (safe to display)
&lt;script&gt;alert(&#x27;xss&#x27;)&lt;/script&gt;
# Key entities:
# < becomes &lt;
# > becomes &gt;
# & becomes &amp;
# " becomes &quot;
# ' becomes &#x27;
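In Python, the standard library's `html.escape` performs this substitution. A minimal sketch; it covers HTML body and quoted-attribute contexts only, and other contexts such as inline JavaScript or URLs need their own encoding:

```python
import html

payload = "<script>alert('xss')</script>"

# quote=True is the default, so both single and double quotes are escaped as well
escaped = html.escape(payload)
# '&lt;script&gt;alert(&#x27;xss&#x27;)&lt;/script&gt;'

# Like every encoding, it is trivially reversible
restored = html.unescape(escaped)
```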
Hex Encoding — Represents each byte as two hexadecimal characters. Common in debugging, shellcode, and low-level data representation.
# ASCII "ABC" in hex
41 42 43
# Python
data = b"ABC"
hex_encoded = data.hex() # '414243'
decoded = bytes.fromhex('414243') # b'ABC'
Base64 is not encryption. We regularly find applications that "protect" API keys, passwords, or sensitive configuration with Base64 encoding. This provides zero security—anyone can decode Base64 instantly without any key or secret. If you're Base64 encoding something to "hide" it, you're doing it wrong.
Encryption: Confidentiality Through Keys
Encryption in One Sentence
Encryption transforms readable data (plaintext) into unreadable data (ciphertext) using a key—only someone with the correct key can reverse the process.
Encryption provides confidentiality. It ensures that even if an attacker intercepts the data, they cannot read it without the key. Unlike encoding, encryption is specifically designed to resist unauthorized reversal.
Symmetric Encryption
Symmetric encryption uses the same key for encryption and decryption. It's fast and efficient for large amounts of data.
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
import os
# Generate a random 256-bit key (store this securely!)
key = os.urandom(32) # 32 bytes = 256 bits
# Encrypt
plaintext = b"Sensitive data here"
nonce = os.urandom(12) # 96-bit nonce for GCM
aesgcm = AESGCM(key)
ciphertext = aesgcm.encrypt(nonce, plaintext, None)
# Decrypt
decrypted = aesgcm.decrypt(nonce, ciphertext, None)
# decrypted == b"Sensitive data here"
# The ciphertext is meaningless without the key
# Even with the algorithm known, brute-forcing 256-bit AES is infeasible
Key symmetric algorithms:
- AES-256-GCM — Current gold standard. Authenticated encryption provides confidentiality AND integrity. Use this.
- ChaCha20-Poly1305 — Modern alternative to AES. Faster in software without hardware acceleration. Used by WireGuard and TLS 1.3.
- AES-256-CBC — Older mode. Requires separate MAC for integrity. Vulnerable to padding oracle attacks if implemented incorrectly.
DES, 3DES, RC4, and Blowfish are obsolete. DES has a 56-bit key (breakable in hours). RC4 has known biases. If you see these in production code, they need to be replaced.
Asymmetric Encryption
Asymmetric encryption uses a key pair: a public key for encryption and a private key for decryption. Anyone can encrypt with your public key, but only you can decrypt with your private key.
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes
# Generate key pair
private_key = rsa.generate_private_key(
    public_exponent=65537,
    key_size=4096  # Use 4096 bits for long-term security
)
public_key = private_key.public_key()
# Encrypt with public key (anyone can do this)
plaintext = b"Secret message"
ciphertext = public_key.encrypt(
    plaintext,
    padding.OAEP(
        mgf=padding.MGF1(algorithm=hashes.SHA256()),
        algorithm=hashes.SHA256(),
        label=None
    )
)
# Decrypt with private key (only key owner)
decrypted = private_key.decrypt(
    ciphertext,
    padding.OAEP(
        mgf=padding.MGF1(algorithm=hashes.SHA256()),
        algorithm=hashes.SHA256(),
        label=None
    )
)
Key asymmetric algorithms:
- RSA — The classic. Use 4096-bit keys for new applications. Slower than symmetric but solves the key distribution problem.
- ECDH (Elliptic Curve Diffie-Hellman) — Key exchange protocol. Used to establish shared secrets for symmetric encryption.
- ECIES — Elliptic curve encryption scheme. Combines ECDH key exchange with symmetric encryption.
In practice, asymmetric and symmetric encryption are combined. Asymmetric encryption is slow and limited in message size. The common pattern: generate a random symmetric key, encrypt the data with it (AES), then encrypt the symmetric key with the recipient's public key (RSA). This gives you the best of both worlds.
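The hybrid pattern described above can be sketched with the same `cryptography` package used earlier. This is an illustration under stated assumptions, not a vetted protocol: the function names `hybrid_encrypt` and `hybrid_decrypt` are ours, and the 2048-bit key is chosen only so the demo runs quickly.

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)

def hybrid_encrypt(public_key, plaintext: bytes):
    """Encrypt the data with a fresh AES key, then wrap that key with RSA-OAEP."""
    data_key = AESGCM.generate_key(bit_length=256)   # one-time symmetric key
    nonce = os.urandom(12)                           # 96-bit GCM nonce
    ciphertext = AESGCM(data_key).encrypt(nonce, plaintext, None)
    wrapped_key = public_key.encrypt(data_key, OAEP)
    return wrapped_key, nonce, ciphertext

def hybrid_decrypt(private_key, wrapped_key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    """Unwrap the AES key with the private key, then decrypt the data."""
    data_key = private_key.decrypt(wrapped_key, OAEP)
    return AESGCM(data_key).decrypt(nonce, ciphertext, None)

# Demo-only key size (assumption for speed)
recipient_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

message = os.urandom(10_000)   # far larger than RSA-OAEP could encrypt directly
wrapped_key, nonce, ciphertext = hybrid_encrypt(recipient_key.public_key(), message)
recovered = hybrid_decrypt(recipient_key, wrapped_key, nonce, ciphertext)
```

Only the 32-byte AES key passes through RSA, so the message size is no longer bounded by the RSA modulus.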
Critical Encryption Concepts
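Initialization Vector (IV) / Nonce: A per-message value, unique for every encryption under a given key, that ensures the same plaintext encrypts to different ciphertext each time. Without it, attackers can detect repeated messages. Never reuse a nonce with the same key: in GCM, a single reuse can expose the XOR of the two plaintexts and enable forgeries. The nonce is not secret and is normally stored or sent alongside the ciphertext.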
Authenticated Encryption: Modes like GCM and ChaCha20-Poly1305 verify that ciphertext hasn't been tampered with. Plain CBC mode doesn't—an attacker can flip bits in ciphertext and produce valid (but corrupted) plaintext. Always use authenticated encryption.
Key Management: The algorithm doesn't matter if your keys are poorly managed. Keys should never be hardcoded, committed to git, or stored alongside encrypted data. Use dedicated secrets management (HashiCorp Vault, AWS KMS, etc.).
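The first two concepts above, nonces and authenticated encryption, can be observed directly. A short sketch, assuming the `cryptography` package from the earlier examples; the plaintext and variable names are illustrative:

```python
import os
from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(key)
plaintext = b"transfer $100 to alice"

# A fresh nonce per message: identical plaintexts produce different ciphertexts
nonce_a, nonce_b = os.urandom(12), os.urandom(12)
ct_a = aesgcm.encrypt(nonce_a, plaintext, None)
ct_b = aesgcm.encrypt(nonce_b, plaintext, None)

# Flip a single bit of the ciphertext; GCM rejects it instead of returning garbage
tampered = bytearray(ct_a)
tampered[0] ^= 0x01
try:
    aesgcm.decrypt(nonce_a, bytes(tampered), None)
    tamper_detected = False
except InvalidTag:
    tamper_detected = True
```

With unauthenticated CBC, the same bit flip would decrypt without an integrity error, leaving the application to process corrupted plaintext.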
Hashing: One-Way Fingerprints
Hashing in One Sentence
A hash function takes input of any size and produces a fixed-size output (digest) that cannot be reversed—even tiny input changes produce completely different outputs.
Hashing is fundamentally different from encryption: you cannot recover the original input from a hash. This isn't a limitation—it's the point. Hashes verify integrity and store passwords without keeping the actual password.
Hash Function Properties
- Deterministic: Same input always produces same hash
- One-way: Cannot derive input from hash (preimage resistance)
- Collision resistant: Infeasible to find two inputs with the same hash
- Avalanche effect: Small input change = completely different hash
- Fixed output size: Regardless of input length
import hashlib
# Same input = same hash (deterministic)
hash1 = hashlib.sha256(b"Hello").hexdigest()
hash2 = hashlib.sha256(b"Hello").hexdigest()
# hash1 == hash2 == '185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969'
# Tiny change = completely different hash (avalanche effect)
hash3 = hashlib.sha256(b"hello").hexdigest() # lowercase 'h'
# hash3 == '2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824'
# No way to reverse the hash back to "Hello"
Common Hash Functions
| Algorithm | Output Size | Status | Use Case |
|---|---|---|---|
| MD5 | 128 bits | Broken | Checksums only (non-security) |
| SHA-1 | 160 bits | Broken | Legacy systems only |
| SHA-256 | 256 bits | Secure | General purpose, integrity |
| SHA-384/512 | 384/512 bits | Secure | High-security applications |
| SHA-3 | Variable | Secure | Alternative to SHA-2 |
| BLAKE2 | Variable | Secure | Fast, modern applications |
Collision attacks against MD5 are trivial: you can generate colliding PDFs on a laptop. A SHA-1 collision was demonstrated in 2017 (the SHAttered attack). Never use MD5 or SHA-1 for security purposes. They're acceptable only for non-security checksums (file deduplication, cache keys).
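The output sizes in the table can be checked with Python's `hashlib`. A small sketch, assuming a standard (non-FIPS) Python build in which MD5 and SHA-1 are still available for non-security use:

```python
import hashlib

message = b"Hello"

# Digest size in bits for each algorithm, regardless of input length
digest_bits = {
    name: hashlib.new(name, message).digest_size * 8
    for name in ("md5", "sha1", "sha256", "sha384", "sha512", "sha3_256", "blake2b")
}

# "Variable" output: BLAKE2b accepts a digest_size from 1 to 64 bytes
short_blake2 = hashlib.blake2b(message, digest_size=16).hexdigest()   # 32 hex chars

# SHAKE (the extendable-output functions of SHA-3) lets the caller pick the length
shake = hashlib.shake_128(message).hexdigest(20)                      # 40 hex chars
```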
Password Hashing: A Special Case
General-purpose hash functions (SHA-256, etc.) are too fast for passwords. An attacker with a GPU can compute billions of SHA-256 hashes per second. Password hashing algorithms are intentionally slow:
import bcrypt
# Hash a password (slow by design)
password = b"user_password_123"
salt = bcrypt.gensalt(rounds=12) # 2^12 iterations
hashed = bcrypt.hashpw(password, salt)
# e.g. b'$2b$12$LQv3c1yqBWVHxkd0LHAkCOYz6TtxMQJqhN8/X4.VTtYWBw6Ck1Wqa'
# (the value differs on every run because the salt is random)
# Verify a password
if bcrypt.checkpw(password, hashed):
    print("Password matches")
# The salt is embedded in the hash - no separate storage needed
# Work factor (rounds=12) makes every guess expensive, which slows brute force dramatically
Password hashing algorithms:
- Argon2id — Winner of the Password Hashing Competition (2015). Memory-hard, resistant to GPU/ASIC attacks. The current recommendation.
- bcrypt — Battle-tested since 1999. CPU-hard with configurable work factor. Still excellent choice.
- scrypt — Memory-hard. Designed to resist hardware attacks. Good but Argon2 is preferred.
- PBKDF2 — NIST approved. Uses HMAC with many iterations. Acceptable but weaker than memory-hard alternatives.
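from argon2 import PasswordHasher
from argon2.exceptions import VerifyMismatchError
ph = PasswordHasher(
    time_cost=3,        # Number of iterations
    memory_cost=65536,  # 64 MiB memory usage (value is in KiB)
    parallelism=4       # Parallel threads
)
# Hash (named `hashed` to avoid shadowing the built-in hash())
hashed = ph.hash("user_password_123")
# $argon2id$v=19$m=65536,t=3,p=4$c29tZXNhbHQ$...
# Verify
try:
    ph.verify(hashed, "user_password_123")
    print("Password valid")
except VerifyMismatchError:
    # Catch only the mismatch error; a bare except would hide real bugs
    print("Invalid password")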
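Where only the standard library is available, PBKDF2 from the list above can be sketched with `hashlib.pbkdf2_hmac`. The iteration count and the `$`-separated storage format below are assumptions made for this example, not requirements of PBKDF2; tune the work factor for your own hardware.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor for PBKDF2-HMAC-SHA256

def hash_password(password: str, iterations: int = ITERATIONS) -> str:
    """Derive a key with a fresh random salt and pack everything into one string."""
    salt = os.urandom(16)
    derived = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${derived.hex()}"

def verify_password(password: str, stored: str) -> bool:
    """Recompute with the stored salt and iterations, then compare in constant time."""
    _scheme, iterations, salt_hex, derived_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(candidate, bytes.fromhex(derived_hex))

record = hash_password("user_password_123")
```

Storing the iteration count next to the salt lets you raise the work factor later without invalidating existing records.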
HMAC: Keyed Hashing
HMAC (Hash-based Message Authentication Code) adds a secret key to the hash process. It provides integrity AND authenticity—only someone with the key can generate a valid HMAC.
import hmac
import hashlib
key = b"shared_secret_key"
message = b"Data to authenticate"
# Generate HMAC
mac = hmac.new(key, message, hashlib.sha256).hexdigest()
# 'a4b9c8d7e6f5...' (64 hex chars for SHA-256)
# Verify HMAC (constant-time comparison to prevent timing attacks)
expected_mac = "a4b9c8d7e6f5..."
if hmac.compare_digest(mac, expected_mac):
    print("Message is authentic and unmodified")
HMAC is used for API authentication (AWS Signature), JWT signatures, cookie integrity, and secure webhooks.
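As one concrete case, webhook verification usually means recomputing the HMAC over the raw request body and comparing it with the signature the sender supplied. A minimal sketch; the function names and the payload are hypothetical and not tied to any particular provider's scheme:

```python
import hashlib
import hmac

def sign(secret: bytes, body: bytes) -> str:
    """Hex-encoded HMAC-SHA256 of the raw body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = sign(secret, body)
    # Comparing bytes avoids a TypeError if the received string contains non-ASCII characters
    return hmac.compare_digest(expected.encode(), received_signature.encode())

secret = b"shared_secret_key"
body = b'{"event": "payment.succeeded", "amount": 100}'
signature = sign(secret, body)

ok = verify(secret, body, signature)
forged = verify(secret, body.replace(b"100", b"900"), signature)
```

Sign the exact bytes received on the wire; re-serializing parsed JSON before verification can change the bytes and break the check.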
Common Mistakes and How to Fix Them
Mistake 1: Using Encoding for Security
# Wrong: "hiding" an API key with Base64
import base64
api_key = base64.b64encode(b"sk_live_abc123xyz").decode()
# Config file contains: 'c2tfbGl2ZV9hYmMxMjN4eXo='
# This is NOT protected - anyone can decode it
# Right: use environment variables or secrets management
import os
api_key = os.environ['API_KEY']
# Or encrypt with a key stored separately
from cryptography.fernet import Fernet
key = os.environ['ENCRYPTION_KEY']  # A Fernet key (URL-safe Base64 of 32 bytes), injected from a secrets manager
cipher = Fernet(key)
encrypted_api_key = cipher.encrypt(b"sk_live_abc123xyz")
Mistake 2: Using Fast Hashes for Passwords
# Wrong: SHA-256 is too fast for passwords
password_hash = hashlib.sha256(password.encode()).hexdigest()
# Attacker can try billions of guesses per second
# Right: use a password-specific algorithm
import bcrypt
password_hash = bcrypt.hashpw(password.encode(), bcrypt.gensalt(rounds=12))
# At roughly 250 ms per guess (hardware-dependent), one billion guesses cost about 8 CPU-years
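The difference is easy to quantify. The sketch below runs the arithmetic for one billion guesses. Both rates are assumptions for illustration: a single GPU at a billion fast hashes per second, versus about four bcrypt guesses per second at ~250 ms each.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def exhaust_time_seconds(guesses: float, guesses_per_second: float) -> float:
    """Time to try every candidate at a fixed guess rate."""
    return guesses / guesses_per_second

guesses = 1e9           # size of the attacker's candidate list
fast_hash_rate = 1e9    # assumed: SHA-256 on a GPU
slow_hash_rate = 4.0    # assumed: bcrypt at ~250 ms per guess

fast_seconds = exhaust_time_seconds(guesses, fast_hash_rate)                    # 1 second
slow_years = exhaust_time_seconds(guesses, slow_hash_rate) / SECONDS_PER_YEAR   # just under 8 years
```

An attacker can spread the work across many machines, so the slow figure scales down with their hardware budget. The cost ratio between the two rates is what the work factor buys.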
Mistake 3: Encrypting Without Authentication
# Wrong: AES-CBC without a MAC is vulnerable to bit-flipping attacks
from Crypto.Cipher import AES          # PyCryptodome
from Crypto.Util.Padding import pad
cipher = AES.new(key, AES.MODE_CBC, iv)
ciphertext = cipher.encrypt(pad(plaintext, 16))
# Attacker can modify ciphertext and produce valid (corrupted) plaintext
# Right: use authenticated encryption (AES-GCM)
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
aesgcm = AESGCM(key)
ciphertext = aesgcm.encrypt(nonce, plaintext, associated_data)
# Any tampering is detected during decryption
Mistake 4: Hardcoding Keys
# Wrong: key in source code = key in git history forever
ENCRYPTION_KEY = "super_secret_key_12345"
# Right: key from environment or secrets manager
import os
ENCRYPTION_KEY = os.environ['ENCRYPTION_KEY']  # Raises KeyError if unset, so a missing key fails loudly
# Or from AWS Secrets Manager, HashiCorp Vault, etc.
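A slightly fuller sketch of the environment-variable approach validates the key at startup instead of failing later, in the middle of an encryption call. It assumes the key is a 32-byte value stored Base64-encoded in `ENCRYPTION_KEY`; the function name `load_key` is ours.

```python
import base64
import os

def load_key(var_name: str = "ENCRYPTION_KEY") -> bytes:
    """Read a Base64-encoded 256-bit key from the environment and validate it."""
    encoded = os.environ.get(var_name)
    if not encoded:
        raise RuntimeError(f"{var_name} is not set")
    key = base64.b64decode(encoded, validate=True)
    if len(key) != 32:
        raise RuntimeError(f"{var_name} must decode to 32 bytes, got {len(key)}")
    return key

# Demo only: in a real deployment the platform or secrets manager injects this value
os.environ["ENCRYPTION_KEY"] = base64.b64encode(os.urandom(32)).decode()
key = load_key()
```

Base64 appears here purely as a transport encoding for binary data in a text-only environment variable; the secrecy comes from where the value is stored.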
Decision Flowchart
Use this to choose the right approach:
Need to transmit binary as text? → Encoding (Base64)
Need to hide data from unauthorized parties? → Encryption
Need to verify data hasn't changed? → Hashing (SHA-256)
Need to store passwords? → Password hashing (Argon2, bcrypt)
Need to verify data AND prove who sent it? → HMAC or digital signatures
Need to transmit data securely between parties? → TLS (which uses all of the above)
Conclusion
The confusion between encoding, encryption, and hashing leads to real vulnerabilities. Encoding provides no security. Encryption requires proper key management and authenticated modes. Hashing is one-way and requires special algorithms for passwords.
When in doubt:
- Never use encoding for security
- Use AES-256-GCM or ChaCha20-Poly1305 for encryption
- Use SHA-256 or BLAKE2 for integrity checking
- Use Argon2id or bcrypt for passwords
- Never roll your own crypto
For security assessments that identify cryptographic weaknesses in your applications, contact Brickell Technologies.