Encoding vs Encryption vs Hashing: A Security Practitioner's Guide

These three concepts are constantly confused—even by developers who should know better. This guide breaks down what each one does, when to use it, and the security implications of choosing wrong.

In security assessments, we routinely find critical vulnerabilities caused by confusing these three concepts. Passwords stored with Base64 "encryption." Sensitive data "protected" with MD5 hashing. JWT tokens with the algorithm set to "none." These aren't obscure edge cases—they're common mistakes born from fundamental misunderstanding.

Let's fix that. Here's the definitive breakdown of encoding, encryption, and hashing—what they are, how they work, and when to use each.

The TL;DR Comparison

Property                  | Encoding                   | Encryption                    | Hashing
--------------------------|----------------------------|-------------------------------|----------------------------
Purpose                   | Data format transformation | Confidentiality (hiding data) | Integrity verification
Reversible?               | Yes, by anyone             | Yes, with the key             | No (one-way)
Uses a key?               | No                         | Yes                           | No (but can be keyed: HMAC)
Security goal?            | None                       | Confidentiality               | Integrity, password storage
Same input = same output? | Yes                        | Depends (IV/nonce)            | Yes

Encoding: Format Transformation, Not Security

Encoding in One Sentence

Encoding transforms data into a different format for compatibility or transmission—it provides zero security because anyone can decode it without any secret.

Encoding exists to solve format problems, not security problems. When you need to transmit binary data through a text-only channel (like email or URLs), you encode it. When you need to represent special characters safely in HTML, you encode them. When you need to store binary in JSON, you encode it.

Common Encoding Schemes

Base64 — Converts binary to ASCII text using 64 characters (A-Z, a-z, 0-9, +, /). Output is ~33% larger than input. Used for embedding images in HTML, email attachments (MIME), and data URLs.

Base64 Example
    # Encoding (shell)
    $ echo -n "Hello, World!" | base64
    SGVsbG8sIFdvcmxkIQ==

    # Decoding (shell)
    $ echo "SGVsbG8sIFdvcmxkIQ==" | base64 -d
    Hello, World!

    # Python
    import base64
    encoded = base64.b64encode(b"Hello, World!")  # b'SGVsbG8sIFdvcmxkIQ=='
    decoded = base64.b64decode(encoded)           # b'Hello, World!'
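The roughly 33% size overhead follows from Base64 mapping every 3 input bytes to 4 output characters. The sketch below checks this with the standard library only; the 300-byte payload is an arbitrary size chosen for illustration.

```python
import base64
import os

payload = os.urandom(300)            # arbitrary binary data (300 bytes, illustrative)
encoded = base64.b64encode(payload)  # 4 output characters for every 3 input bytes

overhead = len(encoded) / len(payload) - 1
print(len(payload), len(encoded), f"{overhead:.0%}")

# Decoding needs no key or secret: anyone holding the text recovers the bytes
assert base64.b64decode(encoded) == payload
```

The final assertion is the security-relevant part: the round trip works for anyone, which is why Base64 offers no confidentiality.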

URL Encoding (Percent Encoding) — Represents unsafe characters in URLs as %XX hexadecimal. Spaces become %20 or +. Required for query parameters containing special characters.

URL Encoding Example
    # Original
    Hello World & Goodbye

    # URL Encoded
    Hello%20World%20%26%20Goodbye

    # Python
    from urllib.parse import quote, unquote
    encoded = quote("Hello World & Goodbye")  # 'Hello%20World%20%26%20Goodbye'
    decoded = unquote(encoded)                # 'Hello World & Goodbye'
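The "%20 or +" distinction depends on context: %20 is valid anywhere in a URL, while + stands for a space only in form-encoded query strings. A short sketch of the two standard-library encoders, reusing the same sample string:

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

text = "Hello World & Goodbye"

path_style = quote(text)       # spaces become %20
form_style = quote_plus(text)  # spaces become + (form / query-string style)

print(path_style)
print(form_style)

# Each decoder reverses its own encoder
assert unquote(path_style) == text
assert unquote_plus(form_style) == text

# unquote() leaves "+" untouched, so mixing the two styles changes the value
mixed = unquote(form_style)
print(mixed)
```

Decoding with the function that matches the encoder avoids the subtle corruption shown in the last step.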

HTML Encoding — Represents characters that have special meaning in HTML. Prevents the browser from interpreting user input as markup.

HTML Encoding Example
    # Original (XSS payload)
    <script>alert('xss')</script>

    # HTML Encoded (safe to display)
    &lt;script&gt;alert(&#x27;xss&#x27;)&lt;/script&gt;

    # Key entities:
    # < becomes &lt;
    # > becomes &gt;
    # & becomes &amp;
    # " becomes &quot;
    # ' becomes &#x27;
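In Python, the entity mapping listed above is available through the standard library's html.escape. A minimal sketch; the surrounding template fragment is hypothetical and only shows where the escaped value would be placed.

```python
import html

user_input = "<script>alert('xss')</script>"

# quote=True is the default, so both " and ' are escaped as well
safe = html.escape(user_input)
print(safe)

# Hypothetical template fragment: the escaped text is displayed, not executed
page = f"<p>Comment: {safe}</p>"

# Like every encoding, this is reversible by anyone
assert html.unescape(safe) == user_input
```

Escaping of this kind is context-specific: it suits HTML text and quoted attribute values, while JavaScript, CSS, and URL contexts need their own encoders.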

Hex Encoding — Represents each byte as two hexadecimal characters. Common in debugging, shellcode, and low-level data representation.

Hex Encoding Example
    # ASCII "ABC" in hex
    41 42 43

    # Python
    data = b"ABC"
    hex_encoded = data.hex()           # '414243'
    decoded = bytes.fromhex('414243')  # b'ABC'

Security Anti-Pattern

Base64 is not encryption. We regularly find applications that "protect" API keys, passwords, or sensitive configuration with Base64 encoding. This provides zero security—anyone can decode Base64 instantly without any key or secret. If you're Base64 encoding something to "hide" it, you're doing it wrong.

Encryption: Confidentiality Through Keys

Encryption in One Sentence

Encryption transforms readable data (plaintext) into unreadable data (ciphertext) using a key—only someone with the correct key can reverse the process.

Encryption provides confidentiality. It ensures that even if an attacker intercepts the data, they cannot read it without the key. Unlike encoding, encryption is specifically designed to resist unauthorized reversal.

Symmetric Encryption

Symmetric encryption uses the same key for encryption and decryption. It's fast and efficient for large amounts of data.

AES-256-GCM Encryption (Python)
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM
    import os

    # Generate a random 256-bit key (store this securely!)
    key = os.urandom(32)  # 32 bytes = 256 bits

    # Encrypt
    plaintext = b"Sensitive data here"
    nonce = os.urandom(12)  # 96-bit nonce for GCM
    aesgcm = AESGCM(key)
    ciphertext = aesgcm.encrypt(nonce, plaintext, None)

    # Decrypt
    decrypted = aesgcm.decrypt(nonce, ciphertext, None)
    # decrypted == b"Sensitive data here"

    # The ciphertext is meaningless without the key
    # Even with the algorithm known, brute-forcing 256-bit AES is infeasible

Key symmetric algorithms:

  • AES-256-GCM: Current gold standard. Authenticated encryption provides confidentiality AND integrity. Use this.
  • ChaCha20-Poly1305: Modern alternative to AES. Faster in software without hardware acceleration. Used by WireGuard and TLS 1.3.
  • AES-256-CBC: Older mode. Requires separate MAC for integrity. Vulnerable to padding oracle attacks if implemented incorrectly.

Avoid Deprecated Algorithms

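DES, 3DES, RC4, and Blowfish are obsolete. DES has a 56-bit key (breakable in hours). 3DES and Blowfish use 64-bit blocks, which exposes them to birthday attacks such as Sweet32 once enough data is encrypted under one key. RC4 has known keystream biases. If you see any of these in production code, replace them.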

Asymmetric Encryption

Asymmetric encryption uses a key pair: a public key for encryption and a private key for decryption. Anyone can encrypt with your public key, but only you can decrypt with your private key.

RSA Encryption (Python)
    from cryptography.hazmat.primitives.asymmetric import rsa, padding
    from cryptography.hazmat.primitives import hashes

    # Generate key pair
    private_key = rsa.generate_private_key(
        public_exponent=65537,
        key_size=4096  # Use 4096 bits for long-term security
    )
    public_key = private_key.public_key()

    # Encrypt with public key (anyone can do this)
    plaintext = b"Secret message"
    ciphertext = public_key.encrypt(
        plaintext,
        padding.OAEP(
            mgf=padding.MGF1(algorithm=hashes.SHA256()),
            algorithm=hashes.SHA256(),
            label=None
        )
    )

    # Decrypt with private key (only key owner)
    decrypted = private_key.decrypt(
        ciphertext,
        padding.OAEP(
            mgf=padding.MGF1(algorithm=hashes.SHA256()),
            algorithm=hashes.SHA256(),
            label=None
        )
    )

Key asymmetric algorithms:

  • RSA: The classic. Use 4096-bit keys for new applications. Slower than symmetric but solves the key distribution problem.
  • ECDH (Elliptic Curve Diffie-Hellman): Key exchange protocol. Used to establish shared secrets for symmetric encryption.
  • ECIES: Elliptic curve encryption scheme. Combines ECDH key exchange with symmetric encryption.

Hybrid Encryption

In practice, asymmetric and symmetric encryption are combined. Asymmetric encryption is slow and limited in message size. The common pattern: generate a random symmetric key, encrypt the data with it (AES), then encrypt the symmetric key with the recipient's public key (RSA). This gives you the best of both worlds.
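The hybrid pattern described above can be sketched with the same cryptography package used in the earlier examples. This is an illustration under stated assumptions, not a production envelope format: the helper names hybrid_encrypt and hybrid_decrypt are hypothetical, and the demo uses a 2048-bit RSA key only so that key generation stays fast.

```python
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# RSA-OAEP parameters, shared by the wrap and unwrap steps
OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)


def hybrid_encrypt(public_key, plaintext: bytes):
    """Encrypt data with a fresh AES key, then wrap that key with RSA."""
    data_key = AESGCM.generate_key(bit_length=256)  # one-time symmetric key
    nonce = os.urandom(12)
    ciphertext = AESGCM(data_key).encrypt(nonce, plaintext, None)
    wrapped_key = public_key.encrypt(data_key, OAEP)
    return wrapped_key, nonce, ciphertext


def hybrid_decrypt(private_key, wrapped_key, nonce, ciphertext) -> bytes:
    """Unwrap the AES key with RSA, then decrypt and authenticate the data."""
    data_key = private_key.decrypt(wrapped_key, OAEP)
    return AESGCM(data_key).decrypt(nonce, ciphertext, None)


# Demo key pair (2048 bits for speed; an assumption for this sketch only)
recipient_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

message = b"x" * 100_000  # far larger than RSA-OAEP could encrypt directly
envelope = hybrid_encrypt(recipient_key.public_key(), message)
recovered = hybrid_decrypt(recipient_key, *envelope)
assert recovered == message
```

Only the 32-byte data key passes through RSA, so the message size is limited by AES-GCM rather than by the RSA modulus.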

Critical Encryption Concepts

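Initialization Vector (IV) / Nonce: A per-message value that ensures the same plaintext encrypts to different ciphertext each time. Without it, attackers can detect repeated messages. A GCM or ChaCha20-Poly1305 nonce must be unique for every message under a given key, and a CBC IV must also be unpredictable, so generating it randomly is the usual approach. Never reuse an IV or nonce with the same key.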
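To see why nonce reuse is dangerous, consider the toy stream cipher below. It is deliberately NOT a real cipher: the keystream is built from SHA-256 purely for illustration, and the function names are hypothetical. Stream-style modes such as CTR, GCM, and ChaCha20 share the same structure (plaintext XOR keystream), so the same leak applies to them.

```python
import hashlib


def toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || nonce || counter). Illustration only."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    return xor(plaintext, toy_keystream(key, nonce, len(plaintext)))


key = b"k" * 32
p1 = b"transfer $100 to alice"
p2 = b"transfer $999 to carol"

# Same key, same nonce: both messages are XORed with the same keystream
c1 = toy_encrypt(key, b"nonce-0001", p1)
c2 = toy_encrypt(key, b"nonce-0001", p2)

# The keystream cancels out, leaking the XOR of the two plaintexts
assert xor(c1, c2) == xor(p1, p2)

# An attacker who knows (or guesses) p1 recovers p2 without the key
recovered = xor(xor(c1, c2), p1)
print(recovered)

# With a fresh nonce the keystreams differ and the relation no longer holds
c3 = toy_encrypt(key, b"nonce-0002", p2)
assert xor(c1, c3) != xor(p1, p2)
```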

Authenticated Encryption: Modes like GCM and ChaCha20-Poly1305 verify that ciphertext hasn't been tampered with. Plain CBC mode doesn't: an attacker can flip bits in one ciphertext block to make predictable changes in the next plaintext block, and decryption raises no integrity error. Always use authenticated encryption.
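Tamper detection in an authenticated mode can be observed directly. The sketch below uses AESGCM from the cryptography package; the associated-data value and the try_decrypt helper are hypothetical names introduced for illustration.

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(key)
nonce = os.urandom(12)
header = b"account=42"  # associated data: authenticated but not encrypted

ciphertext = aesgcm.encrypt(nonce, b"amount=100", header)

# Simulate an attacker flipping a single bit of the ciphertext
tampered = bytearray(ciphertext)
tampered[0] ^= 0x01


def try_decrypt(ct: bytes, aad: bytes):
    """Return the plaintext, or None if authentication fails."""
    try:
        return aesgcm.decrypt(nonce, bytes(ct), aad)
    except InvalidTag:
        return None


assert try_decrypt(ciphertext, header) == b"amount=100"  # untouched: accepted
assert try_decrypt(tampered, header) is None             # modified ciphertext: rejected
assert try_decrypt(ciphertext, b"account=43") is None    # modified header: rejected
```

The associated data lets you bind context such as a record ID to the ciphertext, so a valid ciphertext cannot be silently moved to a different record.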

Key Management: The algorithm doesn't matter if your keys are poorly managed. Keys should never be hardcoded, committed to git, or stored alongside encrypted data. Use dedicated secrets management (HashiCorp Vault, AWS KMS, etc.).

Hashing: One-Way Fingerprints

Hashing in One Sentence

A hash function takes input of any size and produces a fixed-size output (digest) that cannot be reversed—even tiny input changes produce completely different outputs.

Hashing is fundamentally different from encryption: you cannot recover the original input from a hash. This isn't a limitation—it's the point. Hashes verify integrity and store passwords without keeping the actual password.
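As an illustration of integrity verification, the sketch below hashes a file in chunks and compares the result against a previously recorded digest. The sha256_file helper and the file name are hypothetical; only the standard library is used.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "release.bin"  # hypothetical file name
    artifact.write_bytes(b"Hello")
    published = sha256_file(artifact)     # digest recorded at release time

    unchanged = sha256_file(artifact) == published

    artifact.write_bytes(b"Hello!")       # simulate corruption or tampering
    modified_detected = sha256_file(artifact) != published

print(unchanged, modified_detected)
```

A plain hash only helps if the reference digest comes from a source the attacker cannot also modify; otherwise a keyed construction such as HMAC or a digital signature is needed.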

Hash Function Properties

  1. Deterministic: Same input always produces same hash
  2. One-way: Cannot derive input from hash (preimage resistance)
  3. Collision resistant: Infeasible to find two inputs with the same hash
  4. Avalanche effect: Small input change = completely different hash
  5. Fixed output size: Regardless of input length

SHA-256 Hashing
    import hashlib

    # Same input = same hash (deterministic)
    hash1 = hashlib.sha256(b"Hello").hexdigest()
    hash2 = hashlib.sha256(b"Hello").hexdigest()
    # hash1 == hash2 == '185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969'

    # Tiny change = completely different hash (avalanche effect)
    hash3 = hashlib.sha256(b"hello").hexdigest()  # lowercase 'h'
    # hash3 == '2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824'

    # No way to reverse the hash back to "Hello"
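The avalanche effect can be quantified by counting how many of the 256 output bits differ between two digests. A small sketch using the same two inputs; bit_difference is a hypothetical helper name. For a good hash function, roughly half the bits are expected to change.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the differing bits between SHA-256(a) and SHA-256(b)."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")


# "Hello" and "hello" differ by a single input bit (0x48 vs 0x68)
changed_bits = bit_difference(b"Hello", b"hello")
print(f"{changed_bits} of 256 output bits differ")
```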

Common Hash Functions

Algorithm   | Output Size  | Status | Use Case
------------|--------------|--------|------------------------------
MD5         | 128 bits     | Broken | Checksums only (non-security)
SHA-1       | 160 bits     | Broken | Legacy systems only
SHA-256     | 256 bits     | Secure | General purpose, integrity
SHA-384/512 | 384/512 bits | Secure | High-security applications
SHA-3       | Variable     | Secure | Alternative to SHA-2
BLAKE2      | Variable     | Secure | Fast, modern applications

MD5 and SHA-1 Are Broken

Collision attacks against MD5 are trivial—you can generate colliding PDFs on a laptop. SHA-1 collision was demonstrated in 2017 (SHAttered attack). Never use MD5 or SHA-1 for security purposes. They're acceptable only for non-security checksums (file deduplication, cache keys).

Password Hashing: A Special Case

General-purpose hash functions (SHA-256, etc.) are too fast for passwords. An attacker with a GPU can compute billions of SHA-256 hashes per second. Password hashing algorithms are intentionally slow:

Password Hashing with bcrypt
    import bcrypt

    # Hash a password (slow by design)
    password = b"user_password_123"
    salt = bcrypt.gensalt(rounds=12)  # 2^12 iterations
    hashed = bcrypt.hashpw(password, salt)
    # e.g. b'$2b$12$LQv3c1yqBWVHxkd0LHAkCOYz6TtxMQJqhN8/X4.VTtYWBw6Ck1Wqa'
    # (the value differs on every run because the salt is random)

    # Verify a password
    if bcrypt.checkpw(password, hashed):
        print("Password matches")

    # The salt is embedded in the hash - no separate storage needed
    # Work factor (rounds=12) makes every guess expensive for an attacker

Password hashing algorithms:

  • Argon2id: Winner of the Password Hashing Competition (2015). Memory-hard, resistant to GPU/ASIC attacks. The current recommendation.
  • bcrypt: Battle-tested since 1999. CPU-hard with configurable work factor. Still an excellent choice.
  • scrypt: Memory-hard. Designed to resist hardware attacks. Good, but Argon2 is preferred.
  • PBKDF2: NIST approved. Uses HMAC with many iterations. Acceptable but weaker than memory-hard alternatives.

Modern Password Hashing with Argon2
    from argon2 import PasswordHasher
    from argon2.exceptions import VerifyMismatchError

    ph = PasswordHasher(
        time_cost=3,        # Number of iterations
        memory_cost=65536,  # 64 MiB memory usage (value is in KiB)
        parallelism=4,      # Parallel threads
    )

    # Hash
    password_hash = ph.hash("user_password_123")
    # $argon2id$v=19$m=65536,t=3,p=4$c29tZXNhbHQ$...

    # Verify (catch only the mismatch error, not every exception)
    try:
        ph.verify(password_hash, "user_password_123")
        print("Password valid")
    except VerifyMismatchError:
        print("Invalid password")
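Where third-party packages are not an option, the PBKDF2 entry from the list above is available in the Python standard library. The sketch below is a fallback illustration, not a recommendation over Argon2id or bcrypt. The iteration count is an assumption to tune against current guidance and your own hardware, and the helper names are hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumption: adjust to current guidance and your hardware


def hash_password(password: str):
    """Return (salt, derived_key); store both alongside the iteration count."""
    salt = os.urandom(16)  # unique random salt per password
    derived = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, derived


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)


salt, stored = hash_password("user_password_123")
print(verify_password("user_password_123", salt, stored))
print(verify_password("wrong_password", salt, stored))
```

Unlike bcrypt and Argon2, this approach does not embed the salt and parameters in a single string, so the application has to store them next to the derived key.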

HMAC: Keyed Hashing

HMAC (Hash-based Message Authentication Code) adds a secret key to the hash process. It provides integrity AND authenticity—only someone with the key can generate a valid HMAC.

HMAC Example
    import hmac
    import hashlib

    key = b"shared_secret_key"
    message = b"Data to authenticate"

    # Sender: generate the HMAC and transmit it alongside the message
    mac = hmac.new(key, message, hashlib.sha256).hexdigest()
    # 64 hex chars for SHA-256

    # Receiver: recompute the HMAC over the received message with the same key
    expected_mac = hmac.new(key, message, hashlib.sha256).hexdigest()

    # Constant-time comparison to prevent timing attacks
    if hmac.compare_digest(mac, expected_mac):
        print("Message is authentic and unmodified")

HMAC is used for API authentication (AWS Signature), JWT signatures, cookie integrity, and secure webhooks.
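A webhook receiver is a typical place where this pattern appears. The sketch below is a minimal illustration using only the standard library. The secret, the payload, and the sign/verify helper names are hypothetical, and real providers each define their own header names and signature formats.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Compute the hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected.encode(), received_signature.encode())


secret = b"shared_secret_key"              # hypothetical shared secret
body = b'{"event": "payment.succeeded"}'   # hypothetical webhook payload

signature = sign(secret, body)             # sent by the provider with the request

print(verify(secret, body, signature))                        # genuine request
print(verify(secret, body + b" ", signature))                 # modified body
print(verify(b"attacker_guess", body, signature))             # wrong key
```

Signing the raw bytes of the body, rather than a re-serialized copy, avoids mismatches caused by whitespace or key-ordering differences.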

Common Mistakes and How to Fix Them

Mistake 1: Using Encoding for Security

Wrong
    # "Hiding" an API key with Base64
    import base64
    api_key = base64.b64encode(b"sk_live_abc123xyz").decode()
    # Config file contains: 'c2tfbGl2ZV9hYmMxMjN4eXo='
    # This is NOT protected - anyone can decode it

Right
    # Use environment variables or secrets management
    import os
    api_key = os.environ['API_KEY']

    # Or encrypt with a key stored separately
    from cryptography.fernet import Fernet
    key = os.environ['ENCRYPTION_KEY']  # Stored in secrets manager
    cipher = Fernet(key)
    encrypted_api_key = cipher.encrypt(b"sk_live_abc123xyz")

Mistake 2: Using Fast Hashes for Passwords

Wrong
    # SHA-256 is too fast for passwords
    import hashlib
    password_hash = hashlib.sha256(password.encode()).hexdigest()
    # Attacker can try billions of guesses per second

Right
    # Use a password-specific algorithm
    import bcrypt
    password_hash = bcrypt.hashpw(password.encode(), bcrypt.gensalt(rounds=12))
    # Each guess costs on the order of hundreds of milliseconds (hardware dependent),
    # cutting the attacker's guess rate by many orders of magnitude

Mistake 3: Encrypting Without Authentication

Wrong
    # AES-CBC without MAC - vulnerable to bit-flipping attacks
    from Crypto.Cipher import AES
    from Crypto.Util.Padding import pad
    cipher = AES.new(key, AES.MODE_CBC, iv)
    ciphertext = cipher.encrypt(pad(plaintext, 16))
    # Attacker can modify ciphertext and decryption still yields (corrupted) plaintext

Right
    # Use authenticated encryption (AES-GCM)
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM
    aesgcm = AESGCM(key)
    ciphertext = aesgcm.encrypt(nonce, plaintext, associated_data)
    # Any tampering is detected during decryption

Mistake 4: Hardcoding Keys

Wrong
    # Key in source code = key in git history forever
    ENCRYPTION_KEY = "super_secret_key_12345"

Right
    # Key from environment or secrets manager
    import os
    ENCRYPTION_KEY = os.environ['ENCRYPTION_KEY']  # KeyError if unset: fail fast
    # Or from AWS Secrets Manager, HashiCorp Vault, etc.
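A slightly more defensive loader can validate the key at startup instead of failing later during encryption. The sketch below assumes the key is a 32-byte value stored as URL-safe Base64 text in an environment variable. That storage format, the load_key helper, and the demo value are all assumptions for illustration. Note that Base64 here serves its proper purpose: carrying binary through a text-only channel, not protecting it.

```python
import base64
import os


def load_key(var_name: str = "ENCRYPTION_KEY") -> bytes:
    """Read a Base64-encoded 256-bit key from the environment and validate it."""
    encoded = os.environ.get(var_name)
    if not encoded:
        raise RuntimeError(f"{var_name} is not set")
    key = base64.urlsafe_b64decode(encoded)
    if len(key) != 32:
        raise RuntimeError(f"{var_name} must decode to exactly 32 bytes")
    return key


# Demo only: in a real deployment the platform or secrets manager sets this value
os.environ["ENCRYPTION_KEY"] = base64.urlsafe_b64encode(os.urandom(32)).decode()

key = load_key()
print(len(key))
```

Validating length and presence at startup turns a misconfigured deployment into an immediate, obvious failure rather than a runtime surprise.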

Decision Flowchart

Use this to choose the right approach:

Which One Should I Use?

Need to transmit binary as text? → Encoding (Base64)
Need to hide data from unauthorized parties? → Encryption
Need to verify data hasn't changed? → Hashing (SHA-256)
Need to store passwords? → Password hashing (Argon2, bcrypt)
Need to verify data AND prove who sent it? → HMAC or digital signatures
Need to transmit data securely between parties? → TLS (which uses all of the above)

Conclusion

The confusion between encoding, encryption, and hashing leads to real vulnerabilities. Encoding provides no security. Encryption requires proper key management and authenticated modes. Hashing is one-way and requires special algorithms for passwords.

When in doubt:

  • Never use encoding for security
  • Use AES-256-GCM or ChaCha20-Poly1305 for encryption
  • Use SHA-256 or BLAKE2 for integrity checking
  • Use Argon2id or bcrypt for passwords
  • Never roll your own crypto

For security assessments that identify cryptographic weaknesses in your applications, contact Brickell Technologies.
