☁️ AWS Storage Deep Dive for Backend Engineers
S3, EFS, FSx, EBS, and More — with Interview Q&A
🚀 Introduction
Storage is one of the foundational layers of any AWS architecture. Whether you’re building APIs, analytics pipelines, or distributed systems, choosing the right storage solution impacts performance, durability, cost, and scalability.
In backend interviews, AWS storage is a frequent topic, testing your ability to:
- Choose the right storage service (object vs. file vs. block)
- Optimize cost and performance
- Understand durability, consistency, and security
Let’s explore S3, EFS, FSx, EBS, and other key services — along with interview questions and answers you should master.
🪣 1. Amazon S3 — Object Storage for the Cloud
🔹 Overview
Amazon Simple Storage Service (S3) stores data as objects within buckets. It’s designed for 99.999999999% (11 nines) durability and 99.99% availability.
Typical use cases include data lakes, backups, static website hosting, and ML datasets.
🔹 Key Features
| Concept | Description |
|---|---|
| Buckets | Containers for objects, created in a specific Region, with globally unique names |
| Objects | Data + metadata stored in buckets |
| Storage Classes | Standard, Intelligent-Tiering, Standard-IA, One Zone-IA, Glacier, Glacier Deep Archive |
| Versioning | Retain multiple versions of objects |
| Lifecycle Policies | Automate data movement or deletion |
| Encryption | SSE-S3, SSE-KMS, or client-side |
| Access Control | IAM policies, bucket policies, ACLs |
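To make the Lifecycle Policies row concrete, here is a minimal sketch of a lifecycle configuration built as a plain Python dict. The dict shape follows what boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument; the function name, the `logs/` prefix, and the day thresholds are illustrative assumptions, not values from this article.

```python
import json


def log_lifecycle_configuration(prefix: str, ia_after_days: int = 30,
                                glacier_after_days: int = 90,
                                expire_after_days: int = 365) -> dict:
    """Build a rule: Standard -> Standard-IA -> Glacier -> delete (hypothetical helper)."""
    # S3 does not transition objects to Standard-IA before they are 30 days old.
    if ia_after_days < 30:
        raise ValueError("Standard-IA transitions require at least 30 days")
    if not ia_after_days < glacier_after_days < expire_after_days:
        raise ValueError("days must increase: IA < Glacier < expiration")
    return {
        "Rules": [{
            "ID": f"tier-and-expire-{prefix.strip('/')}",
            "Filter": {"Prefix": prefix},
            "Status": "Enabled",
            "Transitions": [
                {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                {"Days": glacier_after_days, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": expire_after_days},
        }]
    }


config = log_lifecycle_configuration("logs/")
print(json.dumps(config, indent=2))
# With boto3 installed and credentials configured, this dict could be passed as
# LifecycleConfiguration=config to s3.put_bucket_lifecycle_configuration(...).
```

Building the rule as data first keeps it easy to review and unit-test before anything touches a real bucket.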
🧠 Interview Questions & Answers
1️⃣ What is the durability and availability of S3 Standard?
→ Durability: 99.999999999% (11 nines)
→ Availability: 99.99% per year
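Durability means data loss is extremely rare: S3 Standard stores each object redundantly across at least three Availability Zones within a Region, so it is designed to survive the loss of an entire AZ. It does not by itself protect against a Region-wide failure; use Cross-Region Replication for that.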
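A quick way to explain these percentages in an interview is to turn them into expected values. The sketch below is plain arithmetic on the two design targets quoted above; the function names and the 10 million object count are illustrative assumptions.

```python
# Design targets for S3 Standard, expressed as yearly failure fractions.
ANNUAL_LOSS_PROBABILITY = 1e-11   # 1 - 99.999999999% durability, per object
UNAVAILABILITY = 1e-4             # 1 - 99.99% availability


def expected_objects_lost_per_year(object_count: int) -> float:
    """Expected number of objects lost in one year at the design durability."""
    return object_count * ANNUAL_LOSS_PROBABILITY


def expected_downtime_minutes_per_year() -> float:
    """Minutes of unavailability in a 365-day year at the design availability."""
    return UNAVAILABILITY * 365 * 24 * 60


loss = expected_objects_lost_per_year(10_000_000)
print(f"Expected loss: {loss:.4f} objects/year "
      f"(about one object every {1 / loss:,.0f} years)")
print(f"Expected downtime: {expected_downtime_minutes_per_year():.1f} minutes/year")
```

With 10 million objects this works out to roughly one lost object every 10,000 years, while 99.99% availability still allows about 53 minutes of downtime per year. The two numbers measure different things: durability is about losing data, availability is about being able to reach it.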
2️⃣ What’s the difference between S3 Standard, S3 Infrequent Access, and Glacier?
| Tier | Use Case | Retrieval | Cost |
|---|---|---|---|
| Standard | Frequently accessed data | Immediate | High |
| IA | Infrequent access | Milliseconds | Lower |
| Glacier Flexible Retrieval | Archival | Minutes to hours | Very low |
3️⃣ How does S3 consistency work?
→ Since December 2020, S3 provides strong read-after-write consistency: a successful PUT (new object or overwrite) or DELETE is immediately visible to subsequent GET and LIST requests.
4️⃣ What is multipart upload?
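→ It uploads a large object as independent parts, in parallel, so a failed part can be retried without restarting the whole upload. AWS recommends it for objects over roughly 100 MB, and it is required above the 5 GB limit of a single PUT.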
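Multipart uploads have limits worth knowing: parts range from 5 MiB to 5 GiB (the last part may be smaller), and an upload holds at most 10,000 parts. The sketch below picks a part size that respects those limits; `plan_parts` and the 64 MiB default are hypothetical choices for illustration, and the limits should be checked against the current S3 documentation.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_PART = 5 * MIB      # minimum part size (the last part may be smaller)
MAX_PART = 5 * GIB      # maximum part size
MAX_PARTS = 10_000      # maximum number of parts in one upload


def plan_parts(object_size: int, preferred_part: int = 64 * MIB) -> tuple[int, int]:
    """Return (part_size_bytes, part_count) that fits the multipart limits."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # Grow the part size when the preferred size would need more than 10,000 parts.
    part_size = max(preferred_part, MIN_PART, math.ceil(object_size / MAX_PARTS))
    if part_size > MAX_PART:
        raise ValueError("object is too large for 10,000 parts of 5 GiB")
    return part_size, math.ceil(object_size / part_size)


size, count = plan_parts(1 * GIB)
print(f"1 GiB object: {count} parts of {size // MIB} MiB")
```

In practice the high-level SDK transfer helpers handle this splitting for you; the calculation is mainly useful for explaining why very large objects need larger parts.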
5️⃣ How to securely access S3 within a VPC?
→ Use VPC Endpoints (Gateway/Interface) to route S3 traffic internally without public internet.
✅ Best Practices
- Enable S3 Intelligent-Tiering to optimize cost automatically.
- Turn on Versioning + MFA Delete for data protection.
- Use S3 Access Points for multi-tenant data access.
- Enforce encryption at rest (SSE-KMS) and in transit (HTTPS).
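Enforcing HTTPS and VPC-only access both come down to bucket policy conditions. The following is a minimal sketch that builds such a policy as JSON, using the `aws:SecureTransport` and `aws:SourceVpce` condition keys. The bucket name and endpoint ID are placeholders, and the helper name is hypothetical.

```python
import json


def secure_bucket_policy(bucket: str, vpc_endpoint_id: str) -> str:
    """Return a bucket policy (JSON) that denies plain HTTP and non-endpoint access."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    resources = [bucket_arn, f"{bucket_arn}/*"]
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {   # Reject any request that did not arrive over TLS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
            {   # Reject requests that did not come through the given VPC endpoint.
                "Sid": "DenyOutsideVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": resources,
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpc_endpoint_id}},
            },
        ],
    }
    return json.dumps(policy, indent=2)


# Placeholder names for illustration only.
print(secure_bucket_policy("example-logs-bucket", "vpce-0123456789abcdef0"))
```

The second statement is strict: it also blocks console and other access from outside the VPC, so it is usually worth testing on a non-production bucket first.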
📁 2. Amazon EFS — Elastic File System
🔹 Overview
Amazon Elastic File System (EFS) provides scalable, serverless, shared file storage for Linux-based workloads. It’s accessible across multiple EC2 instances, containers (ECS/EKS), and on-prem systems.
🔹 Key Features
| Feature | Description |
|---|---|
| Type | Network File System (NFS v4) |
| Performance Modes | General Purpose / Max I/O |
| Throughput Modes | Bursting / Provisioned / Elastic |
| Availability | Regional, across multiple AZs |
| Encryption | KMS for at-rest, TLS for in-transit |
| Access Points | Managed entry points with per-app permissions |
🧠 Interview Questions & Answers
1️⃣ Difference between EBS and EFS?
→ EBS = block storage attached to a single instance in one AZ (io1/io2 Multi-Attach is a narrow exception).
→ EFS = shared file storage accessible by many instances concurrently.
2️⃣ Can EFS be mounted by multiple EC2 instances?
→ Yes. That’s one of its biggest advantages over EBS.
3️⃣ How does EFS scale?
→ Automatically scales from MBs to PBs without manual provisioning.
4️⃣ Difference between bursting and provisioned throughput modes?
→ Bursting: Throughput scales automatically with the amount of data stored in the file system.
→ Provisioned: You pre-allocate throughput for consistent performance.
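To see why small file systems can feel slow in Bursting mode, it helps to estimate the baseline. The sketch below assumes the commonly documented figures of 50 KiB/s of baseline throughput per GiB stored, and bursts of 100 MiB/s per TiB with a 100 MiB/s floor. These constants are assumptions to verify against the current EFS documentation, and the function name is hypothetical.

```python
# Assumed Bursting-mode figures; verify against the current EFS documentation.
BASELINE_KIBPS_PER_GIB = 50
BURST_MIBPS_PER_TIB = 100
MIN_BURST_MIBPS = 100


def bursting_throughput(stored_gib: float) -> tuple[float, float]:
    """Return (baseline_mibps, burst_mibps) for a file system of the given size."""
    baseline = stored_gib * BASELINE_KIBPS_PER_GIB / 1024
    burst = max(MIN_BURST_MIBPS, stored_gib / 1024 * BURST_MIBPS_PER_TIB)
    return baseline, burst


for gib in (100, 1024, 10 * 1024):
    baseline, burst = bursting_throughput(gib)
    print(f"{gib:>6} GiB stored: baseline {baseline:6.1f} MiB/s, burst {burst:6.1f} MiB/s")
```

Under these assumptions a 100 GiB file system sustains only about 5 MiB/s once burst credits run out, which is the usual reason to switch a small but busy file system to Provisioned throughput.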
5️⃣ How to reduce EFS costs?
→ Use EFS Infrequent Access (IA) storage class for rarely accessed files.
✅ Best Practices
- Use EFS One Zone for cost-optimized dev/test workloads.
- Set up access points for app-specific isolation.
- Integrate EFS with EKS PersistentVolumes for containers.
- Monitor with CloudWatch metrics (I/O, throughput).
🧩 3. Amazon FSx — Managed File Systems
🔹 Overview
Amazon FSx offers fully managed versions of popular enterprise file systems:
- FSx for Windows File Server – for SMB/Windows workloads
- FSx for Lustre – for HPC workloads
- FSx for NetApp ONTAP – for hybrid and snapshot-based workloads
- FSx for OpenZFS – for Linux environments
🧠 Interview Questions & Answers
1️⃣ When to use FSx vs EFS?
→ EFS is a general-purpose NFS file system for Linux clients.
→ FSx is for specialized workloads (Windows, HPC, hybrid).
2️⃣ What is FSx for Lustre?
→ High-performance file system designed for compute-intensive workloads; can link directly with S3.
3️⃣ How does FSx integrate with S3?
→ FSx for Lustre can import data from S3 at startup and export results back — ideal for analytics pipelines.
4️⃣ What is FSx for Windows File Server used for?
→ Provides SMB access with Active Directory integration — suited for Windows apps like SAP or .NET.
5️⃣ What is SnapMirror in FSx for NetApp ONTAP?
→ Data replication feature for backup/disaster recovery between AWS and on-prem NetApp systems.
✅ Best Practices
- Choose FSx type aligned with your OS/workload.
- Enable encryption with KMS for compliance.
- Use DataSync to move data between on-prem and FSx.
- For analytics, pair FSx for Lustre with S3 buckets.
💾 4. Amazon EBS — Elastic Block Store
🔹 Overview
EBS provides block-level storage volumes for EC2 instances — like attaching virtual disks.
It’s ideal for databases, boot volumes, and transactional systems.
🧠 Interview Questions & Answers
1️⃣ Difference between EBS and EFS?
→ EBS = block storage for a single EC2 instance in one AZ (io1/io2 Multi-Attach is a narrow exception).
→ EFS = shared NFS file system for multiple instances.
2️⃣ What are EBS volume types?
| Type | Use Case |
|---|---|
| gp3 | General-purpose (balanced price/performance) |
| io2 | High IOPS databases |
| st1 | Throughput Optimized HDD for large sequential I/O |
| sc1 | Cold HDD for infrequently accessed data at the lowest cost |
3️⃣ Can you detach and attach EBS volumes between instances?
→ Yes, within the same AZ. Supports live snapshots for backup.
4️⃣ How to improve EBS performance?
→ Use EBS-optimized instances, provisioned IOPS, or RAID 0 striping.
5️⃣ Is EBS replicated across AZs?
→ No. An EBS volume is replicated only within its own AZ. Snapshots are stored in S3 at the Region level, so you can restore one into another AZ or copy it to another Region for backup.
✅ Best Practices
- Use gp3 over gp2 for cost efficiency.
- Automate snapshots using AWS Backup.
- Encrypt volumes and snapshots using KMS.
- Enable delete-on-termination for temporary volumes.
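The gp3 recommendation is easier to defend with the baseline numbers. A gp2 volume's baseline IOPS scale with size (3 IOPS per GiB, floor of 100, cap of 16,000), while gp3 includes 3,000 IOPS regardless of size. The sketch below compares the two; it ignores gp2 burst credits, and the volume sizes are arbitrary examples.

```python
GP3_BASELINE_IOPS = 3_000  # included with every gp3 volume, independent of size


def gp2_baseline_iops(size_gib: int) -> int:
    """gp2 baseline: 3 IOPS per GiB, clamped to the range 100..16,000."""
    return min(16_000, max(100, 3 * size_gib))


for size_gib in (20, 100, 500, 1_000, 6_000):
    gp2 = gp2_baseline_iops(size_gib)
    print(f"{size_gib:>5} GiB  gp2 baseline: {gp2:>6} IOPS  "
          f"gp3 baseline: {GP3_BASELINE_IOPS} IOPS")
```

Below 1,000 GiB a gp2 volume has a lower baseline than gp3, so small volumes often gain performance from the switch. Larger gp3 volumes can have additional IOPS provisioned separately from capacity.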
🧮 5. Supporting Services in AWS Storage Ecosystem
| Service | Type | Use Case |
|---|---|---|
| AWS Storage Gateway | Hybrid | Bridge on-prem storage to AWS |
| AWS Backup | Management | Centralized backup across S3, EFS, RDS, DynamoDB |
| AWS Snow Family | Data Transfer | Offline data migration at petabyte scale |
| AWS DataSync | Data Transfer | High-speed data transfer between on-prem and AWS |
| Amazon S3 Glacier / Glacier Deep Archive | Archival | Long-term data retention at low cost |
🧩 Real Interview Scenarios with Answers
Scenario 1:
You need to store 10TB of log data accessed occasionally for analytics.
→ Use S3 Standard-IA or Intelligent-Tiering with Athena for queries.
Scenario 2:
Multiple EC2s need shared configuration files.
→ Use EFS mounted via NFS across all EC2s.
Scenario 3:
A Windows application needs shared SMB file access.
→ Use FSx for Windows File Server integrated with AD.
Scenario 4:
A PostgreSQL database needs high IOPS block storage.
→ Use EBS io2 volumes with Provisioned IOPS.
Scenario 5:
You must replicate files between on-prem servers and AWS.
→ Use AWS DataSync or Storage Gateway (File Gateway).
🧭 Quick Comparison — Choosing the Right AWS Storage
| Use Case | Service | Type | Shared Access | Scalability | Typical Cost |
|---|---|---|---|---|---|
| Data lake, backups, static assets | S3 | Object | Yes | Unlimited | Low |
| Shared file system for Linux apps | EFS | File | Yes | Auto | Medium |
| Windows/HPC workloads | FSx | File | Yes | Configurable | Medium–High |
| Databases, boot disks | EBS | Block | No | Manual | Medium |
| Archival backups | Glacier | Object | No | High | Very Low |
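The comparison table can be read as a small decision procedure. The sketch below encodes it as a function; the parameter names and the coarse object/file/block split are simplifying assumptions for illustration, not an official selection guide.

```python
def choose_storage(kind: str, *, windows: bool = False, hpc: bool = False,
                   archival: bool = False) -> str:
    """Map a coarse requirement to the service named in the comparison table."""
    if kind == "object":
        # Archival data goes to Glacier; everything else to S3.
        return "S3 Glacier" if archival else "S3"
    if kind == "file":
        if windows:
            return "FSx for Windows File Server"
        if hpc:
            return "FSx for Lustre"
        return "EFS"  # shared NFS file system for Linux workloads
    if kind == "block":
        return "EBS"  # databases and boot disks
    raise ValueError(f"unknown storage kind: {kind!r}")


# The interview scenarios above, expressed as calls.
print(choose_storage("object"))              # 10TB of occasionally read logs
print(choose_storage("file"))                # shared config files across EC2s
print(choose_storage("file", windows=True))  # shared SMB access
print(choose_storage("block"))               # PostgreSQL needing high IOPS
```

Real decisions also weigh cost, latency, and throughput, but starting from the access type mirrors how most scenario questions are framed.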
🔐 Security and Compliance Checklist
- ✅ Enable encryption at rest (KMS) and in transit (TLS).
- ✅ Restrict access using IAM roles/policies.
- ✅ Use VPC endpoints for S3/EFS private access.
- ✅ Enable CloudTrail for audit logging.
- ✅ Implement least privilege access principles.
🧠 Key Takeaways
- Understand Object vs. File vs. Block storage distinctions.
- Remember S3 = scalability, EFS = shared Linux, FSx = Windows/HPC, EBS = block storage.
- Know durability, availability, encryption, and pricing tiers.
- Use lifecycle management to reduce cost automatically.
- Be ready to explain design choices in scenario questions.
📘 Summary Table of Core Interview Facts
| Topic | Fact |
|---|---|
| S3 Durability | 99.999999999% |
| EFS Access | Concurrent EC2/EKS |
| FSx Variants | Windows, Lustre, ONTAP, OpenZFS |
| EBS Volume Scope | Single AZ |
| S3 Consistency | Strong read-after-write |
| Encryption | SSE-S3 / SSE-KMS / Client |
| Lifecycle Management | Automates storage transitions |
| Backup Automation | AWS Backup or Lambda |
| Data Transfer Tools | Snowball, DataSync, Transfer Family |