Operating Systems (OS)

1. Process vs. Thread vs. Program

Technical Definition

A Program is a passive entity (executable file) stored on a disk, while a Process is an active instance of a program in execution. A Thread is the smallest unit of execution within a process, sharing the same address space but having its own stack and registers.

Core Essentials

Program (Passive): A collection of instructions stored as an executable file on the Secondary Storage.
Process (Active): A program in execution, loaded into Primary Memory (RAM) with allocated resources.
Thread (LWP): The smallest schedulable unit; shares Code/Data/Heap with peer threads but maintains its own Registers & Stack.

Feature	Process	Thread
Memory	Isolated address space.	Shared address space (Heap).
Overhead	High (Context switching is heavy).	Low (Lightweight).
Communication	Requires IPC (Inter-Process Comm).	Communicates via shared memory.

Interview Spotlight

Interviewer Focus: Understanding the trade-off between isolation and speed. Multi-threading is preferred for highly concurrent tasks like web servers.

Q: Can a thread exist without a process?
A: No. A thread is an execution unit assigned within the context of a process.

A Word Processing application is a Process, while the spell-checker and auto-saver running inside it are Threads.
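
The sharing rules above can be sketched with Python's `threading` module. This is only an illustration: the worker names mirror the word-processor example, and `shared_log`, `worker`, and `run_workers` are hypothetical names chosen for this sketch.

```python
import threading

shared_log = []                 # heap object: visible to every thread in the process
log_lock = threading.Lock()     # protects the shared list

def worker(name: str, repeats: int) -> None:
    local_count = 0             # local variable: lives on this thread's own stack
    for _ in range(repeats):
        local_count += 1
    with log_lock:
        shared_log.append((name, local_count))

def run_workers() -> list:
    shared_log.clear()
    threads = [
        threading.Thread(target=worker, args=("spell-checker", 3)),
        threading.Thread(target=worker, args=("auto-saver", 5)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(shared_log)
```

Each thread keeps its own `local_count`, yet both append to the same list, which is why the list needs a lock and the counter does not.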

2. Process States & Lifecycle

Technical Definition

The Process State defines the current activity of a process as it moves through various stages of execution. The lifecycle typically follows a 5-state model: New, Ready, Running, Waiting, and Terminated.

Core Concept

New: The process is being created and its PCB (Process Control Block) is initialized.
Ready: The process is in main memory, waiting to be assigned to a CPU core.
Running: Instructions are being executed by the processor.
Waiting (Blocked): The process is waiting for some event like I/O completion or signal.
Terminated: The process has finished execution and OS resources are being reclaimed.

Interview Spotlight

Focus: State transitions, specifically the 'Ready to Running' (Dispatching) and 'Running to Waiting' (I/O event) paths.

Q: Does a process always go back to 'Ready' from 'Waiting'?
A: Yes. Once I/O completes, it moves to 'Ready' to wait for the scheduler's turn; it cannot jump directly to 'Running'.

When you click 'Search' in a browser, the process moves to Waiting for the network response before returning to Running to render results.
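
The 5-state model can be written down as a small transition table. The sketch below is a teaching aid, not how any particular kernel stores state; `ALLOWED`, `transition`, and `run_path` are hypothetical names.

```python
# Legal transitions in the 5-state model described above.
ALLOWED = {
    "New": {"Ready"},
    "Ready": {"Running"},                           # dispatch
    "Running": {"Ready", "Waiting", "Terminated"},  # preempt, block on I/O, exit
    "Waiting": {"Ready"},                           # event completed
    "Terminated": set(),
}

def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target

def run_path(states: list) -> str:
    """Walk a sequence of states and return the final one."""
    current = states[0]
    for target in states[1:]:
        current = transition(current, target)
    return current
```

Calling `transition("Waiting", "Running")` raises `ValueError`, which matches the Q&A above: a blocked process must pass through 'Ready' first.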

3. CPU Scheduling (Preemptive vs. Non-preemptive)

Technical Definition

CPU Scheduling is the process of deciding which process in the ready queue gets to use the CPU next. **Preemptive Scheduling** allows the OS to forcibly interrupt a running process, whereas **Non-preemptive Scheduling** lets a process run until it either finishes or voluntarily releases control.

Core Concept

Preemptive: Uses time quanta or priority interrupts; better for multi-user/interactive systems.
Non-preemptive: Consistent execution; simple to implement; risks "Starvation" and long response times.
Decision points: (1) Running to Waiting, (2) Running to Ready, (3) Waiting to Ready, (4) Terminating.

Type	Preemptive	Non-Preemptive
Resource Control	OS can snatch CPU at any time.	Process releases CPU voluntarily.
Response Time	Fast and predictable.	Can be high for small tasks.

Interview Spotlight

Focus: Criticality in Real-Time OS (RTOS). Preemption is mandatory for responsiveness.

Q: Is FCFS (First Come First Served) preemptive?
A: No. FCFS is pure non-preemptive; once a process takes the CPU, it keeps it.

Modern Windows or macOS kernels are Preemptive to ensure a background update doesn't freeze your mouse cursor.

4. Scheduling Algorithms (FCFS, SJF, SRTF, RR, Priority)

Technical Definition

Scheduling Algorithms are specific methods used by the Short-Term Scheduler to organize the Ready Queue for CPU execution. These aim to maximize CPU Utilization and minimize Waiting Time and Response Time.

Core Concept

FCFS: Simple; Non-preemptive; suffers from Convoy Effect.
SJF (Shortest Job First): Optimal for average waiting time; risks starvation for long jobs.
SRTF (Shortest Remaining Time First): Preemptive version of SJF.
Round Robin (RR): Preemptive; uses a fixed Time Quantum ($q$); fair for all processes.
Priority Sched: Assigned based on importance; risk of Starvation (solved by Aging).

Interview Spotlight

Focus: Calculating Turnaround Time ($TAT = Completion - Arrival$) and Waiting Time ($WT = TAT - Burst$).

Q: What happens in RR if the Time Quantum is very large?
A: It degrades into simple FCFS.

Time-sharing servers use Round Robin so every user's process gets a slice of CPU time within each scheduling cycle.
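
The TAT and WT formulas can be checked with a small simulation. The sketch below assumes all processes arrive at time 0 and ignores context-switch cost; `fcfs` and `round_robin` are hypothetical helper names, and the burst times in the usage note are illustrative.

```python
from collections import deque

def fcfs(bursts: dict) -> dict:
    """bursts maps name -> burst time, in arrival order (all arrive at t=0).
    Returns name -> (turnaround_time, waiting_time)."""
    clock = 0
    result = {}
    for name, burst in bursts.items():
        clock += burst                 # completion time of this process
        tat = clock                    # TAT = Completion - Arrival, arrival = 0
        result[name] = (tat, tat - burst)
    return result

def round_robin(bursts: dict, quantum: int) -> dict:
    """Same input and output as fcfs, but each turn runs at most `quantum` units."""
    remaining = dict(bursts)
    queue = deque(bursts)
    clock = 0
    result = {}
    while queue:
        name = queue.popleft()
        run = min(quantum, remaining[name])
        clock += run
        remaining[name] -= run
        if remaining[name] == 0:
            result[name] = (clock, clock - bursts[name])
        else:
            queue.append(name)         # not finished: back of the ready queue
    return result
```

With bursts `{"P1": 24, "P2": 3, "P3": 3}`, FCFS gives waiting times 0, 24, 27 (the convoy effect), while Round Robin with `quantum=4` gives 6, 4, 7. A very large quantum makes `round_robin` return the same result as `fcfs`.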

5. Process Synchronization (Critical Section, Race Condition)

Technical Definition

Process Synchronization is the coordination of concurrent processes to ensure consistent data when accessing shared resources. A Race Condition occurs when the output depends on the execution sequence, and a Critical Section is the code block where shared data is accessed.

Core Concept

Race Condition: Unpredictable outcome (e.g., both threads increment a counter simultaneously).
Solutions must satisfy: (1) Mutual Exclusion, (2) Progress, (3) Bounded Waiting.

Interview Spotlight

Focus: Defining the "Bounded Waiting" condition—ensuring a process doesn't wait indefinitely to enter its Critical Section.

Q: Is Software-only Peterson's Solution enough for modern Multi-core CPUs?
A: No. Memory ordering and compiler optimizations usually break purely software sync on modern hardware.

E-commerce Inventory: Two users buying the last item at the same time requires Sync to prevent overselling.
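
The inventory scenario can be sketched with a lock guarding the critical section. This is a minimal in-process illustration (a real store would rely on database transactions); `Inventory` and `simulate` are hypothetical names.

```python
import threading

class Inventory:
    def __init__(self, stock: int) -> None:
        self.stock = stock
        self._lock = threading.Lock()

    def buy(self) -> bool:
        # Critical section: the check and the decrement must happen atomically,
        # otherwise two buyers could both see stock == 1.
        with self._lock:
            if self.stock > 0:
                self.stock -= 1
                return True
            return False

def simulate(buyers: int, stock: int) -> tuple:
    """Run `buyers` concurrent purchase attempts; return (successes, stock_left)."""
    inventory = Inventory(stock)
    outcomes = []
    outcomes_lock = threading.Lock()

    def attempt() -> None:
        ok = inventory.buy()
        with outcomes_lock:
            outcomes.append(ok)

    threads = [threading.Thread(target=attempt) for _ in range(buyers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(outcomes), inventory.stock
```

With one item and fifty buyers, exactly one purchase succeeds and the stock never goes negative.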

6. Semaphores vs. Mutex

Technical Definition

A Mutex (Mutual Exclusion) is a locking mechanism that allows only one thread to access a resource at a time, while a Semaphore is a signaling tool (integer variable) that controls access to a resource with limited instances.

Core Concept

Mutex: Locking object; must be released by the same thread that locked it (Ownership).
Binary Semaphore: Value is 0 or 1; similar to Mutex but has no ownership.
Counting Semaphore: Value $N > 1$; allows multiple instances of a resource (e.g., 5 printer ports).

Feature	Mutex	Semaphore
Type	Locking Mechanism.	Signaling Mechanism.
Ownership	Thread that locks must unlock.	Any process can signal.

Interview Spotlight

Focus: Understanding that a Binary Semaphore is NOT exactly a Mutex due to the 'Ownership' property.

A 1-person bathroom uses a Mutex; a 5-car parking lot uses a Counting Semaphore.
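
The parking-lot analogy maps directly onto a counting semaphore. The sketch below uses Python's `threading.Semaphore`; `park_cars` is a hypothetical helper that reports the largest number of cars parked at once.

```python
import threading
import time

def park_cars(cars: int, spaces: int) -> int:
    """Let `cars` threads compete for `spaces` slots; return the peak occupancy."""
    lot = threading.Semaphore(spaces)   # counting semaphore initialised to N
    state_lock = threading.Lock()       # mutex protecting the counters below
    parked_now = 0
    max_parked = 0

    def car() -> None:
        nonlocal parked_now, max_parked
        with lot:                       # wait (P): blocks when the lot is full
            with state_lock:
                parked_now += 1
                max_parked = max(max_parked, parked_now)
            time.sleep(0.01)            # stay parked briefly
            with state_lock:
                parked_now -= 1
        # leaving the `with lot` block performs signal (V)

    threads = [threading.Thread(target=car) for _ in range(cars)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max_parked
```

The returned peak never exceeds `spaces`. With `spaces=1` the semaphore behaves like a binary semaphore, though unlike a mutex it does not record which thread acquired it.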

7. Classical Sync Problems (Producer-Consumer, Dining Philosophers)

Technical Definition

Traditional synchronization challenges that represent real-world concurrency issues. The **Producer-Consumer** problem deals with buffer overflows/underflows, while **Dining Philosophers** models resource allocation and deadlock hazards.

Core Concept

Producer-Consumer: Managed with 'Full', 'Empty', and 'Mutex' semaphores.
Dining Philosophers: 5 philosophers, 5 forks; if all pick left fork, Deadlock occurs.
Solution: Only allow 4 philosophers at the table or use asymmetric picking rules.

Interview Spotlight

Focus: Designing a deadlock-free approach for Dining Philosophers.

A high-speed logger filling a fixed-size buffer while a storage service writes to disk is a Producer-Consumer scenario.
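
The 'Full', 'Empty', and 'Mutex' arrangement can be sketched as follows, assuming one producer and one consumer. `run_pipeline` is a hypothetical name; in everyday Python code `queue.Queue` packages the same idea.

```python
import threading
from collections import deque

def run_pipeline(items: list, capacity: int) -> list:
    """Move `items` through a bounded buffer; return them in consumption order."""
    buffer = deque()
    empty = threading.Semaphore(capacity)  # counts free slots
    full = threading.Semaphore(0)          # counts filled slots
    mutex = threading.Lock()               # protects the buffer itself
    consumed = []

    def producer() -> None:
        for item in items:
            empty.acquire()                # block if the buffer is full
            with mutex:
                buffer.append(item)
            full.release()                 # announce one more filled slot

    def consumer() -> None:
        for _ in range(len(items)):
            full.acquire()                 # block if the buffer is empty
            with mutex:
                item = buffer.popleft()
            empty.release()                # announce one more free slot
            consumed.append(item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed
```

The producer never overruns the buffer and the consumer never reads from an empty one, so the items come out in the order they went in.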

8. Deadlock: Necessary Conditions (Coffman Conditions)

Technical Definition

A Deadlock is a state where a set of processes are blocked because each is holding a resource and waiting for another held by another process. For a deadlock to occur, **four specific conditions** must hold simultaneously.

Core Concept

Mutual Exclusion: Resource cannot be shared.
Hold and Wait: Process holds one resource while waiting for another.
No Preemption: Resources cannot be forcibly taken from a process.
Circular Wait: A chain of processes exists where each waits for a resource held by the next.

Interview Spotlight

Focus: Preventing Deadlock involves breaking at least ONE of these four conditions.

Two cars meeting in a 1-lane tunnel where neither can reverse is a Circular Wait deadlock.

9. Deadlock Handling (Prevention, Avoidance - Banker’s Algorithm)

Technical Definition

Methods used to manage deadlock risks. **Prevention** sets rules to ensure one Coffman condition is never met, while **Avoidance** (Banker's Algorithm) dynamically checks resource requests against "Safe States."

Core Concept

Prevention: e.g., Disallow 'Hold & Wait' by requiring all resources at start.
Avoidance: Requires knowledge of future resource needs.
Banker's Alg: Checks if granting a request leads to a state where everyone can eventually finish ($Available \ge Need$).

Interview Spotlight

Focus: Determining a "Safe Sequence" in a Banker's problem. If such a sequence exists, the state is safe and deadlock can be avoided.

Banks only lend money if they retain enough liquid cash to satisfy at least one priority client's withdrawal completely (**Banker's Logic**).
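
The safety check at the heart of the Banker's Algorithm can be sketched in a few lines. `find_safe_sequence` is a hypothetical name, and the five-process, three-resource data set below is illustrative.

```python
def find_safe_sequence(available: list, allocation: list, need: list):
    """Return a safe sequence of process indices, or None if the state is unsafe."""
    work = list(available)
    finished = [False] * len(allocation)
    sequence = []
    progressed = True
    while progressed:
        progressed = False
        for i, (alloc, nd) in enumerate(zip(allocation, need)):
            # A process can finish if its remaining Need fits in what is free now.
            if not finished[i] and all(n <= w for n, w in zip(nd, work)):
                # It then releases everything it currently holds.
                work = [w + a for w, a in zip(work, alloc)]
                finished[i] = True
                sequence.append(i)
                progressed = True
    return sequence if all(finished) else None

# Illustrative state: Need = Max - Allocation for each process.
AVAILABLE = [3, 3, 2]
ALLOCATION = [[0, 1, 0], [2, 0, 0], [3, 0, 2], [2, 1, 1], [0, 0, 2]]
NEED = [[7, 4, 3], [1, 2, 2], [6, 0, 0], [0, 1, 1], [4, 3, 1]]
```

For this data the function returns `[1, 3, 4, 0, 2]`, so the state is safe. With `available=[0, 0, 0]` no process can finish and the result is `None`.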

10. Paging & Segmentation

Technical Definition

Memory management schemes that map a process's logical address space onto physical memory. **Paging** divides memory into fixed-sized blocks (Pages/Frames) and eliminates external fragmentation, whereas **Segmentation** divides memory into logical variable-sized modules based on program structure.

Core Concept

Paging: Totally transparent to programmer; simplifies allocation; uses Page Tables.
Segmentation: Programmer's view (Stack, Heap, Code); modules are logical.

Aspect	Paging	Segmentation
Fragmentation	Suffers from Internal.	Suffers from External.
Block Size	Fixed.	Variable.

Interview Spotlight

Focus: Why do we use 'Segmented Paging'? Answer: To get the logical modularity of segments with the easy allocation of pages.

Virtual memory on your PC uses **Paging** to swap 4KB chunks of inactive Chrome tabs to your SSD.
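
How a Page Table is used can be sketched as a lookup on the page number. The example assumes 4KB pages and a single-level table stored as a dictionary; `translate` and the sample table are hypothetical.

```python
PAGE_SIZE = 4096  # 4KB pages, as in the example above

def translate(virtual_address: int, page_table: dict) -> int:
    """Map a virtual address to a physical address via page number and offset."""
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    if page_number not in page_table:
        # In a real OS this traps to the kernel as a page fault.
        raise LookupError(f"page fault on page {page_number}")
    frame_number = page_table[page_number]
    return frame_number * PAGE_SIZE + offset
```

With `page_table = {0: 5, 1: 2}`, address 4100 falls in page 1 at offset 4, so it maps to frame 2 and physical address 8196. The offset is unchanged by translation; only the page number is replaced by a frame number.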

11. Virtual Memory & Demand Paging

Technical Definition

**Virtual Memory** allows execution of processes that are not completely in main memory by creating an abstraction. **Demand Paging** is the specific implementation where a page is only loaded into RAM when it is required during execution.

Core Concept

Page Fault: Occurs when CPU tries to access a page not in RAM; OS must fetch it from Disk.
Benefits: Larger address space, higher degree of multiprogramming, less I/O at startup.

Interview Spotlight

Focus: Defining the steps of handling a Page Fault (Trap OS → Locate on Disk → Swap into Frame → Reset Page Table).

A program whose data is larger than physical RAM can still run on a 16GB PC because **Virtual Memory** keeps only the actively used pages in RAM and leaves the rest on disk.

12. Page Replacement (FIFO, LRU, Optimal)

Technical Definition

Algorithms used to decide which page to remove from RAM when a new page needs to be loaded and all frames are full. They aim to minimize the **Page Fault Rate**.

Core Concept

FIFO: First In First Out; simple but suffers from **Belady's Anomaly**.
LRU: Least Recently Used; uses past history as an indicator of future (Good/Realistic).
Optimal: Replace page that won't be used for the longest time (Theoretical benchmark).

Interview Spotlight

Focus: LRU vs FIFO. LRU is better because it follows the 'Principle of Locality'.

Your browser's 'Back' button uses **LRU** logic to keep recently visited pages ready in cache.

13. Fragmentation (Internal vs. External)

Technical Definition

Wasted memory that prevents efficient utilization. **Internal Fragmentation** occurs when allocated memory is slightly larger than requested; **External Fragmentation** occurs when total free space exists but is not contiguous.

Core Concept

Internal: Happens in fixed-partitioning (Paging). Memory within a block is wasted.
External: Happens in variable-partitioning (Segmentation). Solved by **Compaction** or **Paging**.

Interview Spotlight

Focus: How to solve External Fragmentation. Answer: Compaction (shuffling memory) or Paging (splitting logical space).

A 4KB Page holding only 1KB of data wastes 3KB; that waste is **Internal Fragmentation**.

14. Thrashing & Belady’s Anomaly

Technical Definition

Extreme performance degradations in memory management. **Thrashing** is when a system is busy swapping pages rather than executing; **Belady's Anomaly** is the counter-intuitive phenomenon where increasing frames increases page faults.

Core Concept

Thrashing: Happens when Degree of Multiprogramming is too high; solved by reducing processes.
Belady's: Only occurs in non-stack algorithms like FIFO.

Interview Spotlight

Focus: Recognizing Belady's Anomaly in a FIFO trace. Does LRU suffer from it? No, because LRU is a 'Stack Algorithm'.

An old PC freezing when you open too many Chrome tabs is undergoing **Thrashing**.
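
Belady's Anomaly is easiest to see by counting faults on a trace. The sketch below assumes pure demand paging with no prefetching; `fifo_faults` and `lru_faults` are hypothetical names, and the reference string in the usage note is a commonly used illustrative one.

```python
from collections import OrderedDict, deque

def fifo_faults(refs: list, frames: int) -> int:
    """Count page faults under FIFO replacement."""
    resident = set()
    arrival_order = deque()
    faults = 0
    for page in refs:
        if page in resident:
            continue                      # hit: FIFO order is not updated
        faults += 1
        if len(resident) == frames:
            resident.discard(arrival_order.popleft())  # evict the oldest arrival
        resident.add(page)
        arrival_order.append(page)
    return faults

def lru_faults(refs: list, frames: int) -> int:
    """Count page faults under LRU replacement."""
    recency = OrderedDict()               # least recently used page comes first
    faults = 0
    for page in refs:
        if page in recency:
            recency.move_to_end(page)     # hit: mark as most recently used
            continue
        faults += 1
        if len(recency) == frames:
            recency.popitem(last=False)   # evict the least recently used page
        recency[page] = True
    return faults
```

For the reference string `[1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]`, FIFO produces 9 faults with 3 frames but 10 faults with 4 frames, which is the anomaly. LRU produces 10 faults with 3 frames and 8 with 4, so adding frames does not hurt it on this trace.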

15. System Calls & Kernel (Monolithic vs. Micro)

Technical Definition

**System Calls** are the programmatic way a program requests service from the kernel. The **Kernel** is the core OS part; its design is either **Monolithic** (everything in one space) or **Micro** (minimal logic in kernel space).

Core Concept

Monolithic: Linux/Unix; very fast (syscalls stay in one space) but harder to maintain.
Microkernel: Mach/L4; highly secure and stable (crashes don't kill the whole OS) but slower.
Syscall examples: fork(), wait(), read(), write(), open().

Interview Spotlight

Focus: Difference between User Mode and Kernel Mode. Hardware prevents User apps from direct Hardware access.

When a program wants to save a file, it triggers a `write()` System Call to ask the Kernel for disk access.

16. RAID Levels

Technical Definition

**Redundant Array of Independent Disks (RAID)** is a technology used to combine multiple physical disk drives into a single logical unit for data redundancy, performance improvement, or both.

Core Concept

RAID 0 (Striping): Performance only; no redundancy. If one fails, all gone.
RAID 1 (Mirroring): Reliability; data copied to two disks.
RAID 5 (Parity): Distributed parity; balanced performance and safety.
RAID 10: Combined Striping + Mirroring.

Interview Spotlight

Focus: Why RAID is not a backup. Answer: RAID protects against Hardware failure, not accidental deletion.

Enterprise database servers use **RAID 10** to ensure maximum speed and zero data loss if a drive dies.

17. Disk Scheduling (SCAN, C-SCAN, LOOK)

Technical Definition

Algorithms used by the OS to manage the sequence of I/O requests to the Disk. They aim to reduce **Seek Time** (time to move the read/write head).

Core Concept

SCAN (Elevator): Head moves end to end, servicing requests like an elevator.
C-SCAN (Circular): Only services requests in one direction; the head then snaps back to the start, which gives more uniform wait times.
LOOK: Optimized SCAN; only goes as far as the last request in a direction.

Interview Spotlight

Focus: Which is more 'fair'? Answer: C-SCAN, as it avoids favoring the middle tracks.

Modern HDDs implement **LOOK** internally to avoid unnecessary head movement.
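
The difference between SCAN and LOOK shows up when total head movement is added up. The sketch below assumes the head starts out moving toward higher cylinder numbers; the function names and the request queue in the usage note are illustrative.

```python
def total_movement(head: int, stops: list) -> int:
    """Sum of cylinder distances travelled while visiting `stops` in order."""
    distance = 0
    for stop in stops:
        distance += abs(stop - head)
        head = stop
    return distance

def look_order(requests: list, head: int) -> list:
    """LOOK service order: sweep up to the last request, then sweep down."""
    up = sorted(r for r in requests if r >= head)
    down = sorted((r for r in requests if r < head), reverse=True)
    return up + down

def look(requests: list, head: int) -> int:
    return total_movement(head, look_order(requests, head))

def scan(requests: list, head: int, last_cylinder: int) -> int:
    """SCAN travels all the way to the last cylinder before reversing."""
    up = sorted(r for r in requests if r >= head)
    down = sorted((r for r in requests if r < head), reverse=True)
    return total_movement(head, up + [last_cylinder] + down)
```

For requests `[98, 183, 37, 122, 14, 124, 65, 67]` with the head at 53 and cylinders numbered 0 to 199, LOOK moves 299 cylinders while SCAN moves 331. The extra 32 come from travelling from 183 to 199 and back.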

18. Inter-Process Communication (Pipes, Shared Memory)

Technical Definition

Mechanisms that allow concurrent processes to communicate and synchronize. The two main models are **Shared Memory** (extremely fast) and **Message Passing** (easier for distributed systems).

Core Concept

Pipes: Unidirectional; used in `ls | grep`.
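Named Pipes (FIFOs): Exist as filesystem entries, so unrelated processes can open them; a POSIX FIFO is still one-way per open, while Windows named pipes can be duplex.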
Shared Memory: Both processes mapped to same physical RAM; requires sync (Semaphore).

Interview Spotlight

Focus: Speed vs. Security trade-off between Shared Memory and Message Passing.

The `|` symbol in Linux terminals is a Pipe that connects the output of one process to the input of another.
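
The one-way nature of a pipe can be seen with `os.pipe()`, which returns a read end and a write end. To stay short, this sketch writes and reads within a single process and assumes a message small enough to fit in the kernel's pipe buffer; a shell pipeline would give each end to a different process. `send_through_pipe` is a hypothetical name.

```python
import os

def send_through_pipe(message: bytes) -> bytes:
    """Write `message` into a pipe and read it back from the other end."""
    read_fd, write_fd = os.pipe()
    try:
        os.write(write_fd, message)
    finally:
        os.close(write_fd)          # closing the write end signals EOF to the reader
    chunks = []
    try:
        while True:
            chunk = os.read(read_fd, 1024)
            if not chunk:           # empty read means EOF
                break
            chunks.append(chunk)
    finally:
        os.close(read_fd)
    return b"".join(chunks)
```

Data only flows from `write_fd` to `read_fd`; two-way communication needs a second pipe or a different mechanism.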

19. Spooling vs. Buffering

Technical Definition

Techniques to bridge speed gaps between CPU and slow I/O. **Spooling** stores data in a large disk buffer for deferred processing; **Buffering** stores data temporarily in RAM to smooth out speed differences.

Core Concept

Spooling: Handles multiple simultaneous jobs (e.g. Printer Queue).
Buffering: Smooths input for a single process (e.g. Video streaming).

Aspect	Spooling	Buffering
Location	Disk.	RAM.
Concurrent	Managed overlapping jobs.	Managed single stream.

Interview Spotlight

Focus: Understanding that Spooling involves the Disk, while Buffering typically stays in RAM.

A **Printer Spooler** allows you to 'print' 5 documents while the printer is still warming up.

20. Real-Time OS (RTOS) basics

Technical Definition

An **RTOS** is an operating system where the correctness of a task depends not only on the logical result but also on the time at which it is delivered. It relies on **Deterministic** scheduling.

Core Concept

Hard RTOS: Strict deadlines; a missed deadline counts as a system failure (e.g. Airbag).
Soft RTOS: Deadlines important but minor delays allowed (e.g. Video streaming).
Key: High predictability and low interrupt latency.

Interview Spotlight

Focus: Hard vs Soft RTOS categorization of common devices.

Spacecraft and Car Airbags run on **Hard RTOS** because late computation is fatal.

Computer Networks (CN)

1. OSI Model (7 Layers & their protocols)

Technical Definition

The Open Systems Interconnection (OSI) Model is a conceptual framework that standardizes the functions of a telecommunication or computing system into seven logical layers. It ensures interoperability between diverse communication systems with standard protocols.

Core Concept

Layer 7 (Application): HTTP, DNS, SMTP; interface for end-user interaction.
Layer 6 (Presentation): SSL/TLS, JPEG; encryption & data formatting.
Layer 5 (Session): NetBIOS, PPTP; dialogue management & session control.
Layer 4 (Transport): TCP, UDP; process-to-process delivery & flow control.
Layer 3 (Network): IP, ICMP; routing & packet forwarding based on IP.
Layer 2 (Data Link): Ethernet, ARP; hop-to-hop delivery based on MAC address.
Layer 1 (Physical): Cables, Hubs; raw bit-stream transmission.

Interview Spotlight

Focus: Memorizing the exact order (All People Seem To Need Data Processing) and the 'Unit' of data at each layer (Data, Segment, Packet, Frame, Bits).

Q: What is the data unit at Layer 2?
A: Frame.

When you load a webpage, your browser starts at Layer 7 and the signal travels down to Layer 1 before entering the cable.

2. TCP/IP Model vs. OSI

Technical Definition

The TCP/IP Model is the functional implementation used in the modern internet, consisting of 4 (or 5) layers. Unlike OSI, it merges the top three layers into a single Application layer and combines the physical/data link layers in some versions.

Core Concept

Application Layer: Equates to OSI 5, 6, and 7.
Transport Layer: Equates to OSI 4; manages host-to-host communication.
Internet Layer: Equates to OSI 3; handles addressing and routing.
Network Access Layer: Equates to OSI 1 and 2; handles hardware interaction.

Aspect	OSI Model	TCP/IP Model
Layers	7 Layers (Theoretical).	4/5 Layers (Implementation).
Approach	Vertical/Strict.	Horizontal/Modular.

Interview Spotlight

Focus: Why is TCP/IP used over OSI? Answer: TCP/IP is protocol-specific and proven on the ARPANET/Internet; OSI is a general model.

Every router and smartphone uses TCP/IP to synchronize and transmit data globally.

3. TCP vs. UDP (Handshaking vs. Connectionless)

Technical Definition

**Transmission Control Protocol (TCP)** is a connection-oriented, reliable protocol that ensures data delivery via acknowledgments. **User Datagram Protocol (UDP)** is a connectionless, unreliable protocol that prioritizes speed over data integrity.

Core Concept

TCP: Heavyweight; uses Flow Control (Sliding Window); error detection with retransmission; ordered data.
UDP: Lightweight; 'Fire and forget'; no handshake; unordered; low overhead.
Overhead: TCP Header (20 bytes minimum, up to 60 with options) vs UDP Header (8 bytes).

Feature	TCP	UDP
Reliability	Guaranteed delivery.	Best-effort delivery.
Use Case	Emails (SMTP), Web (HTTP).	VOIP, Gaming, Streaming.

Interview Spotlight

Focus: Explaining why UDP is used for DNS. Answer: Low overhead makes single-packet queries extremely fast.

Downloading a PDF uses TCP (all bytes must be correct); playing 'PUBG' uses UDP (one dropped packet doesn't matter).

4. IP Addressing (IPv4 vs. IPv6, Classful vs. CIDR)

Technical Definition

**IP Addressing** is a unique numeric label assigned to each device in a network. **IPv4** uses 32-bit addresses (approx. 4 billion), while **IPv6** uses 128-bit addresses to handle the exponential growth of connected devices.

Core Concept

Classful: A (0-127), B (128-191), C (192-223); wasteful due to fixed network prefixes.
CIDR (Classless Inter-Domain Routing): Uses a suffix like `/24` to define the network portion flexibly.
IPv6: Hexadecimal notation (e.g., `2001:db8::1`); built-in security (IPSec).

Interview Spotlight

Focus: Calculating the number of hosts in a `/mask`. Formula: $2^{(32 - mask)} - 2$.

Q: What is the address 127.0.0.1?
A: Loopback address; used for a machine to communicate with itself.

Your WiFi router assigns you a private IPv4 address like 192.168.1.5 on your home network.

5. Subnetting & Supernetting

Technical Definition

**Subnetting** is the process of dividing a large network into smaller, manageable sub-networks to reduce congestion. **Supernetting (CIDR aggregation)** is the inverse, combining multiple smaller networks into a single large one to reduce routing table size.

Core Concept

Subnet Mask: Defines which part of IP is network vs host (e.g., 255.255.255.0).
Benefit: Improves security and reduces 'Broadcast Storms'.
CIDR notation: Represents the number of set bits in the mask.

$$Hosts = 2^{(\text{Host Bits})} - 2$$ $$Subnets = 2^{(\text{Borrowed Bits})}$$

Interview Spotlight

Focus: Why do we subtract 2 in the host formula? Answer: To exclude the Network ID (all 0s) and Broadcast Address (all 1s).

An office divides their network so the HR and IT departments have separate **Subnets** for security.
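
The host and subnet formulas can be cross-checked with Python's standard `ipaddress` module. The helper names and the 192.168.x.x networks are illustrative, and the host formula assumes an IPv4 prefix of /30 or shorter.

```python
import ipaddress

def usable_hosts(cidr: str) -> int:
    """Hosts = 2^(host bits) - 2, excluding the network ID and broadcast address."""
    network = ipaddress.ip_network(cidr)
    host_bits = network.max_prefixlen - network.prefixlen
    return 2 ** host_bits - 2

def split(cidr: str, new_prefix: int) -> list:
    """Subnetting: borrow host bits to carve one network into smaller ones."""
    network = ipaddress.ip_network(cidr)
    return [str(subnet) for subnet in network.subnets(new_prefix=new_prefix)]

def aggregate(cidrs: list) -> list:
    """Supernetting: merge adjacent networks into the fewest covering prefixes."""
    networks = [ipaddress.ip_network(c) for c in cidrs]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]
```

Splitting `192.168.1.0/24` into /26 networks borrows 2 bits and yields 4 subnets of 62 usable hosts each. Going the other way, `192.168.0.0/24` and `192.168.1.0/24` aggregate into `192.168.0.0/23`.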

6. DNS, DHCP, and NAT

Technical Definition

Fundamental internet service protocols. **DNS** translates domain names to IPs; **DHCP** automatically assigns IPs to devices; **NAT** allows multiple private IPs to share one public IP.

Core Concept

DNS (Domain Name System): Uses UDP 53 for most queries (TCP 53 for zone transfers and large responses); utilizes a hierarchical Root -> TLD -> Authoritative tree.
DHCP (DORA process): Discover, Offer, Request, Acknowledge; uses UDP 67/68.
NAT (Network Address Translation): Found in routers; conserves IPv4 addresses.

Interview Spotlight

Focus: Explaining the DORA process in DHCP.

When you join a Starbucks WiFi, **DHCP** gives you an IP, and **NAT** lets you browse the web using the shop's single public address.

7. HTTP vs. HTTPS (The SSL/TLS Handshake)

Technical Definition

**HTTP** is the plain-text protocol for web browsing; **HTTPS** adds a layer of security by encrypting the data using **SSL/TLS**. HTTPS protects against "Man-in-the-Middle" (MITM) attacks.

Core Concept

SSL/TLS Handshake: (1) Client Hello, (2) Server Hello + Cert, (3) Key Exchange, (4) Cipher Spec Change.
Encryption: Uses Asymmetric encryption for key exchange and Symmetric for data transfer.
Port: HTTP (80) vs HTTPS (443).

Interview Spotlight

Focus: Why use asymmetric for the handshake but symmetric for data? Answer: Asymmetric is secure but slow; symmetric is very fast for high-volume data.

Banking websites strictly use **HTTPS** to ensure your password isn't visible on the public network.

8. Routing Protocols (Distance Vector vs. Link State)

Technical Definition

Algorithms used by routers to find the best path for data packet transmission. **Distance Vector** protocols choose paths based on hop count; **Link State** protocols build a complete map of the network for smarter decisions.

Core Concept

RIP (Distance Vector): Uses Bellman-Ford; max 15 hops; sends full table periodically.
OSPF (Link State): Uses Dijkstra; builds topology map; faster convergence.
BGP (Path Vector): The protocol that connects entire ISPs across the internet.

Type	Distance Vector	Link State
Metric	Hop count.	Cost/Bandwidth.
Convergence	Slow.	Fast.

Interview Spotlight

Focus: The 'Count-to-Infinity' problem in Distance Vector protocols and how split-horizon mitigates it (it stops two-node loops but not every larger loop).

Corporate networks use **OSPF** to automatically redirect traffic if a main fiber line is cut.

9. Error Control (CRC, Checksum, Parity)

Technical Definition

Mechanism at the Data Link/Transport layers to detect if bits were altered during transmission. **CRC (Cyclic Redundancy Check)** is the most robust method, treating the bit stream as a polynomial.

Core Concept

Checksum: Adds segments of data; 1's complement math; used in TCP/IP headers.
Parity Bit: Simple 1-bit check (Even/Odd parity); cannot detect an even number of flipped bits (e.g., 2-bit errors).
CRC: Binary division using a generator polynomial ($G(x)$).

Interview Spotlight

Focus: Calculating CRC given a divisor. Remainder must be all 0s at the receiver for success.

Every Ethernet frame includes a **CRC** trailer to discard corrupted packets locally.
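
The binary division behind CRC can be sketched with bit strings. This is a teaching version (real implementations work on bytes with lookup tables or hardware); the function names are hypothetical and the 4-bit-remainder generator `10011` is just a small illustrative polynomial.

```python
def _mod2_divide(bits: list, generator: list) -> list:
    """Modulo-2 (XOR) long division; returns the remainder bits."""
    bits = list(bits)
    pad = len(generator) - 1
    for i in range(len(bits) - pad):
        if bits[i]:                           # only divide when the leading bit is 1
            for j, g in enumerate(generator):
                bits[i + j] ^= g
    return bits[-pad:]

def crc_remainder(data_bits: str, generator: str) -> str:
    """Sender side: append zeros, divide, and return the remainder to transmit."""
    gen = [int(b) for b in generator]
    padded = [int(b) for b in data_bits] + [0] * (len(gen) - 1)
    return "".join(str(b) for b in _mod2_divide(padded, gen))

def crc_check(frame_bits: str, generator: str) -> bool:
    """Receiver side: the frame is accepted only if the remainder is all zeros."""
    gen = [int(b) for b in generator]
    return not any(_mod2_divide([int(b) for b in frame_bits], gen))
```

For data `1101011011` and generator `10011` the remainder is `1110`, so the sender transmits `11010110111110`. The receiver's division of that frame leaves all zeros, while flipping any single bit leaves a non-zero remainder.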

10. Flow Control (Stop & Wait, Sliding Window)

Technical Definition

Mechanisms to ensure a fast sender doesn't overwhelm a slow receiver. It adjusts the rate of data flow between two nodes.

Core Concept

Stop and Wait: Sender sends 1 frame, waits for ACK. Inefficient for long distances.
Sliding Window: Sender may have up to $W$ unacknowledged frames in flight before it must wait for an ACK.
Selective Repeat: Only corrupted frames are retransmitted.

$$\text{Efficiency (Stop \& Wait)} = \frac{1}{1 + 2k} \quad \text{where } k = \frac{\text{Prop. Delay}}{\text{Trans. Delay}}$$

Interview Spotlight

Focus: Difference between Go-Back-N and Selective Repeat. Answer: GBN resends all after the error; SR only the error.

Streaming a video uses **Sliding Window** to keep data flowing while waiting for previous acknowledgments.
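
The efficiency formula extends to a window of $W$ frames as $\min(1, W / (1 + 2k))$, which is the usual textbook approximation for an error-free link. The sketch below evaluates it; `link_utilization` and the delay values in the usage note are hypothetical.

```python
def link_utilization(window: int, prop_delay: float, trans_delay: float) -> float:
    """Fraction of time the sender is transmitting, assuming no errors or losses.

    window=1 corresponds to Stop and Wait: 1 / (1 + 2k).
    """
    k = prop_delay / trans_delay
    return min(1.0, window / (1 + 2 * k))
```

With a propagation delay of 20 ms and a transmission delay of 10 ms, $k = 2$: Stop and Wait reaches 20% utilization, a window of 3 reaches 60%, and any window of 5 or more keeps the link fully busy.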

11. Congestion Control (Leaky Bucket, Token Bucket)

Technical Definition

Network-wide traffic management to prevent node saturation. **Leaky Bucket** ensures a constant output rate (no bursts); **Token Bucket** allows bursts but limits the average rate.

Core Concept

Leaky Bucket: Traffic shaping; drops overflowing packets; rigid traffic flow.
Token Bucket: Tokens accumulate in a bucket; allows high-speed bursts if tokens exist.
TCP Congestion Control: Relies on 'Additive Increase Multiplicative Decrease' (AIMD).

Interview Spotlight

Focus: Difference between Flow Control (End-to-end) and Congestion Control (Network-wide).

ISPs use **Token Bucket** to let you enjoy high-speed page loads while capping your sustained download speed for a 50GB file.
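
A Token Bucket can be sketched in a few lines. To keep the behaviour deterministic, this version takes the current time as an argument instead of reading a clock, and assumes time starts at 0; the class and its parameters are hypothetical.

```python
class TokenBucket:
    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate            # tokens added per second (long-term average rate)
        self.capacity = capacity    # bucket size (largest allowed burst)
        self.tokens = capacity      # start with a full bucket
        self.last = 0.0             # time of the previous update

    def allow(self, now: float, cost: float = 1) -> bool:
        """Refill for the elapsed time, then spend `cost` tokens if available."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `rate=1` and `capacity=5`, five packets pass immediately as a burst and the sixth is rejected. Two seconds later two more tokens have accumulated, so two more packets pass.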

12. Network Devices (Hub, Switch, Router, Gateway)

Technical Definition

Hardware components that build the network. **Hubs** broadcast, **Switches** filter (MAC-based), **Routers** connect networks (IP-based), and **Gateways** convert between dissimilar protocols.

Core Concept

Hub (L1): Dumb device; repeats every signal to all ports, so all attached hosts share one 'Collision Domain'.
Switch (L2): Intelligent; uses CAM table to map Ports to MACs.
Router (L3): Uses routing tables to determine cross-network paths.

Device	Layer	Unit
Hub	Layer 1	Bits.
Switch	Layer 2	Frames.
Router	Layer 3	Packets.

Interview Spotlight

Focus: Defining a 'Collision Domain'. Answer: A segment where frames sent at the same time can collide. A Switch gives each port its own collision domain, and full-duplex links remove collisions entirely.

A **Switch** allows two people in the same office to send large files simultaneously without slowing each other down.

13. ARP & RARP

Technical Definition

Conversion protocols between Layer 2 and Layer 3 addresses. **Address Resolution Protocol (ARP)** finds a MAC address from a known IP; **Reverse ARP (RARP)** finds an IP from a known MAC.

Core Concept

ARP Request: Broadcasted to everyone on the LAN.
ARP Reply: Unicast (direct) to the requester.
Gratuitous ARP: Used to check for IP conflicts on startup.

Interview Spotlight

Focus: Is ARP a Layer 2 or Layer 3 protocol? Answer: It operates at the boundary, often called Layer 2.5.

When you ping 192.168.1.1, your computer first sends an **ARP** request to find your router's hardware MAC address.

14. ICMP (Ping & Traceroute)

Technical Definition

**Internet Control Message Protocol (ICMP)** is used by network devices to send error messages and operational information. It is the core of diagnostic tools like **Ping** and **Traceroute**.

Core Concept

Echo Request/Reply: Used by Ping to check reachability.
TTL Exceeded: Used by Traceroute to map every hop to a destination.
Host Unreachable: Sent by router when it cannot find a route.

Interview Spotlight

Focus: How does Traceroute work using TTL? Answer: It sends probes with TTL starting at 1 and increasing by one each round, forcing each router along the path to send an ICMP 'Time Exceeded' message back.

Running `ping google.com` sends **ICMP** Echo packets to check if your internet connection is alive.

15. Network Topologies (Mesh, Star, Bus)

Technical Definition

The geometric arrangement of devices in a network. **Mesh** is the most robust (everyone connected); **Star** is the most common (connected to a central hub/switch).

Core Concept

Mesh: Full mesh connection count = $n(n-1)/2$. High cost, high reliability.
Star: If central node fails, network dies. Easy to scale.
Bus: All share a single cable; high collision risk.

Topology	Robustness	Cost
Mesh	Very High.	High.
Star	Medium.	Low/Medium.

Interview Spotlight

Focus: Full Mesh cable calculation for 10 nodes. Answer: $10 \times 9 / 2 = 45$.

Home WiFi setups are **Star Topologies**, where your phone and laptop all connect to the central router.

16. MAC Address vs. IP Address

Technical Definition

**MAC Address** is a 48-bit permanent hardware identifier (Physical); **IP Address** is a 32/128-bit dynamic logical identifier (Logical).

Core Concept

MAC (L2): Assigned by manufacturer; unique globally; burned in NIC.
IP (L3): Assigned by network admin/DHCP; changes based on location.

Feature	MAC Address	IP Address
Layer	Data Link (L2).	Network (L3).
Mutability	Permanent.	Dynamic.

Interview Spotlight

Focus: Why do we need both? Answer: IP allows hierarchical grouping for global routing; MAC allows specific device identification on a local wire.

A **MAC address** is like your Social Security Number (permanent); an **IP address** is like your Current Mailing Address (changes when you move).

17. Firewalls & VPNs

Technical Definition

Network security tools. **Firewalls** filter incoming/outgoing traffic based on rules; **VPNs (Virtual Private Network)** create encrypted tunnels over public networks to hide data and location.

Core Concept

Firewall: Can be Packet Filtering, Stateful, or Next-Gen (Application aware).
VPN: Uses protocols like IPsec, OpenVPN, or L2TP to encapsulate data.

Interview Spotlight

Focus: Difference between stateless and stateful firewalls.

Working from home requires a **VPN** to access the company's internal files securely.

18. Socket Programming basics

Technical Definition

The API used by programmers to create end-to-end network communication. A **Socket** is an endpoint formed by combining an **IP Address** and a **Port Number**.

Core Concept

Stream Sockets: Use TCP (reliable).
Datagram Sockets: Use UDP (fast).
Primitives: socket(), bind(), listen(), accept(), connect().

Interview Spotlight

Focus: The sequence of calls for a Server vs. a Client.

Q: What is a socket?
A: An endpoint of a two-way communication link. IP:Port.

A Chat Application uses **Sockets**: your phone opens a connection to a known port on the server, and messages flow over that connection in both directions.
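
The server and client call sequences can be sketched with Python's `socket` module. This is a minimal single-connection example on the loopback interface; the function names and the upper-casing reply are hypothetical choices for the demo.

```python
import socket
import threading

def start_echo_server() -> tuple:
    """Server side: socket() -> bind() -> listen() -> accept()."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # stream socket (TCP)
    server.bind(("127.0.0.1", 0))       # port 0 asks the OS for any free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve_once() -> None:
        conn, _addr = server.accept()   # blocks until a client connects
        with conn:
            data = conn.recv(1024)
            conn.sendall(data.upper())  # reply with the message in upper case
        server.close()

    thread = threading.Thread(target=serve_once, daemon=True)
    thread.start()
    return port, thread

def send_message(port: int, message: bytes) -> bytes:
    """Client side: socket() -> connect(), both done by create_connection()."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(message)
        return client.recv(1024)
```

The server never calls `connect()` and the client never calls `bind()` or `listen()`; that asymmetry is the usual answer to the call-sequence question.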

19. 3-Way Handshake (SYN, SYN-ACK, ACK)

Technical Definition

The process used by TCP to establish a reliable connection before data transfer. It ensures both sides can send and receive data.

Core Concept

Step 1: Client sends **SYN** (Synchronize) + seq=x.
Step 2: Server sends **SYN-ACK** + seq=y + ack=x+1.
Step 3: Client sends **ACK** + ack=y+1.

Interview Spotlight

Focus: Why is it a 3-way handshake and not 2-way? Answer: To prevent 'Delayed Duplicate' connections from previous sessions.

Q: What is a SYN flood?
A: A Denial-of-Service attack where an attacker sends many SYNs but never sends the final ACK back.

Before your computer downloads a file, it performs a **3-Way Handshake** with the server to agree on sequence numbers.
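
The sequence and acknowledgment numbers in the three steps can be traced with a tiny model. This is not a network implementation; `three_way_handshake` is a hypothetical function that only builds the three segments from the two initial sequence numbers $x$ and $y$.

```python
def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the three handshake segments as dictionaries."""
    syn = {"flags": "SYN", "seq": client_isn}
    # The server acknowledges the client's SYN, which consumes one sequence number.
    syn_ack = {"flags": "SYN-ACK", "seq": server_isn, "ack": syn["seq"] + 1}
    # The client acknowledges the server's SYN in the same way.
    ack = {"flags": "ACK", "seq": syn_ack["ack"], "ack": syn_ack["seq"] + 1}
    return [syn, syn_ack, ack]
```

With $x = 100$ and $y = 300$, the server replies with ack=101 and the client finishes with ack=301, after which both sides know the other's starting sequence number.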

20. SMTP, FTP, POP3, IMAP protocols

Technical Definition

Standard application-layer protocols. **SMTP** is for sending emails; **FTP** for file transfers; **POP3/IMAP** for retrieving emails from a mail server.

Core Concept

SMTP (25): Push protocol; sends mail between MTAs.
FTP (20/21): Uses dual ports (Control vs Data).
POP3 (110): Pull protocol; downloads and deletes from server (usually).
IMAP (143): Pull protocol; syncs multiple clients with the server.

Feature	POP3	IMAP
Sync	Local only.	Synced across devices.
Server Copy	Usually deleted.	Kept on server.

Interview Spotlight

Focus: Why IMAP is better for the modern world. Answer: Allows users to read the same email on a phone, tablet, and laptop.

When you send a CV, you use **SMTP**, and the recruiter reads it via **IMAP** on their phone.

Machine Learning (ML)

1. Supervised vs. Unsupervised vs. Reinforcement Learning

Technical Definition

**Supervised Learning** uses labeled data to train models for mapping inputs to known outputs. **Unsupervised Learning** finds hidden patterns in unlabeled data, while **Reinforcement Learning (RL)** trains an 'agent' to make decisions by rewarding positive actions and penalizing negative ones.

Core Concept

Supervised: Regression (Continuous) and Classification (Discrete). Requires Ground Truth.
Unsupervised: Clustering (K-Means) and Dimensionality Reduction (PCA). Finds structure.
Reinforcement: State-Action-Reward cycle. Maximizes cumulative reward over time.

Type	Input Data	Goal
Supervised	Labeled.	Predict Output.
Unsupervised	Unlabeled.	Find Patterns.
Reinforcement	Experience/Reward.	Maximize Reward.

Interview Spotlight

Focus: Identifying which paradigm to use for a new problem (e.g., Robot navigation = RL, Email Spam = Supervised).

A spam filter uses **Supervised Learning**; a customer segmenting tool (Amazon recommendations) uses **Unsupervised Learning**.

2. Linear & Logistic Regression

Technical Definition

**Linear Regression** predicts a continuous numeric value by fitting a straight line to data points. **Logistic Regression** is used for binary classification, predicting the probability (0 to 1) of an instance belonging to a specific class using the Sigmoid function.

Core Concept

Linear: Goal is to minimize MSE (Mean Squared Error). $y = mx + c$.
Logistic: Maps $z$ to $[0, 1]$ using $\sigma(z) = \frac{1}{1 + e^{-z}}$.
Assumptions: Linear relationship, no multi-collinearity, independence of errors.
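
The two formulas above can be written out directly. This is a minimal sketch for a single feature; the function names and the 0.5 threshold are assumptions, and `sigmoid` is not hardened against overflow for very negative inputs.

```python
import math

def linear_predict(x, m, c):
    """Linear regression prediction: y = m*x + c."""
    return m * x + c

def mse(y_true, y_pred):
    """Mean Squared Error, the quantity linear regression minimizes."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def sigmoid(z):
    """Maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_predict(x, m, c, threshold=0.5):
    """Logistic regression: linear score -> sigmoid -> probability -> class label."""
    p = sigmoid(linear_predict(x, m, c))
    return p, int(p >= threshold)
```

Note that the same linear expression `m*x + c` appears in both models; logistic regression only adds the sigmoid and a decision threshold on top.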

Interview Spotlight

Focus: Why call it 'Regression' if it's used for Classification? Answer: Because it fits a linear model to the log-odds and passes the result through the Sigmoid to get a probability; the class label comes from thresholding that probability.

Predicting a house price uses **Linear Regression**; deciding if an email is 'Spam' or 'Inbox' uses **Logistic Regression**.

3. Bias-Variance Tradeoff

Technical Definition

**Bias** is the error introduced by simplifying a complex real-world problem (Underfitting). **Variance** is the error from being too sensitive to small fluctuations in training data (Overfitting).

Core Concept

High Bias: Model is too simple; misses trends (Underfit).
High Variance: Model is too complex; captures noise (Overfit).
Tradeoff: The goal is to find the 'Sweet Spot' in model complexity where Total Error is minimized.

$$\text{Total Error} = \text{Bias}^2 + \text{Variance} + \text{Irreducible Error}$$

Interview Spotlight

Focus: Visualizing the 'Bullseye' diagram. Tight cluster away from center = Low Variance, High Bias.

A linear model trying to fit circular data has **High Bias**; a complex polynomial fitting every outlier has **High Variance**.

4. Overfitting & Underfitting (Regularization)

Technical Definition

**Overfitting** happens when a model performs perfectly on training data but fails on unseen data. **Underfitting** occurs when the model is too simple to learn even the training data. **Regularization** techniques prevent overfitting by penalizing high weights.

Core Concept

Signals: Overfit = Low Train Error, High Test Error. Underfit = High Train Error, High Test Error.
Causes: Overfit = Too many features, too little data. Underfit = Too simple model.
Fix: Add a penalty term $\lambda$ to the Loss Function.

Interview Spotlight

Focus: Explaining how increasing training data helps solve Overfitting.

Memorizing questions for an exam is **Overfitting**; understanding the concepts is **Generalization**.

5. L1 (Lasso) vs. L2 (Ridge) Regularization

Technical Definition

Techniques that add a penalty to the loss function to prevent overfitting. **L2 (Ridge)** penalizes the square of weights; **L1 (Lasso)** penalizes the absolute value of weights and can drive them to exactly zero.

Core Concept

L1 (Lasso): Used for **Feature Selection**; creates 'Sparse' models.
L2 (Ridge): Keeps all features but shrinks their influence.
Elastic Net: A hybrid of both L1 and L2.

$$\text{Ridge Loss} = \text{Loss} + \lambda \sum w^2 \quad \text{Lasso Loss} = \text{Loss} + \lambda \sum |w|$$
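
The two penalty terms can be computed side by side. This is a small sketch; `base_loss` stands for whatever unregularized loss the model already has, and the function names are assumptions.

```python
def ridge_loss(base_loss, weights, lam):
    """L2: base loss plus lambda times the sum of squared weights."""
    return base_loss + lam * sum(w ** 2 for w in weights)

def lasso_loss(base_loss, weights, lam):
    """L1: base loss plus lambda times the sum of absolute weights."""
    return base_loss + lam * sum(abs(w) for w in weights)

weights = [3.0, -4.0, 0.0]
print(ridge_loss(1.0, weights, lam=0.1))  # penalty uses 9 + 16 + 0
print(lasso_loss(1.0, weights, lam=0.1))  # penalty uses 3 + 4 + 0
```

Because the L2 penalty squares each weight, large weights are punished much more than small ones, whereas the L1 penalty grows at the same rate everywhere, which is what lets it push small weights all the way to zero.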

Interview Spotlight

Focus: Why does Lasso cause sparsity? Answer: L1 penalty has 'corners' at the axes in the optimization space, forcing weights to zero.

If you have 100 features but only 10 matter, use **L1 (Lasso)** to automatically zero-out the useless 90.

6. Decision Trees & Random Forest

Technical Definition

A **Decision Tree** is a flowchart-like structure that splits data based on feature values. **Random Forest** is an ensemble method that creates multiple trees using bagging and averages their results to reduce variance.

Core Concept

Split Criteria: Entropy (Information Gain) or Gini Impurity.
Pruning: Trimming branches to prevent overfitting in deep trees.
Random Forest: Uses 'Feature Randomness' and 'Bootstrap Aggregation' (Bagging).
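
The split criteria can be computed from class counts. The sketch below assumes labels are given as a plain list and that a split produces a left and a right child; the function names are illustrative.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted
```

A tree-building algorithm would evaluate candidate splits and greedily pick the one with the highest information gain (or the lowest weighted Gini impurity).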

Interview Spotlight

Focus: Why is Random Forest better than a single Decision Tree? Answer: It reduces overfitting and the 'greedy' bias of individual trees.

A Single Tree helps you decide 'Should I play tennis?'; a **Random Forest** asks 100 people and takes the majority vote.

7. Support Vector Machines (SVM) & Kernels

Technical Definition

**SVM** finds the optimal 'Hyperplane' that creates the maximum margin between two classes. The **Kernel Trick** allows SVM to solve non-linear problems by mapping data into a higher-dimensional space where it becomes linearly separable.

Core Concept

Support Vectors: Data points closest to the hyperplane.
Margin: The distance between support vectors of different classes.
Kernels: RBF (Gaussian), Polynomial, Sigmoid.

Interview Spotlight

Focus: Defining 'Hard Margin' vs 'Soft Margin'. Soft margin allows some outliers to gain better generalization.

Image recognition tasks use **SVM** with RBF kernels to distinguish between complex shapes like hand-written digits.

8. K-Nearest Neighbors (KNN)

Technical Definition

A **Lazy Learning** algorithm that classifies a point based on the majority labels of its $K$ closest neighbors. It doesn't 'learn' a model; it simply stores the data and calculates distance at runtime.

Core Concept

Distance Metrics: Euclidean, Manhattan, Minkowski.
Hyperparameter $K$: Small $K$ = High Variance (Sensitive to noise). Large $K$ = High Bias.
Feature Scaling: Mandatory! Since it's distance-based, normalized features are essential.
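
A minimal KNN classifier fits in a few lines. This sketch assumes Euclidean distance, features that are already scaled, and training data given as `(features, label)` pairs; the function name is illustrative.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Return the majority label among the k training points closest to `query`."""
    # No model is fitted: all the work happens at prediction time ("lazy learning").
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"), ((8, 8), "B"), ((9, 8), "B")]
print(knn_classify(train, (1.5, 1.5), k=3))
```

Sorting the whole training set on every query is what makes naive KNN slow on large datasets.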

Interview Spotlight

Focus: Calculating distance and explaining the effect of outliers on small vs large $K$ values.

A real-estate app uses **KNN** to suggest a house price by looking at the prices of the 5 nearest similar houses.

9. K-Means & Hierarchical Clustering

Technical Definition

**K-Means** is a centroid-based clustering algorithm that partitions data into $K$ exclusive clusters. **Hierarchical Clustering** builds a tree-like structure (Dendrogram) to represent nested clusters.

Core Concept

K-Means Steps: (1) Pick centroids, (2) Assign points, (3) Update centroids, (4) Repeat.
Selecting K: Use the **Elbow Method** (plotting Inertia vs K).
Hierarchical: Agglomerative (Bottom-up) vs Divisive (Top-down).
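
The four K-Means steps map directly onto a short loop. In this sketch the initial centroids are supplied by the caller (as a list of tuples) so the result is deterministic; real implementations usually pick them randomly or with a smarter seeding scheme.

```python
import math

def kmeans(points, centroids, max_iters=100):
    """Lloyd's algorithm. `centroids` is the caller-supplied initial guess (step 1)."""
    for _ in range(max_iters):
        # Step 2: assign each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 3: move each centroid to the mean of its cluster (keep it if empty).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
        # Step 4: repeat until the centroids stop moving.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(centroids)
```

Because assignment is based purely on distance to a centroid, the algorithm implicitly assumes roughly round clusters of similar size.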

Clustering	K-Means	Hierarchical
Speed	Fast (roughly O(n) per iteration).	Slow (O(n^2) or worse).
Number of Clusters	Must be predefined.	Not required; use dendrogram.

Interview Spotlight

Focus: The Elbow Method and identifying why K-Means fails on non-spherical clusters.

Netflix uses **Clustering** to group users based on their viewing habits into 'Genre Tribes'.

10. Principal Component Analysis (PCA)

Technical Definition

**PCA** is a dimensionality reduction technique that transforms a large set of variables into a smaller one that still contains most of the original information. It finds new axes called **Principal Components** that maximize variance.

Core Concept

Variance: The goal is to retain components with high eigenvalues.
Steps: Standardize → Covariance Matrix → Eigenvectors/values → Sort & Project.
Orthogonality: All resulting components are mutually perpendicular (orthogonal) and therefore uncorrelated.
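
For two-dimensional data the PCA steps can be carried out by hand, because a 2x2 covariance matrix has closed-form eigenvalues. The sketch below is an illustration under simplifying assumptions: it centers the data but does not rescale it to unit variance, it handles only 2-D input, and it assumes the points are not all identical.

```python
import math

def pca_2d(points):
    """Return (first principal component, 1-D projection, explained variance ratio)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix, largest first.
    mid, spread = (a + c) / 2, math.hypot((a - c) / 2, b)
    eigenvalues = (mid + spread, mid - spread)
    # Eigenvector for the largest eigenvalue, normalized to unit length.
    vx, vy = (b, eigenvalues[0] - a) if b != 0 else ((1.0, 0.0) if a >= c else (0.0, 1.0))
    norm = math.hypot(vx, vy)
    pc1 = (vx / norm, vy / norm)
    # Project the centered data onto the first component.
    projected = [x * pc1[0] + y * pc1[1] for x, y in centered]
    explained = eigenvalues[0] / (eigenvalues[0] + eigenvalues[1])
    return pc1, projected, explained
```

For points lying exactly on the line y = x, the first component points along that line and explains all of the variance, so one dimension can be dropped with no loss.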

Interview Spotlight

Focus: Ensuring people understand that PCA is UNSUPERVISED. It doesn't look at labels, only variance.

Reducing 100 census variables into 5 'Indices' for easier visualization is done via **PCA**.

11. Evaluation Metrics (Precision, Recall, F1, ROC-AUC)

Technical Definition

Quantitative measures to assess model performance. **Precision** measures accuracy of positive predictions; **Recall** measures the ability to find all positive instances. **F1 Score** is their harmonic mean.

Core Concept

Precision: $\frac{TP}{TP + FP}$. High precision means few 'False Positives'.
Recall: $\frac{TP}{TP + FN}$. High recall means few 'False Negatives'.
ROC Curve: Plots TPR vs FPR at different thresholds. **AUC** summarizes the curve area.
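
Precision, Recall, and F1 follow directly from the counts. A small sketch (the function name and the choice to return 0.0 when a denominator is zero are assumptions):

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and their harmonic mean from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# 8 true positives, 2 false alarms, 8 missed positives.
print(precision_recall_f1(tp=8, fp=2, fn=8))
```

Here precision is high (most alarms are real) but recall is only one half (half the positives are missed); the harmonic mean pulls F1 toward the weaker of the two.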

Interview Spotlight

Focus: When to prioritize Recall over Precision? Answer: Medical diagnosis (Cancer detection) or Disaster prediction.

A spam filter needs high **Precision** (don't block important mail); a COVID-19 test needs high **Recall** (find all infected).

12. Confusion Matrix & Type I/II Errors

Technical Definition

A **Confusion Matrix** is a table used to describe the performance of a classification model. **Type I Error** is a 'False Positive' (Rejected a true null), while **Type II Error** is a 'False Negative' (Failed to reject a false null).

Core Concept

Type I (False Positive): Alarm goes off when there's no fire.
Type II (False Negative): No alarm during an actual fire.
Accuracy: $\frac{TP + TN}{\text{Total}}$. Beware: Inaccurate for imbalanced datasets!
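
The warning about imbalanced data is easy to reproduce. The sketch below counts the four confusion-matrix cells for a made-up dataset with 1 positive in 100 samples and a model that always predicts the negative class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)  # Type I errors
    fn = sum(t == positive and p != positive for t, p in pairs)  # Type II errors
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn

y_true = [1] + [0] * 99      # illustrative data: 1% positive class
y_pred = [0] * 100           # a "model" that always predicts negative
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
print(accuracy, tp, fn)
```

Accuracy looks excellent while the single positive case is missed entirely, which is why Recall or F1 is a better headline metric on imbalanced data.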

Interview Spotlight

Focus: Why is 'Accuracy' misleading for fraud detection? Answer: If 99% of transactions are safe, a model that says 'Always Safe' is 99% accurate but useless.

In a court trial, convicting an innocent person is a **Type I Error**; letting a guilty person go is a **Type II Error**.

13. Gradient Descent (Batch, SGD, Mini-batch)

Technical Definition

An optimization algorithm used to minimize the cost function by iteratively moving in the direction of the steepest descent. The **Learning Rate ($\eta$)** determines the step size.

Core Concept

Batch GD: Uses the whole dataset; slow but stable.
SGD (Stochastic): Uses 1 sample per step; fast but noisy (escapes local minima).
Mini-batch GD: Uses small chunks (e.g., 32-512); standard in DL.

$$w = w - \eta \cdot \nabla J(w)$$
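
The update rule can be run on a toy cost function. This sketch minimizes $J(w) = (w - 3)^2$, whose gradient is $2(w - 3)$; the cost function, starting point, and step count are assumptions chosen for illustration.

```python
def gradient_descent(grad, w0, eta=0.1, steps=100):
    """Repeatedly apply w = w - eta * grad(w)."""
    w = w0
    for _ in range(steps):
        w = w - eta * grad(w)
    return w

# J(w) = (w - 3)^2 has its minimum at w = 3.
w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(w_star)
```

With this cost function each step multiplies the distance to the minimum by `1 - 2 * eta`, so `eta = 0.1` converges steadily while any `eta` above 1.0 would make the iterates diverge.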

Interview Spotlight

Focus: Explaining 'Vanishing Gradients' or finding the optimal learning rate. Too high = overshoot; too low = slow.

Training a massive model like GPT requires **Mini-batch GD** to fit processing requirements and maintain speed.

14. Ensemble Learning (Bagging vs. Boosting)

Technical Definition

Combining multiple weak learners to create a strong predictive model. **Bagging** reduces variance by parallel training (Random Forest); **Boosting** reduces bias by sequential training (XGBoost, AdaBoost).

Core Concept

Bagging: Bootstrapped Aggregation. Independence between models.
Boosting: Each new model focuses on correcting the errors of the previous ones.
Stacking: Using a 'Meta-model' to combine predictions of different algorithms.

Aspect	Bagging	Boosting
Training	Parallel.	Sequential.
Focus	Reduces Variance.	Reduces Bias.

Interview Spotlight

Focus: Identifying when to use Gradient Boosting over Random Forest (Boosting is usually better for tabular competition winning).

Predicting house market trends often uses **XGBoost (Boosting)** for the highest possible precision in time-series data.

15. Naive Bayes (Bayes Theorem)

Technical Definition

A probabilistic classifier based on **Bayes Theorem** with a 'Naive' assumption of total feature independence. Despite the simplistic assumption, it is highly effective for text data.

Core Concept

Bayes Theorem: $P(A|B) = \frac{P(B|A)P(A)}{P(B)}$.
Naive Assumption: Feature A has no effect on Feature B given the class.
Types: Multinomial (Text), Gaussian (Continuous), Bernoulli (Binary).
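
Bayes Theorem can be applied to a one-word spam check. The probabilities below are made-up illustrative numbers, not measured values, and the function name is an assumption.

```python
def posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """P(A|B) = P(B|A) P(A) / P(B), with P(B) expanded by total probability."""
    evidence = likelihood_b_given_a * prior_a + likelihood_b_given_not_a * (1 - prior_a)
    return likelihood_b_given_a * prior_a / evidence

# Assumed numbers: 20% of mail is spam; 'prize' appears in 50% of spam and 5% of non-spam.
p_spam_given_prize = posterior(prior_a=0.2,
                               likelihood_b_given_a=0.5,
                               likelihood_b_given_not_a=0.05)
print(p_spam_given_prize)
```

A Naive Bayes classifier repeats this for every word and, thanks to the independence assumption, simply multiplies the per-word likelihoods together.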

Interview Spotlight

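Focus: Explaining why Naive Bayes still works well for text classification even though the 'Naive' independence assumption rarely holds (the classifier only needs the correct class to get the highest score, not a well-calibrated probability).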

Your first spam filter in the early 2000s likely used **Naive Bayes** to check for words like 'Prize' or 'Urgent'.

16. Feature Engineering & Selection

Technical Definition

**Feature Engineering** is the art of creating new features to help the model learn (e.g., extracting 'Day of Week' from a timestamp). **Feature Selection** is pruning existing features to simplify the model and reduce noise.

Core Concept

Engineering: One-hot encoding, Scaling, Binning, Interaction features.
Selection: Filter methods (Correlation), Wrapper methods (RFE), Embedded (Lasso).

Interview Spotlight

Focus: Explaining the 'Curse of Dimensionality' and why more features aren't always better.

In a loan model, adding a feature 'Income-to-Debt Ratio' created from raw columns is **Feature Engineering**.

17. Cross-Validation (K-Fold)

Technical Definition

A resampling technique used to evaluate a model's performance on multiple subsets of data. **K-Fold CV** splits data into $K$ parts, training on $K-1$ and testing on 1, repeating this $K$ times.

Core Concept

Benefit: Reduces the dependence of the performance estimate on a single fixed Train/Test split.
Standard K: Usually 5 or 10.
Stratified K-Fold: Keeps class proportions identical in each fold (Used for imbalanced data).
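
The K-Fold splitting logic can be sketched as an index generator. This version is a simplification: it does not shuffle or stratify, and the function name is an assumption.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

for train_idx, test_idx in k_fold_indices(n_samples=10, k=5):
    print(test_idx)
```

Every sample appears in exactly one test fold, so the averaged score uses all of the data for validation without ever testing on a sample the model was trained on in that round.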

Interview Spotlight

Focus: When is K-Fold better than Hold-out? Answer: When the dataset is small and every scrap of data is needed for both training and validation.

Academic researchers use **K-Fold** to prove their model is truly robust across different samplings of a medical trial.

18. Curse of Dimensionality

Technical Definition

High-dimensional data (too many features) causes data points to become increasingly sparse, making distance-based algorithms like KNN or K-Means highly inaccurate and demanding massive amounts of training data.

Core Concept

Distance Problem: In high dimensions, the distance between any two points becomes nearly identical.
Overfitting: Models find noise patterns that don't generalize.
Solution: Dimensionality Reduction (PCA, Feature Selection).
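
The distance problem can be observed with random points. The sketch below measures how much farther the farthest point is than the nearest one, relative to the nearest distance; uniform random data, 200 points, and the fixed seed are assumptions.

```python
import math
import random

def distance_contrast(dim, n_points=200, seed=0):
    """(max distance - min distance) / min distance from one random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)])
             for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)

for dim in (2, 10, 100, 1000):
    print(dim, round(distance_contrast(dim), 3))
```

As the dimension grows the contrast shrinks toward zero, meaning the "nearest" neighbor is barely nearer than any other point, which undermines KNN and K-Means.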

Interview Spotlight

Focus: Explaining why Euclidean distance breaks in 10,000-dimensional space.

Searching for a specific book in a 1D shelf is easy; searching in a 1000-dimensional library where every book is equidistant is the **Curse**.

19. Hyperparameter Tuning (Grid vs. Random Search)

Technical Definition

Hyperparameters are settings defined before training (e.g., Learning Rate, K in KNN). **Grid Search** tries every combination in a predefined set, while **Random Search** samples combinations randomly for faster results.

Core Concept

Grid Search: Exhaustive and slow; guaranteed to find the 'best' in your provided grid.
Random Search: Statistically faster; more efficient for large search spaces.
Bayesian Optimization: Smarter tuning that learns from previous results to pick better parameters.
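
The difference between the two search strategies shows up in how candidates are generated. A small sketch follows; the search space is a made-up example, and in practice random search often samples from continuous distributions rather than from fixed lists.

```python
import itertools
import random

def grid_candidates(space):
    """Every combination of the listed values (exhaustive)."""
    keys = list(space)
    return [dict(zip(keys, combo))
            for combo in itertools.product(*(space[k] for k in keys))]

def random_candidates(space, n_trials, seed=0):
    """A fixed budget of independently sampled combinations."""
    rng = random.Random(seed)
    return [{k: rng.choice(values) for k, values in space.items()}
            for _ in range(n_trials)]

space = {"n_trees": [50, 100, 200], "max_depth": [3, 5, 7, 9]}  # illustrative values
print(len(grid_candidates(space)))       # grows multiplicatively with each parameter
print(random_candidates(space, n_trials=5))
```

Grid search cost is the product of the list lengths, while random search cost is whatever budget you choose, which is why it scales better to many hyperparameters.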

Interview Spotlight

Focus: Why is Random Search often preferred? Answer: Because many hyperparameters don't impact the result, and Random Search explores the 'important' ones better.

Before deploying an XGBoost model, engineers use **Random Search** to find the perfect 'Number of Trees' and 'Depth'.

20. Model Drift & Deployment (MLOps basics)

Technical Definition

**Model Drift** is the degradation of model performance over time because real-world data patterns change. **MLOps** comprises the practices of automating the deployment, monitoring, and maintenance of ML models.

Core Concept

Concept Drift: The statistical properties of the target, or its relationship to the inputs, change (e.g., 'Spam' content evolves).
Data Drift: The input data distribution changes.
Monitoring: Tracking drift using metrics like Population Stability Index (PSI).
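
The Population Stability Index compares two binned distributions using $\sum (a_i - e_i)\ln(a_i / e_i)$, where $e_i$ and $a_i$ are the expected (training) and actual (production) proportions in bin $i$. A minimal sketch, assuming the proportions are already computed and using a small `eps` floor (an assumption) to avoid taking the log of zero:

```python
import math

def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two lists of per-bin proportions."""
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)   # guard against empty bins
        total += (a - e) * math.log(a / e)
    return total

print(psi([0.5, 0.5], [0.5, 0.5]))   # identical distributions
print(psi([0.5, 0.5], [0.6, 0.4]))   # mild shift between bins
```

Every term is non-negative, so the index is zero only when the two distributions match and grows as they move apart.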

Interview Spotlight

Focus: How to fix Concept Drift? Answer: Periodic retraining with the most recent labeled data.

A shopping model trained before COVID-19 suffered from massive **Model Drift** when buying habits shifted suddenly in 2020.

Deep Learning (DL)

1. Perceptron & Multi-Layer Perceptron (MLP)

Technical Definition

A Perceptron is the simplest form of a neural network, consisting of a single layer of weights and a threshold activation function. A Multi-Layer Perceptron (MLP) is a feedforward artificial neural network consisting of at least three layers (input, hidden, and output) with non-linear activation functions.

Core Concept

Simplified Model: Computes a weighted sum of inputs and applies a step function.
MLP: Uses Backpropagation and Gradient Descent to update weights across multiple hidden layers.
Linear Separability: A single perceptron can only solve linearly separable problems (like AND/OR gates, not XOR).

Interview Spotlight

Focus: Explaining why a single perceptron cannot solve the XOR problem (Minsky and Papert, 1969).

Q: What is the purpose of a hidden layer in an MLP?
A: To allow the network to learn complex, non-linear mappings by projecting data into higher-dimensional feature spaces.

A simple **Perceptron** can decide if you should wear a coat based on temp > 10°C; an **MLP** can predict the exact temperature based on complex climate variables.

2. Activation Functions (ReLU, Sigmoid, Tanh, Softmax)

Technical Definition

**Activation Functions** are mathematical equations that determine the output of a neural network node. They introduce Non-linearity into the network, allowing it to learn complex patterns instead of acting as a simple linear transformation.

Core Concept

Sigmoid: Maps input to $(0, 1)$; prone to 'vanishing gradients'.
Tanh: Maps input to $(-1, 1)$; zero-centered, often better than Sigmoid.
ReLU (Rectified Linear Unit): Output is $max(0, x)$; mitigates vanishing gradients for positive inputs; standard for hidden layers.
Softmax: Maps output to probabilities summing to 1; used in the final layer for multi-class classification.

Function	Range	Best Use Case
Sigmoid	(0, 1)	Binary Classification output.
ReLU	[0, Inf)	Hidden Layers in deep networks.
Softmax	(0, 1)	Multi-class classification output.
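
The activation functions above are short enough to write out in plain Python. This is a sketch for scalar inputs (and a list for Softmax); the Leaky ReLU slope of 0.01 is a commonly used value but an assumption here.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

tanh = math.tanh

def relu(x):
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Like ReLU, but keeps a small slope for negative inputs."""
    return x if x > 0 else alpha * x

def softmax(logits):
    """Turn a list of scores into probabilities that sum to 1."""
    shift = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([1.0, 2.0, 3.0]))
```

Subtracting the maximum before exponentiating does not change the Softmax result but prevents overflow for large scores.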

Interview Spotlight

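Focus: The 'Dead ReLU' problem, where a neuron outputs zero for every input because its weights and bias have pushed the pre-activation permanently negative, so its gradient is also zero and it stops learning. Solution: Leaky ReLU.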

**ReLU** is used in almost all deep image recognition models to keep the training speed high and gradients healthy.

3. Forward & Backpropagation (Mathematical intuition)

Technical Definition

**Forward Propagation** is the process of passing input through the network to generate a prediction. **Backpropagation** is the method used to calculate the gradient of the loss function with respect to each weight by applying the Chain Rule, essentially moving 'errors' backward to update the model.

Core Concept

Forward Pass: Calculation of outputs layer by layer.
Error Calculation: Comparison of predicted output with ground truth using a Loss function.
Backward Pass: Calculation of partial derivatives (gradients) using the Chain Rule: $\frac{\partial L}{\partial w} = \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial z} \cdot \frac{\partial z}{\partial w}$.
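
The Chain Rule product can be checked on the smallest possible network: one weight, a sigmoid activation, and a squared-error loss. The setup below ($z = wx$, $a = \sigma(z)$, $L = \frac{1}{2}(a - y)^2$) is an assumption chosen so each factor is easy to write down; the numerical gradient is there only as a sanity check.

```python
import math

def forward(w, x, y):
    z = w * x                          # pre-activation
    a = 1.0 / (1.0 + math.exp(-z))     # sigmoid activation
    loss = 0.5 * (a - y) ** 2
    return z, a, loss

def backward(w, x, y):
    """dL/dw = dL/da * da/dz * dz/dw."""
    _, a, _ = forward(w, x, y)
    dL_da = a - y
    da_dz = a * (1.0 - a)              # derivative of the sigmoid
    dz_dw = x
    return dL_da * da_dz * dz_dw

def numerical_grad(w, x, y, h=1e-6):
    """Central finite difference, used to verify the analytic gradient."""
    return (forward(w + h, x, y)[2] - forward(w - h, x, y)[2]) / (2 * h)

print(backward(0.5, 2.0, 1.0), numerical_grad(0.5, 2.0, 1.0))
```

The two printed values should agree to several decimal places; in a deep network the same product simply gains more factors, one pair per layer.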

Interview Spotlight

Focus: Explaining how the Chain Rule is the engine of Deep Learning.

Q: Does Backpropagation update weights?
A: No. Backprop only calculates gradients. Optimization algorithms like SGD use those gradients to update weights.

**Backpropagation** is like a student checking the answer key and tracing back their mental mistakes to prepare for the next practice test.

4. Loss Functions (MSE, Cross-Entropy)

Technical Definition

**Loss Functions** quantify how 'wrong' the model's predictions are compared to the actual targets. **MSE (Mean Squared Error)** is used for regression, while **Cross-Entropy** is the standard for classification, measuring the difference between two probability distributions.

Core Concept

MSE: Penalizes large errors heavily; $\frac{1}{n}\sum(y_{true} - y_{pred})^2$.
Cross-Entropy: Highly sensitive to small differences in probabilities near 0 or 1; $- \sum y_{true} \log(y_{pred})$.
Binary Cross-Entropy (Log Loss): Used for 0/1 classification tasks.
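
The loss formulas above translate directly into code. A small sketch follows; the `eps` clipping that keeps `log` away from zero is an assumption added for numerical safety.

```python
import math

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """-sum(y_true * log(y_pred)) for one sample with a one-hot target."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Log loss averaged over samples with 0/1 targets."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

print(binary_cross_entropy([1], [0.9]), binary_cross_entropy([1], [0.1]))
```

A confident wrong prediction (0.1 for a true 1) is penalized far more than a confident right one, which is the sensitivity near 0 and 1 described above.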

Interview Spotlight

Focus: Why use Cross-Entropy instead of Accuracy for training? Answer: Accuracy is not differentiable; you need a smooth gradient to optimize.

Predicting stock prices uses **MSE**; identifying if an image is a dog or cat uses **Cross-Entropy**.

5. Vanishing & Exploding Gradient Problems

Technical Definition

Instabilities in deep neural networks during training. **Vanishing Gradients** occur when gradients become extremely small, causing early layers to stop learning. **Exploding Gradients** occur when gradients grow uncontrollably, leading to unstable weight updates and model 'divergence'.

Core Concept

Vanishing Cause: Repeated multiplication of small values (e.g., Sigmoid derivatives).
Exploding Cause: Multiplication of large weight values in deep layers.
Solutions: ReLU activation, Batch Normalization, Gradient Clipping, and Residual (Skip) Connections.

Interview Spotlight

Focus: Identifying how LSTM 'gates' specifically solve the vanishing gradient problem in sequence data.

Training a 100-layer network would be impossible without **Residual Connections (ResNet)** to bypass the vanishing gradient bottleneck.

6. CNN: Architecture (Convolution, Pooling, Padding)

Technical Definition

**Convolutional Neural Networks (CNNs)** are specialized for spatial data like images. They use **Filters (Kernels)** to detect features like edges and textures via mathematical convolution, followed by **Pooling** to reduce dimensionality and ensure translation invariance.

Core Concept

Convolution: Sliding a filter over the input; at each position the filter and the underlying patch are multiplied element-wise and summed into a single output value.
Padding: Adding zero-pixels to the border to maintain image size after convolution.
Pooling (Max/Average): Down-sampling to retain only the most prominent features (e.g., 2x2 Max Pool).
Flattening: Converting 2D feature maps into a 1D vector for final classification.
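
The convolution, pooling, and flattening steps can be sketched on nested lists. This is a simplified single-channel version with stride 1 and no padding; like most deep learning frameworks it does not flip the kernel (strictly speaking, cross-correlation). The function names are assumptions.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image; multiply element-wise and sum at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [sum(image[i + di][j + dj] * kernel[di][dj]
             for di in range(kh) for dj in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]

def max_pool_2x2(fmap):
    """Keep the largest value in each non-overlapping 2x2 window."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]

def flatten(fmap):
    return [value for row in fmap for value in row]

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(conv2d_valid(image, [[1, 0], [0, -1]]))
```

Without padding, a 3x3 input and a 2x2 filter give a 2x2 output, which is why padding is needed to preserve the image size.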

Interview Spotlight

Focus: Why use CNNs instead of MLPs for images? Answer: Parameter efficiency. CNNs share weights across spatial regions (local receptive fields).

**CNNs** are the core technology behind Face ID on your iPhone and self-driving car vision systems.

7. RNN: Sequential data & Vanishing Gradients

Technical Definition

**Recurrent Neural Networks (RNNs)** are designed for sequential data (time-series, text) where inputs are dependent on previous states. They maintain a 'hidden state' or memory that updates at each time step, allowing them to process sequences of variable length.

Core Concept

Hidden State: $h_t = f(W \cdot h_{t-1} + U \cdot x_t)$.
BPTT (Backpropagation Through Time): The training method for RNNs.
Failure: Standard RNNs cannot remember long-range dependencies due to vanishing gradients across steps.
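
The hidden-state recurrence and the vanishing-gradient effect can both be seen in a scalar RNN. This sketch assumes $f = \tanh$ and scalar weights `W` and `U`; real RNNs use matrices, but the repeated-multiplication argument is the same.

```python
import math

def rnn_forward(xs, W, U, h0=0.0):
    """Scalar RNN: h_t = tanh(W * h_{t-1} + U * x_t). Returns all hidden states."""
    h, states = h0, []
    for x in xs:
        h = math.tanh(W * h + U * x)
        states.append(h)
    return states

def grad_last_wrt_h0(states, W):
    """d h_T / d h_0: one factor W * (1 - h_t^2) per time step (tanh' = 1 - tanh^2)."""
    g = 1.0
    for h in states:
        g *= W * (1.0 - h * h)
    return g

states = rnn_forward([0.0] * 50, W=0.5, U=1.0, h0=0.5)
print(grad_last_wrt_h0(states, W=0.5))
```

With `|W| < 1` each step shrinks the gradient, so after 50 steps almost nothing of the first input's influence survives; this is the long-range dependency failure described above.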

Interview Spotlight

Focus: Explaining why RNNs are 'unrolled' in time and how this leads to gradient instability.

Predicting the next word in a sentence (Auto-complete) used to be the primary job of **RNNs** before Transformers.

8. LSTM & GRU (Gating mechanisms)

Technical Definition

**Long Short-Term Memory (LSTM)** and **Gated Recurrent Units (GRU)** are advanced RNN architectures designed to solve the long-term dependency problem. They use **Gating Mechanisms** to selectively forget or remember information over long sequences.

Core Concept

LSTM Gates: (1) Forget Gate, (2) Input Gate, (3) Output Gate. Maintains a separate 'Cell State'.
GRU: Simplified version; combines forget and input gates into an 'Update Gate'.
Benefit: Gradient flow is preserved through the cell state, allowing information to persist over far longer sequences than a standard RNN can handle.

Interview Spotlight

Focus: Comparing LSTM and GRU. Answer: GRU is faster and slightly simpler; LSTM is more flexible for very complex sequences.

Google Translate originally migrated to **LSTMs** to handle long-distance grammatical agreements in translation.

9. Weight Initialization (Xavier, He)

Technical Definition

**Weight Initialization** is the strategy for choosing starting values for neural network weights. Proper initialization ensures that the variance of activations and gradients stays stable across layers, preventing signals from disappearing or exploding instantly.

Core Concept

Xavier (Glorot) Init: Best for Sigmoid/Tanh. Balances variance based on fan_in and fan_out.
He Initialization: Best for **ReLU**. Uses a factor of $\sqrt{2/\text{fan\_in}}$ to account for ReLU's half-zero output.
Zero Init: A fatal mistake! Causes all neurons in a layer to perform the same calculation (Symmetry).
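
The two initialization schemes differ only in the standard deviation used to draw the weights. The sketch below uses the normal-distribution variants ($\sqrt{2/(\text{fan\_in} + \text{fan\_out})}$ for Xavier, $\sqrt{2/\text{fan\_in}}$ for He); the helper names and the fixed seed are assumptions.

```python
import math
import random

def xavier_std(fan_in, fan_out):
    """Xavier/Glorot (normal variant): balances fan_in and fan_out."""
    return math.sqrt(2.0 / (fan_in + fan_out))

def he_std(fan_in):
    """He initialization: the factor 2 compensates for ReLU zeroing half its inputs."""
    return math.sqrt(2.0 / fan_in)

def init_weights(fan_in, fan_out, std, seed=0):
    """Draw a fan_out x fan_in weight matrix from N(0, std^2)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]

layer = init_weights(fan_in=8, fan_out=4, std=he_std(8))
print(he_std(8), xavier_std(4, 4))
```

Because every weight is drawn independently, no two neurons start identical, which is exactly the symmetry breaking that zero initialization lacks.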

Interview Spotlight

Focus: Why not initialize all weights to zero? Answer: To maintain **Symmetry Breaking** so each neuron learns a unique feature.

Using **He Initialization** is a day-one best practice for any modern Deep Learning engineer building multi-layer CNNs.

10. Batch Normalization & Dropout

Technical Definition

Techniques to improve training stability and prevent overfitting. **Batch Normalization** re-scales the outputs of a layer to have zero mean and unit variance. **Dropout** randomly 'turns off' neurons during training, forcing the network to learn redundant representations.

Core Concept

Batch Norm: Reduces Internal Covariate Shift; allows higher learning rates.
Dropout: Prevents 'Co-adaptation' between neurons; only used during Training, not Inference.
Effect: Batch Norm accelerates training; Dropout generalizes the model better.
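
The train/inference difference in Dropout can be sketched as follows. This uses the "inverted" variant, where surviving activations are scaled up during training so that nothing needs to change at inference; the function signature is an assumption.

```python
import random

def dropout(activations, p_drop, training, rng=random):
    """Inverted dropout: zero each activation with probability p_drop while training."""
    if not training or p_drop == 0.0:
        return list(activations)          # inference: every neuron stays active
    keep = 1.0 - p_drop
    # Survivors are divided by `keep` so the expected activation is unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
print(dropout([1.0] * 8, p_drop=0.5, training=True, rng=rng))
print(dropout([1.0] * 8, p_drop=0.5, training=False))
```

During training each output is either zero or scaled up; at inference the input passes through untouched.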

Interview Spotlight

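Focus: Difference in Dropout behavior between Training and Testing. Answer: In training, random neurons are dropped; in testing, all neurons are active, and activations are scaled (at test time in the original formulation, or during training with 'inverted' dropout) so that expected outputs match.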

**Dropout** is like training a basketball team where someone is randomly pulled off the court—the remaining players must learn to play all positions to win.

11. Optimizers (Adam, RMSProp, Adagrad)

Technical Definition

**Optimizers** are algorithms that update weights to minimize loss. **Adam (Adaptive Moment Estimation)** is the current gold standard, combining the benefits of Momentum (speed through flat areas) and RMSProp (adapting learning rates for each parameter).

Core Concept

Adagrad: Lowers learning rate for frequent parameters; good for sparse data but decays too fast.
RMSProp: Fixes Adagrad's decay by using a moving average of squared gradients.
Adam: Most robust; uses both first and second moments of gradients to adjust steps.
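
The Adam update for a single scalar parameter looks like this. It is a sketch using the commonly cited default values for `beta1`, `beta2`, and `eps`; the toy cost function $J(w) = (w - 3)^2$ and the function name are assumptions.

```python
import math

def adam_minimize(grad, w0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=200):
    """Minimize a scalar function given its gradient, using Adam."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g        # first moment (momentum-like average)
        v = beta2 * v + (1 - beta2) * g * g    # second moment (RMSProp-like average)
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

print(adam_minimize(lambda w: 2 * (w - 3), w0=0.0))
```

On the very first step the bias-corrected ratio is roughly the sign of the gradient, so the parameter moves by about `lr` regardless of how large the gradient is; that scale invariance is a large part of why Adam needs little tuning.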

Interview Spotlight

Focus: Why is Adam favored? Answer: It requires little hyperparameter tuning and handles noise/sparsity exceptionally well.

If you're unsure which optimizer to use, **Adam** is a sensible first choice for most deep learning projects.

12. Transfer Learning

Technical Definition

**Transfer Learning** is a technique where a model developed for one task is reused as the starting point for a model on a second related task. It is highly effective when training data for the second task is limited.

Core Concept

Logic: Early layers of a CNN learn general features (edges, shapes) which are useful for almost all vision tasks.
Fine-tuning: Keeping early weights 'frozen' and training only the final classification layers.
Data Efficiency: Allows training high-quality models with as few as 100 images.

Interview Spotlight

Focus: Identifying when to 'unfreeze' more layers. Answer: If the new dataset is huge and very different from the original source.

Taking a model pre-trained on ImageNet (natural images) and using it to identify X-ray fractures is a classic **Transfer Learning** use case.

13. Autoencoders & GANs

Technical Definition

Unsupervised neural architectures. **Autoencoders** learn to compress input into a lower-dimensional latent space and then reconstruct it. **GANs (Generative Adversarial Networks)** use a 'Generator' and a 'Discriminator' in a zero-sum game to create hyper-realistic synthetic data.

Core Concept

Autoencoder: Encoder (Compression) + Decoder (Reconstruction). Used for Denoising/Anomaly Detection.
GAN Generator: Tries to create 'fake' data that fools the Discriminator.
GAN Discriminator: Tries to distinguish between real data and data from the Generator.

Interview Spotlight

Focus: Explaining 'Mode Collapse' in GANs—where the generator only learns to produce a single type of result instead of diversity.

**GANs** are responsible for creating Deepfakes, while **Autoencoders** are used by astronomers to find glitches in telescope data.

14. Attention Mechanism & Transformers

Technical Definition

**Attention** is a mechanism that allows a model to weigh the importance of different parts of input data relative to a specific context. **Transformers** use **Self-Attention** to process entire sequences in parallel, replacing sequential RNNs entirely.

Core Concept

Self-Attention: Calculates Query (What I want), Key (What I have), and Value (What I give).
Multi-Head Attention: Allows the model to focus on various aspects of context simultaneously.
Positional Encoding: Adds info about word order since Transformers process all words at once.

$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$
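
The attention formula can be evaluated on small nested lists. This is a single-head sketch without masking or learned projections; `Q`, `K`, and `V` are assumed to be lists of row vectors, and the helper names are illustrative.

```python
import math

def softmax(row):
    shift = max(row)
    exps = [math.exp(v - shift) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, returning (output, attention weights)."""
    d_k = len(K[0])
    # One row of weights per query: scaled dot product with every key, then softmax.
    weights = [
        softmax([sum(q_i * k_i for q_i, k_i in zip(q, k)) / math.sqrt(d_k) for k in K])
        for q in Q
    ]
    # Each output row is the weighted average of the value vectors.
    output = [
        [sum(w * v[j] for w, v in zip(row, V)) for j in range(len(V[0]))]
        for row in weights
    ]
    return output, weights

Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [1.0, 0.0]]
V = [[0.0, 2.0], [4.0, 6.0]]
print(attention(Q, K, V))
```

When both keys match the query equally, the weights are equal and the output is the plain average of the two value vectors; a better-matching key would pull the output toward its value.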

Interview Spotlight

Focus: Why are Transformers faster to train than RNNs? Answer: Parallelization. RNNs require step-by-step processing; Transformers do not.

The 'Attention' mechanism is the breakthrough that allowed AI like **ChatGPT** to understand complex, long-distance relationships in text.

15. BERT & GPT (High-level architecture)

Technical Definition

Modern Large Language Models (LLMs) based on Transformers. **BERT** is Encoder-only and 'bidirectional' (looks at both sides of a word). **GPT** is Decoder-only and 'unidirectional' (predicts the next word from left-to-right).

Core Concept

BERT: Pre-trained using Masked Language Modeling (MLM). Excellent for Understanding tasks (Search, Q&A).
GPT: Pre-trained using Causal Language Modeling. Excellent for Generation tasks.
Paradigm: Both rely on 'Pre-train → Fine-tune' workflows.

Model	Direction	Primary Strength
BERT	Bi-directional.	Classification & Information Retrieval.
GPT	Uni-directional.	Content Generation & Creative Writing.

Interview Spotlight

Focus: Explaining why BERT is better for sentiment analysis than GPT. Answer: It has full context of the words surrounding the adjectives.

Google Search uses **BERT** to understand your query intent, while **GPT** powers assistants that write code or essays for you.

16. Tensors & Frameworks (PyTorch vs. TensorFlow)

Technical Definition

**Tensors** are n-dimensional arrays (multi-dimensional matrices) which are the fundamental data structures in deep learning. **PyTorch** and **TensorFlow** are the primary frameworks used to build and train models on these tensors.

Core Concept

Tensor: Scalar (0D), Vector (1D), Matrix (2D), Tensor (3D+).
PyTorch: Dynamic computational graphs; very popular in Research (Pythonic).
TensorFlow: High scalability; robust for production deployment (Keras integration).

Feature	PyTorch	TensorFlow
Graph Type	Dynamic (imperative).	Static (traditionally).
Debugging	Easy/Pythonic.	Historically difficult (improving).

Interview Spotlight

Focus: Defining what a Tensor is in the context of DL. Answer: A container for data that carries a gradient history.

Most AI papers today use **PyTorch** for its flexibility, while large tech companies often use **TensorFlow** for massive data pipelines.

17. Object Detection (YOLO, R-CNN)
Technical Definition
**Object Detection** involves identifying 'What' is in an image and 'Where' it is located (Bounding Boxes). **R-CNN** uses region proposals (slow but accurate); **YOLO (You Only Look Once)** treats detection as a single regression problem (fastest).
Core Concept

Bounding Box: Represented as $(x, y, w, h)$.

IoU (Intersection over Union): Metric to evaluate box overlap accuracy.

Non-Max Suppression (NMS): Removes redundant duplicate boxes for the same object.
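
IoU and NMS can be sketched in a few lines. This is a minimal illustration that assumes $(x, y)$ is the top-left corner of the box (some datasets use the box centre instead); `iou` and `nms` are hypothetical helper names, and the boxes and scores are made up.

```python
def iou(a, b):
    """Intersection over Union of two (x, y, w, h) boxes, (x, y) = top-left corner."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

boxes = [(10, 10, 100, 100), (20, 20, 100, 100), (300, 300, 50, 50)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # → [0, 2]
```

The second box overlaps the first with IoU of about 0.68, so it is suppressed; the third box does not overlap anything and is kept.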

Interview Spotlight

Focus: Trade-off between inference speed and accuracy. Answer: YOLO is for real-time; Faster R-CNN is for high-precision medical/scientific images.

Real-time perception systems, such as those in driver-assistance research, can use YOLO-family models like **YOLOv8** to detect pedestrians and street signs within milliseconds per frame.

18. NLP: Tokenization, Word Embeddings (Word2Vec)
Technical Definition
**Tokenization** is splitting text into units (words/sub-words). **Word Embeddings** map these tokens to continuous vectors where similar-meaning words are spatially close in high-dimensional space.
Core Concept

Word2Vec: Uses Skip-gram or CBOW to learn context.

Embedding Properties: Captures semantics (e.g., King - Man + Woman ≈ Queen; the result is the nearest vector, not an exact equality).

Sub-word Tokenization: Used by models like BERT (WordPiece) to handle 'Out-of-vocabulary' words.
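
The analogy arithmetic can be illustrated with hand-picked toy vectors. Everything here is an assumption for illustration: the three dimensions (royalty, maleness, femaleness) and their values are invented, whereas a trained Word2Vec model learns hundreds of unlabeled dimensions from data.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Toy embeddings: (royalty, maleness, femaleness). Values are made up.
emb = {
    "king":  [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}

def analogy(a, b, c):
    """Return the word closest to emb[a] - emb[b] + emb[c], excluding the inputs."""
    target = [x - y + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, emb[w]))

print(analogy("king", "man", "woman"))  # → queen
```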

Interview Spotlight

Focus: Explaining why we use vectors instead of simple integer IDs for words. Answer: Integers imply a false numerical relationship (e.g., Cat=1, Dog=2); vectors define semantic similarity.

**Word Embeddings** help your phone's keyboard recognize that 'Happy' is often followed by 'Birthday' when it suggests the next word.

19. Fine-tuning vs. RAG (Retrieval Augmented Generation)
Technical Definition
Techniques to adapt LLMs to specific knowledge. **Fine-tuning** updates the actual model weights on new data. **RAG** retrieves relevant documents at query time and adds them to the prompt, allowing the model to 'read' the correct info before answering, without any weight updates.
Core Concept

Fine-tuning: For style and domain adaptation. Expensive and static.

RAG: For factual accuracy and proprietary data. Dynamic and verifiable (provides citations).

Vector Database: Stores the external knowledge used by RAG.

Method	Knowledge Type	Cost
Fine-tuning	Internalized skills/style.	High (GPU intensive).
RAG	External facts/documents.	Low (Search intensive).
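
The retrieve-then-prompt flow of RAG can be sketched without any external services. This is a toy illustration under stated assumptions: `embed` is a bag-of-words stand-in for a learned embedding model, the in-memory `index` list stands in for a vector database, and the documents are invented.

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words count vector (stand-in for a learned encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

documents = [
    "the warehouse has 40 units of item A in stock",
    "the return policy allows refunds within 30 days",
    "the office is closed on public holidays",
]
index = [(doc, embed(doc)) for doc in documents]  # stand-in for a vector database

def retrieve(query, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(query):
    """Prepend retrieved context to the question; no model weights are touched."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("how many units of item A are in stock"))
```

Updating the knowledge only requires changing `documents` and rebuilding `index`, which is why RAG suits frequently changing data.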

Interview Spotlight

Focus: When is RAG better than fine-tuning? Answer: When the data changes frequently (e.g., Daily news or real-time inventory).

A legal AI uses **RAG** to look up specific case law in a 5000-page PDF before summarizing it for a lawyer.

20. Hardware Acceleration (GPU/TPU for DL)
Technical Definition
**Hardware Accelerators** are specialized chips designed for the high-volume matrix multiplication involved in deep learning. **GPUs** are versatile parallel processors; **TPUs (Tensor Processing Units)** are ASICs custom-built by Google for tensor math.
Core Concept

Parallelism: A CPU has few powerful cores; a GPU has thousands of simple cores for simultaneous math.

VRAM: Memory on the GPU where tensors are stored during training.

CUDA (NVIDIA): The primary software layer that connects code to GPU hardware.

Interview Spotlight

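Focus: Why are GPUs faster for Deep Learning? Answer: Matrix operations consist of many independent multiply-adds that a GPU runs in parallel across thousands of cores; a CPU has far fewer cores, each optimized for fast sequential work.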

Training a modern LLM without accelerators such as **H100 GPUs** or **TPUs** would be impractically slow: runs that take weeks on an accelerator cluster would take far longer on CPUs alone.

Artificial Intelligence (AI)

1. Weak AI vs. Strong AI vs. GenAI
Technical Definition
**Weak AI (Narrow AI)** is designed to perform a single specific task (e.g., Siri). **Strong AI (AGI)** is a hypothetical machine with consciousness and human-level intelligence across all domains. **GenAI** is AI capable of creating original content (text, images, audio).
Core Concept

Narrow AI: Rule-based or pattern-matching; no general understanding.

AGI: Self-aware; can learn any mental task a human can.

Generative: Based on Probabilistic models (LLMs, Diffusion) to synthesize new data.

Interview Spotlight

Focus: Defining where current models like GPT-4 sit. Answer: They are advanced Narrow AI/Generative AI, not yet fully AGI.

**Weak AI** plays chess; **GenAI** writes a poem about why it lost at chess.

2. AI Agents & Environment Types
Technical Definition
An **AI Agent** is an autonomous entity that perceives its environment through sensors and acts upon it through actuators to achieve goals. Environments are classified as **Deterministic vs. Stochastic**, **Static vs. Dynamic**, and **Fully Observable vs. Partially Observable**.
Core Concept

Simple Reflex: Acts based on current percept (If-Then rules).

Goal-Based: Considers future consequences of actions.

Utility-Based: Acts to maximize a performance measure (Happiness/Profit).
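
A simple reflex agent can be sketched with the classic two-square vacuum world. This is an illustrative toy; the location names "A"/"B" and the helper `run` are assumptions, not part of the notes above.

```python
def reflex_vacuum_agent(percept):
    """Simple reflex agent: decides from the current percept only (no memory)."""
    location, status = percept
    if status == "Dirty":
        return "Suck"
    return "Right" if location == "A" else "Left"

def run(world, location, steps):
    """Simulate the agent in a two-square world and return the actions it took."""
    actions = []
    for _ in range(steps):
        action = reflex_vacuum_agent((location, world[location]))
        actions.append(action)
        if action == "Suck":
            world[location] = "Clean"
        elif action == "Right":
            location = "B"
        else:
            location = "A"
    return actions

print(run({"A": "Dirty", "B": "Dirty"}, "A", 4))  # → ['Suck', 'Right', 'Suck', 'Left']
```

Because the agent keeps no state, it keeps shuttling between clean squares forever; a goal-based or utility-based agent would need a model of the world to decide when to stop.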

Interview Spotlight

Focus: Defining a 'Partially Observable' environment. Answer: Where the agent doesn't see the full state (e.g., Poker or Self-driving in fog).

A **Vacuum Robot** is a reflex agent; a **Trading Bot** is a utility-based agent.

3. Uninformed Search (BFS, DFS, Uniform Cost)
Technical Definition
**Uninformed Search (Blind Search)** algorithms have no information about the distance to the goal. **BFS (Breadth-First)** explores layer by layer; **DFS (Depth-First)** dives deep into one branch before backtracking.
Core Concept

BFS: Guaranteed optimal for unit-cost; high memory (Space $O(b^d)$).

DFS: Not optimal; memory-efficient (Space $O(bd)$).

Uniform Cost: Expands lowest cumulative cost node; equivalent to Dijkstra's.

Algorithm	Completeness	Optimality
BFS	Complete.	Optimal (unit cost).
DFS	Complete (finite).	Not Optimal.
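
The optimality difference between BFS and DFS shows up even on a tiny graph. A minimal sketch (the graph and node names are invented for illustration):

```python
from collections import deque

graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["G"],
    "D": ["G"],
    "G": [],
}

def bfs(start, goal):
    """Breadth-first: explores layer by layer, so the first path found has the fewest edges."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph[node]:
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None

def dfs(start, goal, path=None):
    """Depth-first: follows one branch to the end before backtracking."""
    path = (path or []) + [start]
    if start == goal:
        return path
    for nxt in graph[start]:
        if nxt not in path:  # avoid revisiting nodes on the current path
            found = dfs(nxt, goal, path)
            if found:
                return found
    return None

print(bfs("A", "G"))  # → ['A', 'C', 'G']
print(dfs("A", "G"))  # → ['A', 'B', 'D', 'G']
```

DFS returns the first path it stumbles on (three edges), while BFS returns the two-edge path, at the cost of holding a whole layer of partial paths in memory.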

Interview Spotlight

Focus: Why choose DFS? Answer: When memory is tight and paths are very deep but finite.

Mapping followers of followers on Instagram uses **BFS** to find 'Degrees of Separation'.

4. Informed Search (A*, Greedy Best First)
Technical Definition
**Informed Search** algorithms use a **Heuristic ($h$)** to estimate the cost to reach the goal. **A* Search** is the most widely used, combining actual cost from start ($g$) and heuristic estimate to goal ($h$).
Core Concept

A* Formula: $f(n) = g(n) + h(n)$.

Admissibility: A heuristic must never overestimate the cost for A* to be optimal.

Greedy Best-First: Only considers $h(n)$; fast, but not optimal, and it can loop forever unless visited states are tracked.
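
A compact A* sketch on a 4-connected grid with unit step costs, using Manhattan distance as the heuristic (admissible in this setting because every move changes one coordinate by exactly 1). The grid and the function names are assumptions for illustration.

```python
import heapq

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def a_star(grid, start, goal):
    """Return the cheapest path cost on a grid (0 = free, 1 = wall), or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    frontier = [(manhattan(start, goal), 0, start)]  # entries are (f, g, node)
    best_g = {start: 0}
    while frontier:
        f, g, node = heapq.heappop(frontier)
        if node == goal:
            return g
        if g > best_g[node]:
            continue  # stale queue entry; a cheaper route to node was found later
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_g = g + 1
                if new_g < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = new_g
                    f_new = new_g + manhattan((nr, nc), goal)  # f(n) = g(n) + h(n)
                    heapq.heappush(frontier, (f_new, new_g, (nr, nc)))
    return None

grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
print(a_star(grid, (0, 0), (2, 0)))  # → 8
```

Replacing the priority `f_new` with only `manhattan(...)` would turn this into greedy best-first search; replacing it with only `new_g` gives Uniform Cost search.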

Interview Spotlight

Focus: Defining an 'Admissible Heuristic' (e.g., Straight-line distance is always admissible for road paths).

Navigation apps such as Google Maps rely on shortest-path search in the spirit of **A*** to find a fast route to your office while accounting for traffic.

5. Adversarial Search (Minimax, Alpha-Beta Pruning)
Technical Definition
Algorithms used in competitive multi-agent games. **Minimax** assumes the opponent will play optimally to minimize your score. **Alpha-Beta Pruning** optimizes Minimax by skipping branches that cannot possibly affect the final decision.
Core Concept

Max Node: Player trying to maximize the score.

Min Node: Opponent trying to minimize the score.

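Pruning: With good move ordering, reduces the nodes examined from roughly $O(b^d)$ to $O(b^{d/2})$, so the search can go about twice as deep in the same time without changing the final result.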
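
Minimax with Alpha-Beta pruning can be traced on a small hand-built tree. In this sketch the game tree is a nested list whose leaves are scores (an illustrative encoding, not a real game), and the optional `visited` list records which leaves were actually evaluated.

```python
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax value of a nested-list game tree; leaves are numbers."""
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # beta cut-off: MIN above will never allow this branch
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if alpha >= beta:
            break  # alpha cut-off: MAX above already has something better
    return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]  # MAX root with three MIN children
seen = []
print(alphabeta(tree, True, visited=seen))  # → 3
print(seen)                                 # → [3, 12, 8, 2, 14, 5, 2]
```

In the middle subtree, once the leaf 2 is seen MIN can guarantee at most 2, which is worse for MAX than the 3 already secured, so leaves 4 and 6 are never evaluated.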

Interview Spotlight

Focus: Manual tracing of a game tree to identify where pruning occurs (Alpha vs Beta thresholds).

Chess AI like Deep Blue used **Minimax with Alpha-Beta** to search many moves ahead within tournament time limits.

6. Knowledge Representation (FOPL, Semantic Nets)
Technical Definition
The way AI stores information about the world. **FOPL (First Order Predicate Logic)** uses symbols and quantifiers to represent facts ($\forall, \exists$). **Semantic Networks** represent knowledge as a graph of nodes (concepts) and edges (relationships).
Core Concept

FOPL: Precise and logical; powerful for inference engines.

Semantic Nets: Intuitive; captures 'ISA' (is-a) and 'HAS' (has-a) relationships.

Frames: Advanced structure where objects have slots/attributes.
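
A semantic network with 'ISA' and 'HAS' edges can be sketched as two dictionaries, with properties inherited along the is-a chain. The concepts and the helper `properties` are invented for illustration.

```python
# is-a edges: each concept points to its parent concept
ISA = {"canary": "bird", "bird": "animal"}

# has-a edges: properties attached directly to each concept
HAS = {"canary": {"yellow feathers"}, "bird": {"wings"}, "animal": {"skin"}}

def properties(concept):
    """Collect a concept's own properties plus everything inherited via is-a links."""
    props = set()
    while concept is not None:
        props |= HAS.get(concept, set())
        concept = ISA.get(concept)  # climb one level; None at the root
    return props

print(sorted(properties("canary")))  # → ['skin', 'wings', 'yellow feathers']
```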

Interview Spotlight

Focus: Translating English sentences (e.g., 'Every student likes AI') into FOPL notation ($\forall x [\text{Student}(x) \implies \text{Likes}(x, AI)]$).

**Semantic Nets** are the ancestor technology of modern Knowledge Graphs used by Google to show 'Related People'.

7. Forward vs. Backward Chaining
Technical Definition
Inference methods in expert systems. **Forward Chaining** starts with known facts and uses rules to derive new conclusions (Data-driven). **Backward Chaining** starts with a goal/hypothesis and works back to check if facts support it (Goal-driven).
Core Concept

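Forward: If facts A and B are true, find what rules can fire and keep adding conclusions. Good for monitoring and control, where new data keeps arriving.

Backward: If we want to prove goal Z, find what premises (A, B) must be true. Good for diagnosis, where a specific hypothesis is being tested (e.g., MYCIN).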

Feature	Forward Chaining	Backward Chaining
Logic	Data-driven.	Goal-driven.
Search	Breadth-first.	Depth-first.
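
Both inference directions can be sketched over the same rule set. The rules and fact names below are invented for illustration, and the backward chainer assumes the rules contain no cycles.

```python
# Each rule: (set of premises, conclusion)
rules = [
    ({"fever", "cough"}, "flu"),
    ({"flu"}, "rest_needed"),
]

def forward_chain(facts):
    """Data-driven: fire every applicable rule until no new facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

def backward_chain(goal, facts):
    """Goal-driven: a goal holds if it is a known fact or some rule's premises all hold."""
    if goal in facts:
        return True
    return any(
        all(backward_chain(p, facts) for p in premises)
        for premises, conclusion in rules
        if conclusion == goal
    )

print(sorted(forward_chain({"fever", "cough"})))          # → ['cough', 'fever', 'flu', 'rest_needed']
print(backward_chain("rest_needed", {"fever", "cough"}))  # → True
print(backward_chain("rest_needed", {"fever"}))           # → False
```

Forward chaining derives everything it can, whether or not it is needed; backward chaining only explores rules that could support the one goal being asked about.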

Interview Spotlight

Focus: Identifying which method a specific AI system uses based on available data.

An automation system checking 'Is the door locked?' (Goal) uses **Backward Chaining** to verify sensor inputs.

8. Expert Systems
Technical Definition
AI systems that mimic the decision-making ability of a human expert in a specific domain. They consist of a **Knowledge Base** (facts/rules) and an **Inference Engine** (reasoning logic).
Core Concept

MYCIN: Historical medical expert system for blood infections.

Logic: Usually relies on If-Then rules.

Limitation: Can't learn from experience; difficult to update 'Common Sense' knowledge.

Interview Spotlight

Focus: Explaining why Expert Systems were the dominant AI paradigm in the 1980s before Machine Learning took over.

A tax software program that asks you 20 questions to determine your refund is a modern **Expert System**.

9. Turing Test & Chinese Room Argument
Technical Definition
Philosophical foundations of intelligence. The **Turing Test** assesses if a machine can behave identically to a human. The **Chinese Room Argument (Searle)** posits that internal symbol manipulation doesn't equal genuine understanding.
Core Concept

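Turing: An operational definition of intelligence: if a machine's conversation cannot be told apart from a human's, it is treated as intelligent.

Chinese Room: A person following a rule book can produce correct replies to Chinese questions without knowing any Chinese (Syntax $\neq$ Semantics).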

Interview Spotlight

Focus: Does an LLM understand code, or is it just a 'Stochastic Parrot'? This relates directly to the Chinese Room.

Passing a 'CAPTCHA' is a miniature, reverse **Turing Test** used to prove you are NOT an AI.

10. Fuzzy Logic
Technical Definition
A form of logic that deals with approximate reasoning rather than fixed truth. It uses **Degrees of Truth** (values between 0 and 1) instead of absolute binary (True/False).
Core Concept

Linguistic Variables: Terms like 'Warm', 'Slightly Cold', 'Very Hot'.

Fuzzification: Converting crisp inputs into fuzzy values.

Defuzzification: Converting fuzzy results back into actionable crisp outputs.
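
The fuzzify, apply rules, defuzzify loop can be sketched for a hypothetical fan controller. The triangular membership ranges and the `FAN_SPEED` values are made-up assumptions, and the defuzzifier is a simple weighted average (one of several common methods; centroid is another).

```python
def triangular(x, a, b, c):
    """Membership that rises from a to a peak at b, then falls to c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzify(temp_c):
    """Convert a crisp temperature into degrees of truth for each linguistic label."""
    return {
        "cold": triangular(temp_c, -10, 0, 20),
        "warm": triangular(temp_c, 10, 20, 30),
        "hot": triangular(temp_c, 20, 35, 50),
    }

# Rule outputs: the fan speed (%) each label argues for (hypothetical values)
FAN_SPEED = {"cold": 0.0, "warm": 50.0, "hot": 100.0}

def defuzzify(memberships):
    """Weighted average of rule outputs, weighted by each label's degree of truth."""
    total = sum(memberships.values())
    if total == 0:
        return 0.0
    return sum(memberships[k] * FAN_SPEED[k] for k in memberships) / total

m = fuzzify(25)
print(m)
print(defuzzify(m))
```

At 25 degrees the input is 0.5 'warm' and about 0.33 'hot', so the crisp output lands between the two rule outputs, at roughly 70% fan speed.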

Interview Spotlight

Focus: Why use Fuzzy Logic for hardware control? Answer: Because physical sensors produce continuous, noisy values better handled with ranges.

Modern **Washing Machines** use Fuzzy Logic to determine exact water levels based on 'Small, Medium, or Large' load estimates.

11. Genetic Algorithms
Technical Definition
Metaheuristic search algorithms inspired by the process of natural selection. They evolve a population of solutions using **Selection, Crossover (Recombination), and Mutation** to find optimal results.
Core Concept

Fitness Function: Measures how 'good' a solution is.

Survival of Fittest: Best solutions reproduce to form the next generation.

Mutation: Prevents premature convergence by introducing random diversity.
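
A minimal Genetic Algorithm sketch on the 'OneMax' toy problem (maximize the number of 1 bits). The problem choice, tournament selection, elitism, and all parameter values are assumptions for illustration; a fixed seed keeps the run reproducible.

```python
import random

def fitness(bits):
    return sum(bits)  # OneMax: count of 1s

def evolve(n_bits=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        next_pop = [pop[0][:]]  # elitism: carry the best individual over unchanged
        while len(next_pop) < pop_size:
            # Selection: best of 3 random individuals (tournament)
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            # Crossover: single cut point
            cut = rng.randint(1, n_bits - 1)
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with small probability
            child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness), history

best, history = evolve()
print(fitness(best), history[0], history[-1])
```

Because of elitism, the best fitness in `history` never decreases from one generation to the next.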

Interview Spotlight

Focus: Explaining why GAs are used for 'Global Optimization' where gradient descent might get stuck in local minima.

Designing the most aerodynamic shape for an airplane wing often uses **Genetic Algorithms** to evolve a design over millions of simulated iterations.

12. Natural Language Processing (NLP) pipeline
Technical Definition
The series of steps used to transform raw text into a machine-readable format. It involves cleaning data and extracting meaningful semantic features.
Core Concept

Lowercasing & Stopword Removal: Removing 'the', 'is', etc.

Stemming/Lemmatization: Reducing words to roots (e.g., 'Running' -> 'Run').

Named Entity Recognition (NER): Identifying names, dates, and places.

POS Tagging: Identifying Nouns, Verbs, and Adjectives.
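
The first pipeline steps can be sketched with the standard library alone. The stopword list and the suffix rules in `naive_stem` are deliberately tiny assumptions; production systems typically use a library stemmer or lemmatizer instead.

```python
import re

STOPWORDS = {"the", "is", "a", "an", "in", "on", "and", "of", "to"}

def naive_stem(word):
    """Crude suffix stripping; real stemmers (e.g., Porter) apply many more rules."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    tokens = re.findall(r"[a-z]+", text.lower())         # lowercase + tokenize
    tokens = [t for t in tokens if t not in STOPWORDS]   # stopword removal
    return [naive_stem(t) for t in tokens]               # stemming

print(preprocess("The cats are running in the garden"))
# → ['cat', 'are', 'runn', 'garden']
```

Note the output 'runn': rule-based chopping does not know that the root is 'run', which is exactly the gap a dictionary-based lemmatizer closes.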

Interview Spotlight

Focus: Difference between Stemming (Rule-based chopping) and Lemmatization (Dictionary-based root finding).

Email services use **NLP Pipelines** to extract dates, amounts, and booking details from messages such as receipts and turn them into structured calendar entries.

13. Constraint Satisfaction Problems (CSP)
Technical Definition
Problems defined by a set of variables, their domains, and a set of constraints that the solution must satisfy. **Backtracking** is the standard algorithmic approach used to solve CSPs.
Core Concept

Components: Variables, Domains (Possible values), Constraints (Rules).

Constraint Propagation: Pruning the domain of variables based on other assignments (e.g., Arc Consistency, as in the AC-3 algorithm).

Efficient Solving: Combining Backtracking with heuristics like 'Minimum Remaining Values' (MRV).
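
Backtracking with the 'Minimum Remaining Values' heuristic can be sketched on the textbook map-coloring problem (the regions of Australia with three colors). The function names are illustrative assumptions.

```python
neighbors = {
    "WA": ["NT", "SA"],
    "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"],
    "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"],
    "V": ["SA", "NSW"],
    "T": [],
}
COLORS = ["red", "green", "blue"]

def legal_values(var, assignment):
    """Colors not already used by an assigned neighbor (the variable's remaining domain)."""
    used = {assignment[n] for n in neighbors[var] if n in assignment}
    return [c for c in COLORS if c not in used]

def backtrack(assignment=None):
    assignment = assignment or {}
    if len(assignment) == len(neighbors):
        return assignment
    unassigned = [v for v in neighbors if v not in assignment]
    # MRV: pick the variable with the fewest legal values left
    var = min(unassigned, key=lambda v: len(legal_values(v, assignment)))
    for color in legal_values(var, assignment):
        result = backtrack({**assignment, var: color})
        if result is not None:
            return result
    return None  # dead end: caller tries its next color

solution = backtrack()
print(solution)
```

Choosing the most constrained variable first tends to expose dead ends early, which keeps the search tree small.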

Interview Spotlight

Focus: Identifying map-coloring or Sudoku as CSPs and explaining how 'Forward Checking' prevents dead-end paths.

University **Course Scheduling** is a massive CSP that ensures no two classes are assigned the same room or the same professor at the same time.

14. Markov Decision Processes (MDP)
Technical Definition
A mathematical framework for modeling decision-making where outcomes are partly random and partly under the control of a decision-maker. It is the foundation of **Reinforcement Learning**.
Core Concept

Tuple $(S, A, P, R, \gamma)$: States, Actions, Transition Probabilities, Reward function, Discount Factor.

Markov Property: The future is independent of the past, given the present.

Policy ($\pi$): A mapping from states to actions; the optimal policy maximizes the expected discounted total reward.
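
Value iteration on a tiny MDP shows how the tuple and the policy fit together. The three-state 'engine temperature' MDP below is a made-up toy (states, rewards, and probabilities are all assumptions), encoded as `P[state][action] = [(probability, next_state, reward), ...]`.

```python
GAMMA = 0.9  # discount factor

P = {
    "cool": {
        "slow": [(1.0, "cool", 1.0)],
        "fast": [(0.5, "cool", 2.0), (0.5, "warm", 2.0)],
    },
    "warm": {
        "slow": [(0.5, "cool", 1.0), (0.5, "warm", 1.0)],
        "fast": [(1.0, "overheated", -10.0)],
    },
    "overheated": {},  # terminal state: no actions
}

def q_value(V, state, action):
    """Expected reward plus discounted value of the next state."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[state][action])

def value_iteration(tol=1e-6):
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            if not P[s]:
                continue  # terminal states keep value 0
            new_v = max(q_value(V, s, a) for a in P[s])
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            break
    policy = {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P if P[s]}
    return V, policy

V, policy = value_iteration()
print(policy)  # → {'cool': 'fast', 'warm': 'slow'}
```

The resulting policy drives fast while the engine is cool and slows down once it is warm, because the large penalty for overheating outweighs the extra immediate reward.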

Interview Spotlight

Focus: Defining the 'Bellman Equation', which recursively calculates the value of the current state.
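
For reference, one common form of the Bellman optimality equation, written with the tuple symbols $(S, A, P, R, \gamma)$ and assuming the reward depends on the transition $(s, a, s')$ (other texts use $R(s)$ or $R(s, a)$):

```latex
$$V^*(s) = \max_{a \in A} \sum_{s' \in S} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma\, V^*(s') \bigr]$$
```

The value of a state is the best achievable expected immediate reward plus the discounted value of wherever the action leads.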

A robot navigating a grid with slippery floors (where move action might fail) is modeled as an **MDP**.

15. Heuristic Functions
Technical Definition
An estimation function used to guide search algorithms toward the goal node. It represents a 'rule of thumb' or 'common sense' shortcut that helps avoid exploring useless paths.
Core Concept

Admissible: Never overestimates (required for A* optimality).

Consistent: Follows the triangle inequality (satisfies monotonicity).

Examples: Manhattan Distance (Grid), Euclidean (Direct line), Hamming Distance (Puzzles).
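
Two standard 8-Puzzle heuristics, sketched for comparison. The state encoding (a tuple of 9 numbers read row by row, with 0 as the blank) and the example state are assumptions for illustration.

```python
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # 0 is the blank

def misplaced_tiles(state):
    """Count tiles (ignoring the blank) that are not in their goal position."""
    return sum(1 for tile, goal in zip(state, GOAL) if tile != 0 and tile != goal)

def manhattan_distance(state):
    """Sum of each tile's row distance plus column distance from its goal position."""
    total = 0
    for index, tile in enumerate(state):
        if tile == 0:
            continue
        goal_index = GOAL.index(tile)
        total += abs(index // 3 - goal_index // 3) + abs(index % 3 - goal_index % 3)
    return total

# A state that is exactly 4 moves away from the goal
state = (1, 2, 3,
         4, 8, 5,
         7, 6, 0)
print(misplaced_tiles(state), manhattan_distance(state))  # → 3 4
```

Both values are lower bounds on the true cost of 4 moves, so both are admissible; Manhattan distance is the tighter bound, which usually means fewer nodes expanded by A*.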

Interview Spotlight

Focus: Designing a heuristic for the '8-Puzzle' problem. Answer: Number of misplaced tiles.

A delivery driver uses a **Heuristic** of 'Nearest Unvisited Customer' to plan their afternoon route.

16. Hill Climbing & Simulated Annealing
Technical Definition
Local search optimization algorithms. **Hill Climbing** iteratively moves toward better neighbors but often gets stuck in Local Maxima. **Simulated Annealing** allows occasional 'bad' moves to escape local peaks, inspired by physics-based cooling.
Core Concept

Hill Climbing: Greedy; fast; risk of 'Plateaus' and 'Ridges'.

Simulated Annealing: High temperature = more exploration (worse moves are often accepted); Low temperature = almost only improving moves are accepted.

Convergence: Slow to find the global peak but much more reliable than simple Hill Climbing.
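
The difference between the two methods can be sketched on a one-dimensional landscape with one local and one global peak. The landscape values, the cooling schedule, and the parameter defaults are all assumptions for illustration.

```python
import math
import random

LANDSCAPE = [1, 3, 5, 4, 2, 6, 9, 12, 8]  # local peak at index 2, global peak at index 7

def hill_climb(start):
    """Greedy ascent: move to the best neighbor until no neighbor is better."""
    current = start
    while True:
        neighbors = [i for i in (current - 1, current + 1) if 0 <= i < len(LANDSCAPE)]
        best = max(neighbors, key=lambda i: LANDSCAPE[i])
        if LANDSCAPE[best] <= LANDSCAPE[current]:
            return current
        current = best

def simulated_annealing(start, temp=10.0, cooling=0.95, steps=500, seed=0):
    """Accept worse moves with probability exp(delta / temp); return the best index seen."""
    rng = random.Random(seed)
    current = best = start
    for _ in range(steps):
        candidate = current + rng.choice((-1, 1))
        if not 0 <= candidate < len(LANDSCAPE):
            continue
        delta = LANDSCAPE[candidate] - LANDSCAPE[current]
        if delta > 0 or rng.random() < math.exp(delta / temp):
            current = candidate
            if LANDSCAPE[current] > LANDSCAPE[best]:
                best = current
        temp = max(temp * cooling, 1e-3)  # cool down, but never reach zero
    return best

print(hill_climb(0))           # → 2 (stuck on the local peak)
print(simulated_annealing(0))  # best index seen during the run
```

Hill Climbing stops at index 2 because both neighbors are lower. Simulated Annealing can walk downhill through index 4 while the temperature is high, which gives it a chance to reach the higher peak.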

Interview Spotlight

Focus: Explaining why we allow 'worse' steps in Simulated Annealing. Answer: To escape local optima and find the true Global Maximum.

Optimizing the layout of components on a circuit board uses **Simulated Annealing** to minimize wire length.

17. Robotics in AI
Technical Definition
The intersection of AI and physical hardware. It involves the integration of **Perception** (Computer Vision, SLAM), **Planning** (A*, Pathfinding), and **Control** (PID, RL) to allow machines to interact with the real world.
Core Concept

SLAM (Simultaneous Localization and Mapping): Building a map of an unknown environment while simultaneously estimating the robot's own position within it.

Kinematics: Mathematical modeling of joint movements (Forward vs Inverse).

Sensors: LiDAR, Ultrasound, Cameras (providing 'Percepts').
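
Forward kinematics answers "given the joint angles, where is the end of the arm?". A minimal sketch for a planar two-link arm (the link lengths and function name are assumptions for illustration):

```python
import math

def forward_kinematics(theta1, theta2, l1=1.0, l2=1.0):
    """End-effector (x, y) of a planar 2-link arm; angles in radians, theta2 relative to link 1."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y

print(forward_kinematics(0.0, 0.0))          # arm stretched along the x-axis
print(forward_kinematics(math.pi / 2, 0.0))  # arm pointing straight up
```

Inverse kinematics is the reverse problem (find the angles that reach a given point) and is generally harder because there may be zero, one, or several solutions.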

Interview Spotlight

Focus: Explaining the 'Sim-to-Real' gap in Reinforcement Learning for robots.

Warehouse robots like Amazon's **Kiva** use AI to coordinate paths for thousands of units without colliding.

18. AI Ethics, Bias, and Safety
Technical Definition
The field focused on ensuring AI systems are developed responsibly. It addresses **Algorithmic Bias** (unfair treatments), **Safety** (preventing harmful actions), and **Transparency** (Explainable AI - XAI).
Core Concept

Dataset Bias: If training data is flawed (e.g., historical sexism), the AI will replicate it.

Alignment Problem: Ensuring AI goals match human values.

Adversarial Attacks: Tricking AI with subtle, often imperceptible perturbations to input data.

Interview Spotlight

Focus: Defining 'Explainable AI'. Why is it critical for healthcare AI? Answer: Doctors need to know *why* a model diagnosed cancer to trust it.

Recommendation algorithms on social media platforms may undergo **Ethics Audits** to reduce the accidental promotion of harmful misinformation.

19. Agentic AI (Multi-agent orchestration)
Technical Definition
An advanced framework where multiple specialized AI agents collaborate to solve a complex task. Unlike a single model, **Agentic AI** involves reasoning, tool-use, and task-delegation between an 'Orchestrator' and 'Workers'.
Core Concept

Reasoning: Chains of thought (CoT) where agents plan before acting.

Tool-Use: Agents calling APIs (Calculators, Search, Document Readers) to get facts.

Multi-Agent: E.g., One agent researches, one writes, and one fact-checks the first two.

Interview Spotlight

Focus: Defining the difference between a simple Chatbot and an **Autonomous Agent**. Answer: Autonomy and step-by-step goal execution.

A system that reads your email, checks your calendar, books a flight, and confirms your hotel autonomously is **Agentic AI**.

20. Prompt Engineering as an AI tool
Technical Definition
The practice of optimizing input text (prompts) to guide Generative AI models toward more accurate and useful outputs. It involves techniques like **Few-shot Prompting**, **Chain-of-Thought**, and **System Persona** definition.
Core Concept

Few-shot: Providing 2-3 examples within the prompt.

Zero-shot: Asking a question directly without examples.

CoT (Chain of Thought): Asking the model to "Think step-by-step" to improve logical reasoning.

Technique: "Act as a [Persona]" significantly changes model tone and constraints.
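
These techniques are ultimately string construction. A minimal sketch of a prompt builder that combines persona, zero-shot or few-shot examples, and a Chain-of-Thought instruction (the function name, task wording, and example reviews are all invented for illustration; no model is called):

```python
def build_prompt(query, examples=(), persona=None, chain_of_thought=False):
    """Assemble a sentiment-classification prompt from optional building blocks."""
    parts = []
    if persona:
        parts.append(f"Act as a {persona}.")
    parts.append("Classify the sentiment of the review as Positive or Negative.")
    if chain_of_thought:
        parts.append("Think step-by-step, then give the final label.")
    for text, label in examples:  # few-shot demonstrations; empty means zero-shot
        parts.append(f"Review: {text}\nSentiment: {label}")
    parts.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(parts)

zero_shot = build_prompt("The battery died after a day.")
few_shot = build_prompt(
    "The battery died after a day.",
    examples=[("Loved the screen.", "Positive"), ("Arrived broken.", "Negative")],
    persona="meticulous product reviewer",
    chain_of_thought=True,
)
print(few_shot)
```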

Interview Spotlight

Focus: Explaining why 'Prompt Engineering' is important for steering model behavior and reducing hallucinations.

A developer using **Chain-of-Thought** prompting to fix a bug gets a detailed explanation instead of just a code snippet.
