Operating Systems (OS)
1. Process vs. Thread vs. Program
Technical Definition
A Program is a passive entity (an executable file) stored on disk, while a Process is an active instance of a program in execution. A Thread is the smallest unit of execution within a process; it shares the process's address space but has its own stack and registers.
Core Essentials
- Program (Passive): A collection of instructions stored as an executable file in secondary storage.
- Process (Active): A program in execution, loaded into Primary Memory (RAM) with allocated resources.
- Thread (LWP): The smallest schedulable unit; shares Code/Data/Heap with peer threads but maintains its own Registers & Stack.
| Feature | Process | Thread |
|---|---|---|
| Memory | Isolated address space. | Shared address space (Heap). |
| Overhead | High (Context switching is heavy). | Low (Lightweight). |
| Communication | Requires IPC (Inter-Process Communication). | Communicates via shared memory. |
Interview Spotlight
Interviewer Focus: Understanding the trade-off between isolation and speed. Multi-threading is preferred for highly concurrent tasks like web servers.
A: No. A thread is an execution unit assigned within the context of a process.
A Word Processing application is a Process, while the spell-checker and auto-saver running inside it are Threads.
2. Process States & Lifecycle
Technical Definition
The Process State defines the current activity of a process as it moves through various stages of execution. The lifecycle typically follows a 5-state model: New, Ready, Running, Waiting, and Terminated.
Core Concept
- New: The process is being created and its PCB (Process Control Block) is initialized.
- Ready: The process is in main memory, waiting to be assigned to a CPU core.
- Running: Instructions are being executed by the processor.
- Waiting (Blocked): The process is waiting for some event, such as I/O completion or a signal.
- Terminated: The process has finished execution and OS resources are being reclaimed.
Interview Spotlight
Focus: State transitions, specifically the 'Ready to Running' (Dispatching) and 'Running to Waiting' (I/O event) paths.
A: Yes. Once I/O completes, it moves to 'Ready' to wait for the scheduler's turn; it cannot jump directly to 'Running'.
When you click 'Search' in a browser, the process moves to Waiting for the network response, then to Ready once the response arrives, and back to Running when the scheduler dispatches it to render the results.
3. CPU Scheduling (Preemptive vs. Non-preemptive)
Technical Definition
CPU Scheduling is the process of deciding which process in the ready queue gets to use the CPU next. **Preemptive Scheduling** allows the OS to forcibly interrupt a running process, whereas **Non-preemptive Scheduling** lets a process run until it either finishes or voluntarily releases control.
Core Concept
- Preemptive: Uses time quanta or priority interrupts; better for multi-user/interactive systems.
- Non-preemptive: Consistent execution; simple to implement; risks "Starvation" and long response times.
- Decision points: (1) Running to Waiting, (2) Running to Ready, (3) Waiting to Ready, (4) Terminating.
| Type | Preemptive | Non-Preemptive |
|---|---|---|
| Resource Control | OS can snatch CPU at any time. | Process releases CPU voluntarily. |
| Response Time | Fast and predictable. | Can be high for small tasks. |
Interview Spotlight
Focus: Criticality in Real-Time OS (RTOS). Preemption is mandatory for responsiveness.
A: No. FCFS is pure non-preemptive; once a process takes the CPU, it keeps it.
Modern Windows or macOS kernels are Preemptive to ensure a background update doesn't freeze your mouse cursor.
4. Scheduling Algorithms (FCFS, SJF, SRTF, RR, Priority)
Technical Definition
Scheduling Algorithms are specific methods used by the Short-Term Scheduler to organize the Ready Queue for CPU execution. These aim to maximize CPU Utilization and minimize Waiting Time and Response Time.
Core Concept
- FCFS: Simple; Non-preemptive; suffers from Convoy Effect.
- SJF (Shortest Job First): Optimal for average waiting time; risks starvation for long jobs.
- SRTF (Shortest Remaining Time First): Preemptive version of SJF.
- Round Robin (RR): Preemptive; uses a fixed Time Quantum (\(q\)); fair for all processes.
- Priority Scheduling: The CPU goes to the highest-priority process; risk of Starvation (solved by Aging).
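A small simulation makes the turnaround and waiting time formulas (\(TAT = Completion - Arrival\), \(WT = TAT - Burst\)) concrete. The sketch below uses illustrative function names `fcfs` and `round_robin`. For simplicity, `round_robin` assumes every process arrives at \(t = 0\). The workload is a sample of three processes with bursts 24, 3 and 3.

```python
from collections import deque

def fcfs(procs):
    """procs: list of (name, arrival, burst). Returns {name: (TAT, WT)}."""
    time, result = 0, {}
    for name, arrival, burst in sorted(procs, key=lambda p: p[1]):
        time = max(time, arrival) + burst   # CPU idles until the process arrives
        tat = time - arrival                # TAT = Completion - Arrival
        result[name] = (tat, tat - burst)   # WT = TAT - Burst
    return result

def round_robin(bursts, q):
    """bursts: {name: burst}, all arriving at t=0. Returns {name: (TAT, WT)}."""
    remaining = dict(bursts)
    ready = deque(bursts)                   # initial ready-queue order
    time, result = 0, {}
    while ready:
        name = ready.popleft()
        run = min(q, remaining[name])       # one quantum, or less if the job finishes
        time += run
        remaining[name] -= run
        if remaining[name] > 0:
            ready.append(name)              # preempted: back of the ready queue
        else:
            result[name] = (time, time - bursts[name])  # arrival = 0, so TAT = completion
    return result

workload = [("P1", 0, 24), ("P2", 0, 3), ("P3", 0, 3)]
print(fcfs(workload))                            # {'P1': (24, 0), 'P2': (27, 24), 'P3': (30, 27)}
print(round_robin({"P1": 24, "P2": 3, "P3": 3}, q=4))  # {'P2': (7, 4), 'P3': (10, 7), 'P1': (30, 6)}
```

With \(q = 4\), the average waiting time drops from 17 (FCFS) to about 5.67. Passing a quantum larger than every burst (e.g. `q=100`) reproduces the FCFS numbers, which is why a very large quantum makes RR behave like FCFS.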
Interview Spotlight
Focus: Calculating Turnaround Time (\(TAT = Completion - Arrival\)) and Waiting Time (\(WT = TAT - Burst\)).
A: It degrades into simple FCFS.
Time-sharing systems use Round Robin so that every user gets a regular slice of CPU time.
5. Process Synchronization (Critical Section, Race Condition)
Technical Definition
Process Synchronization is the coordination of concurrent processes to keep shared data consistent. A Race Condition occurs when the outcome depends on the order or timing in which concurrent operations interleave, and a Critical Section is the code block where shared data is accessed.
Core Concept
- Race Condition: Unpredictable outcome (e.g., two threads increment a shared counter at the same time and one update is lost).
- Solutions must satisfy: (1) Mutual Exclusion, (2) Progress, (3) Bounded Waiting.
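Here is a minimal Python sketch of a critical section protected by a lock, using the standard `threading` module. The function name `parallel_count` is illustrative. Without the `with lock:` line, `counter += 1` is a non-atomic read-modify-write, so increments from different threads can interleave and be lost. How often that happens depends on the interpreter and the scheduler, so only the locked version has a guaranteed result.

```python
import threading

def parallel_count(num_threads: int, increments: int) -> int:
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:          # entry section: acquire; leaving the block releases
                counter += 1    # critical section: read-modify-write of shared data

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print(parallel_count(4, 10_000))  # 40000
```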
Interview Spotlight
Focus: Defining the "Bounded Waiting" condition—ensuring a process doesn't wait indefinitely to enter its Critical Section.
A: No. Memory ordering and compiler optimizations usually break purely software sync on modern hardware.
E-commerce Inventory: Two users buying the last item at the same time requires synchronization to prevent overselling.
6. Semaphores vs. Mutex
Technical Definition
A Mutex (Mutual Exclusion) is a locking mechanism that allows only one thread to access a resource at a time, while a Semaphore is a signaling tool (integer variable) that controls access to a resource with limited instances.
Core Concept
- Mutex: Locking object; must be released by the same thread that locked it (Ownership).
- Binary Semaphore: Value is 0 or 1; similar to Mutex but has no ownership.
- Counting Semaphore: Initialized to \(N > 1\); allows up to N concurrent users of a resource (e.g., 5 printer ports).
| Feature | Mutex | Semaphore |
|---|---|---|
| Type | Locking Mechanism. | Signaling Mechanism. |
| Ownership | Thread that locks must unlock. | Any process can signal. |
Interview Spotlight
Focus: Understanding that a Binary Semaphore is NOT exactly a Mutex due to the 'Ownership' property.
A 1-person bathroom uses a Mutex; a 5-car parking lot uses a Counting Semaphore.
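The parking-lot example can be sketched with Python's `threading.Semaphore` acting as a counting semaphore. The names `park_cars` and `spots` and the 10 ms parking time are illustrative assumptions. The function reports the peak number of cars parked at once, and the semaphore keeps that peak at or below the number of spots.

```python
import threading
import time

def park_cars(num_cars: int, spots: int) -> int:
    """Return the largest number of cars observed parked at the same time."""
    lot = threading.Semaphore(spots)   # counting semaphore initialized to N
    guard = threading.Lock()           # protects the two counters below
    parked = 0
    peak = 0

    def car():
        nonlocal parked, peak
        with lot:                      # wait()/P(): blocks while the count is 0
            with guard:
                parked += 1
                peak = max(peak, parked)
            time.sleep(0.01)           # occupy the spot briefly
            with guard:
                parked -= 1
        # leaving the outer block calls release(), i.e. signal()/V()

    threads = [threading.Thread(target=car) for _ in range(num_cars)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak

print(park_cars(20, 5))  # never more than 5
```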
7. Classical Sync Problems (Producer-Consumer, Dining Philosophers)
Technical Definition
Classical synchronization problems that model real-world concurrency issues. The **Producer-Consumer** problem deals with buffer overflow/underflow, while **Dining Philosophers** models resource allocation and deadlock hazards.
Core Concept
- Producer-Consumer: Managed with 'Full', 'Empty', and 'Mutex' semaphores.
- Dining Philosophers: 5 philosophers, 5 forks; if all pick left fork, Deadlock occurs.
- Solution: Only allow 4 philosophers at the table or use asymmetric picking rules.
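The Empty/Full/Mutex scheme for Producer-Consumer can be sketched in Python with `threading.Semaphore`, using a `threading.Lock` to stand in for the mutex. The bounded buffer is a `deque`, and `producer_consumer` is an illustrative name. With one producer and one consumer, items come out in the order they were produced.

```python
import threading
from collections import deque

def producer_consumer(items: int, capacity: int) -> list:
    buffer = deque()
    empty = threading.Semaphore(capacity)  # free slots, starts at N
    full = threading.Semaphore(0)          # filled slots, starts at 0
    mutex = threading.Lock()               # guards the buffer itself
    consumed = []

    def producer():
        for i in range(items):
            empty.acquire()                # wait(empty): block if the buffer is full
            with mutex:
                buffer.append(i)
            full.release()                 # signal(full)

    def consumer():
        for _ in range(items):
            full.acquire()                 # wait(full): block if the buffer is empty
            with mutex:
                consumed.append(buffer.popleft())
            empty.release()                # signal(empty)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed

print(producer_consumer(10, 3))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```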
Interview Spotlight
Focus: Designing a deadlock-free approach for Dining Philosophers.
A high-speed logger filling a fixed-size buffer while a storage service writes to disk is a Producer-Consumer scenario.
8. Deadlock: Necessary Conditions (Coffman Conditions)
Technical Definition
A Deadlock is a state in which a set of processes is blocked because each holds a resource while waiting for another resource held by a different process in the set. For a deadlock to occur, **four specific conditions** must hold simultaneously.
Core Concept
- Mutual Exclusion: Resource cannot be shared.
- Hold and Wait: Process holds one resource while waiting for another.
- No Preemption: Resources cannot be forcibly taken from a process.
- Circular Wait: A chain of processes exists where each waits for a resource held by the next.
Interview Spotlight
Focus: Preventing Deadlock involves breaking at least ONE of these four conditions.
Two cars meeting in a 1-lane tunnel where neither can reverse is a Circular Wait deadlock.
9. Deadlock Handling (Prevention, Avoidance - Banker’s Algorithm)
Technical Definition
Methods used to manage deadlock risks. **Prevention** sets rules to ensure one Coffman condition is never met, while **Avoidance** (Banker's Algorithm) dynamically checks resource requests against "Safe States."
Core Concept
- Prevention: e.g., disallow 'Hold & Wait' by requiring a process to request all its resources at start.
- Avoidance: Requires advance knowledge of each process's maximum resource needs.
- Banker's Alg: Grants a request only if the resulting state is safe, i.e., there is an order in which every process can finish; process \(i\) can run to completion once \(Need_i \le Available\) (element-wise).
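The safety check at the heart of the Banker's Algorithm can be sketched as below. It computes `Need = Max - Allocation` per process, then repeatedly looks for a process whose need fits in the current `work` vector. The matrices are a small illustrative example (5 processes, 3 resource types), and `safe_sequence` is a hypothetical helper name. It returns one safe order of process indices, or `None` if the state is unsafe.

```python
def safe_sequence(available, max_demand, allocation):
    """Banker's safety algorithm. Returns a list of process indices, or None."""
    n = len(allocation)
    need = [[m - a for m, a in zip(max_demand[i], allocation[i])] for i in range(n)]
    work = list(available)
    finished = [False] * n
    order = []
    progress = True
    while progress:
        progress = False
        for i in range(n):
            if not finished[i] and all(nd <= w for nd, w in zip(need[i], work)):
                # Assume P_i runs to completion and releases everything it holds.
                work = [w + a for w, a in zip(work, allocation[i])]
                finished[i] = True
                order.append(i)
                progress = True
    return order if all(finished) else None

available  = [3, 3, 2]
max_demand = [[7, 5, 3], [3, 2, 2], [9, 0, 2], [2, 2, 2], [4, 3, 3]]
allocation = [[0, 1, 0], [2, 0, 0], [3, 0, 2], [2, 1, 1], [0, 0, 2]]
print(safe_sequence(available, max_demand, allocation))  # [1, 3, 4, 0, 2]
```

To evaluate a new request, one would tentatively subtract it from `available` and add it to that process's allocation. The request is granted only if `safe_sequence` still finds a sequence.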
Interview Spotlight
Focus: Determining the "Safe Sequence" in a Banker's problem. If a safe sequence exists, the system is in a safe state and deadlock cannot occur.
A bank lends money only if, afterwards, it still keeps enough cash to fully satisfy at least one client's maximum demand; once that client finishes and repays, the next one can be served (**Banker's Logic**).
10. Paging & Segmentation
Technical Definition
Memory management schemes that allow a process's memory to be non-contiguous. **Paging** divides the logical address space into fixed-size blocks (Pages) mapped onto equal-size physical blocks (Frames), which eliminates external fragmentation, whereas **Segmentation** divides it into logical, variable-sized modules based on program structure.
Core Concept
- Paging: Transparent to the programmer; simplifies allocation; uses Page Tables to map pages to frames.
- Segmentation: Programmer's view (Stack, Heap, Code); modules are logical.
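Address translation under paging splits a virtual address into a page number and an offset. The sketch assumes 4 KB pages and models the page table as a plain dict from page number to frame number. `translate` and `PageFault` are illustrative names, not an OS API.

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages, so the low 12 bits are the offset

class PageFault(Exception):
    """Raised when the page is not resident (hypothetical, for illustration)."""

def translate(virtual_addr: int, page_table: dict) -> int:
    page, offset = divmod(virtual_addr, PAGE_SIZE)  # page number, offset within page
    if page not in page_table:
        raise PageFault(page)                       # the OS would now load the page
    return page_table[page] * PAGE_SIZE + offset    # frame base + unchanged offset

page_table = {0: 5, 1: 2}                # page -> frame
print(translate(4100, page_table))       # page 1, offset 4 -> 2*4096 + 4 = 8196
```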
| Aspect | Paging | Segmentation |
|---|---|---|
| Fragmentation | Suffers from Internal. | Suffers from External. |
| Block Size | Fixed. | Variable. |
Interview Spotlight
Focus: Why do we use 'Segmented Paging'? Answer: To get the logical modularity of segments with the easy allocation of pages.
Virtual memory on your PC uses **Paging** to swap 4KB chunks of inactive Chrome tabs to your SSD.
11. Virtual Memory & Demand Paging
Technical Definition
**Virtual Memory** allows execution of processes that are not completely in main memory by giving each process the abstraction of a large address space backed by secondary storage. **Demand Paging** is the common implementation in which a page is loaded into RAM only when it is first accessed during execution.
Core Concept
- Page Fault: Occurs when the CPU accesses a page that is not in RAM; the OS must fetch it from disk.
- Benefits: Larger address space, higher degree of multiprogramming, less I/O at startup.
Interview Spotlight
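Focus: Defining the steps of handling a Page Fault (Trap to OS → Locate the page on disk → Find a free frame, evicting a victim if needed → Read the page into the frame → Update the page table → Restart the faulting instruction).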
You can play a 100GB game on a PC with 16GB RAM because of **Virtual Memory** swapping assets in real-time.
12. Page Replacement (FIFO, LRU, Optimal)
Technical Definition
Algorithms that decide which page to evict from RAM when a new page must be loaded and all frames are full. They aim to minimize the **Page Fault Rate**.
Core Concept
- FIFO: First In First Out; simple but suffers from **Belady's Anomaly**.
- LRU: Least Recently Used; uses the recent past to predict the near future; a realistic approximation of Optimal.
- Optimal: Replace page that won't be used for the longest time (Theoretical benchmark).
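A short simulation compares FIFO and LRU fault counts. The reference string below is a standard one for showing **Belady's Anomaly**: FIFO incurs more faults with 4 frames than with 3, while LRU does not. `fifo_faults` and `lru_faults` are illustrative names.

```python
from collections import OrderedDict, deque

def fifo_faults(refs, frames):
    memory, arrival = set(), deque()     # resident pages and their arrival order
    faults = 0
    for page in refs:
        if page in memory:
            continue                     # hit: FIFO order is not updated
        faults += 1
        if len(memory) == frames:
            memory.remove(arrival.popleft())  # evict the oldest arrival
        memory.add(page)
        arrival.append(page)
    return faults

def lru_faults(refs, frames):
    memory = OrderedDict()               # ordered from least to most recently used
    faults = 0
    for page in refs:
        if page in memory:
            memory.move_to_end(page)     # hit: mark as most recently used
            continue
        faults += 1
        if len(memory) == frames:
            memory.popitem(last=False)   # evict the least recently used page
        memory[page] = True
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3), fifo_faults(refs, 4))  # 9 10  (more frames, more faults)
print(lru_faults(refs, 3), lru_faults(refs, 4))    # 10 8
```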
Interview Spotlight
Focus: LRU vs FIFO. LRU usually performs better because it exploits the 'Principle of Locality'.
A browser cache can use **LRU** eviction to keep recently visited pages ready while discarding pages you have not opened in a while.
13. Fragmentation (Internal vs. External)
Technical Definition
Wasted memory that prevents efficient utilization. **Internal Fragmentation** occurs when allocated memory is slightly larger than requested; **External Fragmentation** occurs when total free space exists but is not contiguous.
Core Concept
- Internal: Happens in fixed-partitioning (Paging). Memory within a block is wasted.
- External: Happens in variable-partitioning (Segmentation). Solved by **Compaction** or **Paging**.
Interview Spotlight
Focus: How to solve External Fragmentation. Answer: Compaction (shuffling allocated blocks together) or Paging (letting a process occupy non-contiguous frames).
A 4KB Page holding only 1KB of data wastes 3KB to **Internal Fragmentation**.
14. Thrashing & Belady’s Anomaly
Technical Definition
Two performance pathologies in memory management. **Thrashing** is when a system spends more time swapping pages than executing; **Belady's Anomaly** is the counter-intuitive phenomenon where increasing the number of frames can increase the number of page faults.
Core Concept
- Thrashing: Happens when the degree of multiprogramming is so high that processes cannot keep their working sets in memory; solved by reducing the number of processes.
- Belady's: Only occurs in non-stack algorithms like FIFO.
Interview Spotlight
Focus: Recognizing Belady's Anomaly in a FIFO trace. Does LRU suffer from it? No, because LRU is a 'Stack Algorithm'.
An old PC freezing when you open too many Chrome tabs is undergoing **Thrashing**.
15. System Calls & Kernel (Monolithic vs. Micro)
Technical Definition
**System Calls** are the programmatic interface through which a program requests a service from the kernel. The **Kernel** is the core of the OS; its design is either **Monolithic** (all OS services in one kernel address space) or **Micro** (only minimal functionality in kernel space, with other services in user space).
Core Concept
- Monolithic: Linux/Unix; very fast (services call each other directly inside the kernel) but harder to maintain.
- Microkernel: Mach/L4; more secure and stable (a crashing user-space driver or service does not bring down the whole OS) but slower due to message-passing (IPC) overhead.
- Syscall examples: fork(), wait(), read(), write(), open().
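In Python, the `os` module exposes thin wrappers around these system calls. A sketch of the `open()`/`write()`/`read()` path might look like the following. It uses a temporary file from `tempfile.mkstemp` (which performs the open itself), and `write_then_read` is an illustrative name. For simplicity, it assumes a small write and read each complete in a single call.

```python
import os
import tempfile

def write_then_read(data: bytes) -> bytes:
    fd, path = tempfile.mkstemp()        # raw file descriptor obtained via open(2)
    try:
        os.write(fd, data)               # write(2): hand the bytes to the kernel
        os.lseek(fd, 0, os.SEEK_SET)     # lseek(2): rewind the file offset
        return os.read(fd, len(data))    # read(2): kernel copies bytes back to user space
    finally:
        os.close(fd)                     # close(2): release the descriptor
        os.remove(path)                  # unlink(2): delete the temporary file

print(write_then_read(b"hello kernel"))  # b'hello kernel'
```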
Interview Spotlight
Focus: Difference between User Mode and Kernel Mode. CPU privilege levels prevent user-mode applications from accessing hardware directly; a system call traps into kernel mode to perform the privileged work.
When a program wants to save a file, it triggers a `write()` System Call to ask the Kernel for disk access.
16. RAID Levels
Technical Definition
**Redundant Array of Independent Disks (RAID)** combines multiple physical disk drives into a single logical unit for data redundancy, performance improvement, or both.
Core Concept
- RAID 0 (Striping): Performance only; no redundancy. If one disk fails, all data in the array is lost.
- RAID 1 (Mirroring): Reliability; identical copies of the data are kept on two (or more) disks.
- RAID 5 (Striping with Distributed Parity): Balanced performance and safety; survives a single disk failure; needs at least 3 disks.
- RAID 10: Combined Striping + Mirroring (data striped across mirrored pairs); needs at least 4 disks.
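RAID 5 parity is a byte-wise XOR across the data blocks of a stripe, which is why any single lost block can be rebuilt from the survivors. The sketch below shows a single stripe of three illustrative 4-byte blocks. In a real RAID 5 array, the parity block also rotates across disks from stripe to stripe. `xor_blocks` is a hypothetical helper name.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks (the RAID 5 parity operation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"RAID", b"five", b"demo"]   # one stripe spread over three data disks
parity = xor_blocks(data)            # stored on a fourth disk

# Disk 1 fails: XOR the surviving data blocks with the parity block to rebuild it.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)  # b'five'
```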
Interview Spotlight
Focus: Why RAID is not a backup. Answer: RAID protects against Hardware failure, not accidental deletion.
Enterprise database servers use **RAID 10** to ensure maximum speed and zero data loss if a drive dies.
17. Disk Scheduling (SCAN, C-SCAN, LOOK)
Technical Definition
Algorithms used by the OS to order pending I/O requests to the disk. They aim to reduce **Seek Time** (the time to move the read/write head to the target track).
Core Concept
- SCAN (Elevator): The head sweeps from one end of the disk to the other, servicing requests along the way, then reverses, like an elevator.
- C-SCAN (Circular SCAN): Services requests in one direction only; at the end, the head jumps back to the start without servicing, giving more uniform waiting times.
- LOOK: Optimized SCAN; the head travels only as far as the last request in each direction before reversing.
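Total head movement makes the SCAN vs LOOK difference concrete. The sketch assumes the head starts at cylinder 53, moving toward higher cylinders, on a disk with cylinders 0-199, and uses an illustrative request queue. The functions `scan` and `look` return the service order and the total number of cylinders traversed.

```python
def _split(requests, head):
    up = sorted(r for r in requests if r >= head)                   # served on the way up
    down = sorted((r for r in requests if r < head), reverse=True)  # served after reversing
    return up, down

def scan(requests, head, max_cyl):
    up, down = _split(requests, head)
    # SCAN always travels to the last cylinder before reversing.
    movement = (max_cyl - head) + ((max_cyl - down[-1]) if down else 0)
    return up + down, movement

def look(requests, head):
    up, down = _split(requests, head)
    turn = up[-1] if up else head        # LOOK reverses at the last pending request
    movement = (turn - head) + ((turn - down[-1]) if down else 0)
    return up + down, movement

queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(scan(queue, head=53, max_cyl=199))  # ([65, 67, 98, 122, 124, 183, 37, 14], 331)
print(look(queue, head=53))               # ([65, 67, 98, 122, 124, 183, 37, 14], 299)
```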
Interview Spotlight
Focus: Which is more 'fair'? Answer: C-SCAN, as it avoids favoring the middle tracks.
In practice, SCAN and C-SCAN are usually implemented as **LOOK** and C-LOOK to avoid unnecessary head movement.
18. Inter-Process Communication (Pipes, Shared Memory)
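Technical Definition
Mechanisms that allow concurrent processes to communicate and synchronize. The two main models are **Shared Memory** (very fast) and **Message Passing** (simpler to synchronize and easier to extend to distributed systems).
Core Concept
- Pipes: Unidirectional byte streams, typically between related (parent/child) processes; used in `ls | grep`.
- Named Pipes (FIFOs): Exist as files in the filesystem, so unrelated processes can use them; UNIX FIFOs allow only half-duplex transmission.
- Shared Memory: The same physical memory is mapped into both processes' address spaces; access must be synchronized (e.g., with a semaphore).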
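Here is a POSIX-only sketch of an anonymous pipe between a parent and a forked child, using Python's `os.pipe` and `os.fork` wrappers. `pipe_message` is an illustrative name. Each side closes the end it does not use, which keeps data flowing in one direction. The reader sees end-of-file once the writer closes its end.

```python
import os

def pipe_message(message: bytes) -> bytes:
    read_fd, write_fd = os.pipe()        # pipe(2): returns (read end, write end)
    pid = os.fork()                      # fork(2): POSIX only
    if pid == 0:                         # child process: the writer
        os.close(read_fd)                # child never reads
        os.write(write_fd, message)
        os.close(write_fd)               # closing the write end signals EOF
        os._exit(0)                      # exit the child without parent cleanup
    os.close(write_fd)                   # parent never writes
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:                    # empty read = EOF: the writer has closed
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)                   # reap the child
    return b"".join(chunks)

print(pipe_message(b"hello via pipe"))   # b'hello via pipe'
```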
Interview Spotlight
Focus: Speed vs. Security trade-off between Shared Memory and Message Passing.
The `|` symbol in Linux terminals is a Pipe that connects the output of one process to the input of another.
19. Spooling vs. Buffering
Technical Definition
Techniques to bridge the speed gap between the CPU and slow I/O devices. **Spooling** stores data in a large disk buffer for deferred processing; **Buffering** stores data temporarily in RAM to smooth out speed differences.
Core Concept
- Spooling: Handles multiple simultaneous jobs (e.g., a printer queue).
- Buffering: Smooths the data flow for a single process (e.g., video streaming).
| Aspect | Spooling | Buffering |
|---|---|---|
| Location | Disk. | RAM. |
| Concurrency | Overlaps multiple jobs. | Handles a single stream. |
Interview Spotlight
Focus: Understanding that Spooling involves the Disk, while Buffering typically stays in RAM.
A **Printer Spooler** allows you to 'print' 5 documents while the printer is still warming up.
20. Real-Time OS (RTOS) basics
Technical Definition
An **RTOS** is an operating system where the correctness of a task depends not only on the logical result but also on the time at which it is delivered. It relies on **Deterministic** scheduling.
Core Concept
- Hard RTOS: Strict deadlines; missing a deadline counts as system failure (e.g., an airbag controller).
- Soft RTOS: Deadlines matter, but occasional minor delays are tolerated (e.g., video streaming).
- Key: High predictability and low interrupt latency.
Interview Spotlight
Focus: Hard vs Soft RTOS categorization of common devices.
Spacecraft and Car Airbags run on **Hard RTOS** because late computation is fatal.
Computer Networks (CN)
1. OSI Model (7 Layers & their protocols)
Technical Definition
The Open Systems Interconnection (OSI) Model is a conceptual framework that standardizes the functions of a telecommunication or computing system into seven logical layers. It ensures interoperability between diverse communication systems with standard protocols.
Core Concept
- Layer 7 (Application): HTTP, DNS, SMTP; interface for end-user interaction.
- Layer 6 (Presentation): SSL/TLS, JPEG; encryption & data formatting.
- Layer 5 (Session): NetBIOS, PPTP; dialogue management & session control.
- Layer 4 (Transport): TCP, UDP; process-to-process delivery & flow control.
- Layer 3 (Network): IP, ICMP; routing & packet forwarding based on IP.
- Layer 2 (Data Link): Ethernet, ARP; hop-to-hop delivery based on MAC address.
- Layer 1 (Physical): Cables, Hubs; raw bit-stream transmission.
Interview Spotlight
Focus: Memorizing the exact order from Layer 7 down to Layer 1 (All People Seem To Need Data Processing) and the 'Unit' of data at each layer (Data for L7-L5, Segment for L4, Packet for L3, Frame for L2, Bits for L1).
A: Frame.
When you load a webpage, your browser starts at Layer 7 and the signal travels down to Layer 1 before entering the cable.
2. TCP/IP Model vs. OSI
Technical Definition
The TCP/IP Model is the functional implementation used in the modern internet, consisting of 4 (or 5) layers. Unlike OSI, it merges the top three layers into a single Application layer and combines the physical/data link layers in some versions.
Core Concept
- Application Layer: Equates to OSI 5, 6, and 7.
- Transport Layer: Equates to OSI 4; manages host-to-host communication.
- Internet Layer: Equates to OSI 3; handles addressing and routing.
- Network Access Layer: Equates to OSI 1 and 2; handles hardware interaction.
| Aspect | OSI Model | TCP/IP Model |
|---|---|---|
| Layers | 7 Layers (Theoretical). | 4/5 Layers (Implementation). |
| Approach | Vertical/Strict. | Horizontal/Modular. |
Interview Spotlight
Focus: Why is TCP/IP used over OSI? Answer: TCP/IP is protocol-specific and proven on the ARPANET/Internet; OSI is a general model.
Every router and smartphone uses TCP/IP to synchronize and transmit data globally.
3. TCP vs. UDP (Handshaking vs. Connectionless)
Technical Definition
**Transmission Control Protocol (TCP)** is a connection-oriented, reliable protocol that ensures data delivery via acknowledgments and retransmission. **User Datagram Protocol (UDP)** is a connectionless, unreliable protocol that prioritizes low latency over guaranteed, ordered delivery.
Core Concept
- TCP: Heavyweight; uses Flow Control (Sliding Window); error recovery via retransmission; ordered data.
- UDP: Lightweight; 'Fire and forget'; no handshake; unordered; low overhead.
- Overhead: TCP Header (20 bytes minimum) vs UDP Header (8 bytes).
| Feature | TCP | UDP |
|---|---|---|
| Reliability | Guaranteed delivery. | Best-effort delivery. |
| Use Case | Emails (SMTP), Web (HTTP). | VOIP, Gaming, Streaming. |
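Here is a loopback sketch of UDP's connectionless model using Python's standard `socket` module. There is no `connect`/`accept` step: the client simply sends a datagram to an address. `udp_roundtrip` is an illustrative name, and binding to port 0 asks the OS for any free port. On a real network (unlike loopback), the datagram could be lost, so the receiver uses a timeout.

```python
import socket

def udp_roundtrip(payload: bytes) -> bytes:
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        server.bind(("127.0.0.1", 0))                 # port 0: OS chooses a free port
        server.settimeout(2.0)                        # best-effort delivery: don't wait forever
        client.sendto(payload, server.getsockname())  # no handshake before sending
        data, _sender = server.recvfrom(2048)         # one datagram = one message
        return data
    finally:
        client.close()
        server.close()

print(udp_roundtrip(b"ping"))  # b'ping'
```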
Interview Spotlight
Focus: Explaining why UDP is used for DNS. Answer: Low overhead makes single-packet queries extremely fast.
Downloading a PDF uses TCP (every byte must arrive correctly); online games like 'PUBG' typically use UDP (a late or dropped packet is better skipped than retransmitted).
4. IP Addressing (IPv4 vs. IPv6, Classful vs. CIDR)
Technical Definition
An **IP Address** is a numeric label assigned to each device (network interface) on a network. **IPv4** uses 32-bit addresses (approx. 4.3 billion), while **IPv6** uses 128-bit addresses to handle the exponential growth of connected devices.
Core Concept
- Classful: A (0-127), B (128-191), C (192-223) by first octet; wasteful due to fixed network prefixes.
- CIDR (Classless Inter-Domain Routing): Uses a suffix like `/24` to define the network portion flexibly.
- IPv6: Hexadecimal notation (e.g., `2001:db8::1`); IPsec support was part of its original design.
Interview Spotlight
Focus: Calculating the number of hosts in a `/mask`. Formula: \(2^{(32 - mask)} - 2\).
A: Loopback address; used for a machine to communicate with itself.
Your WiFi router assigns you a private IPv4 address like 192.168.1.5 on your home network.
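The host-count formula \(2^{(32 - mask)} - 2\) can be checked against Python's standard `ipaddress` module. `usable_hosts` is an illustrative helper that applies the formula directly. The formula is meant for ordinary subnets (/30 and larger); /31 and /32 are special cases.

```python
import ipaddress

def usable_hosts(mask: int) -> int:
    # Subtract the Network ID (host bits all 0s) and Broadcast Address (host bits all 1s).
    return 2 ** (32 - mask) - 2

net = ipaddress.ip_network("192.168.1.0/24")
print(usable_hosts(24), net.num_addresses, len(list(net.hosts())))  # 254 256 254
print(ipaddress.ip_address("127.0.0.1").is_loopback)                # True
print(ipaddress.ip_address("192.168.1.5").is_private)               # True
```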
5. Subnetting & Supernetting
Technical Definition
**Subnetting** is the process of dividing a large network into smaller, manageable sub-networks to reduce congestion. **Supernetting (CIDR aggregation)** is the inverse, combining multiple contiguous smaller networks into a single larger prefix to reduce routing table size.
Core Concept
- Subnet Mask: Defines which part of the IP address is network vs host (e.g., 255.255.255.0).
- Benefit: Smaller broadcast domains (limiting 'Broadcast Storms') and per-subnet security policies.
- CIDR notation: The suffix (e.g., `/24`) is the number of leading 1 bits in the mask.
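Python's standard `ipaddress` module can illustrate both directions. `subnets(new_prefix=...)` splits a network, and `collapse_addresses` aggregates contiguous networks. The address blocks below are illustrative.

```python
import ipaddress

office = ipaddress.ip_network("10.0.0.0/24")
# Subnetting: borrow 2 host bits to split the /24 into four /26 subnets.
subnets = list(office.subnets(new_prefix=26))
print([str(s) for s in subnets])
# ['10.0.0.0/26', '10.0.0.64/26', '10.0.0.128/26', '10.0.0.192/26']
print(subnets[0].netmask)  # 255.255.255.192

# Supernetting: four contiguous /24 routes collapse into one /22 advertisement.
routes = [ipaddress.ip_network(f"192.168.{i}.0/24") for i in range(4)]
print(list(ipaddress.collapse_addresses(routes)))  # [IPv4Network('192.168.0.0/22')]
```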
Interview Spotlight
Focus: Why do we subtract 2 in the host formula? Answer: To exclude the Network ID (all 0s) and Broadcast Address (all 1s).
An office divides its network so the HR and IT departments have separate **Subnets** for security.
6. DNS, DHCP, and NAT
Technical Definition
Fundamental internet service protocols. **DNS** translates domain names to IPs; **DHCP** automatically assigns IPs to devices; **NAT** allows multiple private IPs to share one public IP.
Core Concept
- DNS (Domain Name System): Uses port 53 (UDP for most queries, TCP for large responses and zone transfers); resolves names through a hierarchical Root -> TLD -> Authoritative tree.
- DHCP (DORA process): Discover, Offer, Request, Acknowledge; uses UDP 67/68.
- NAT (Network Address Translation): Found in routers; conserves IPv4 addresses.
Interview Spotlight
Focus: Explaining the DORA process in DHCP.
When you join a Starbucks WiFi, **DHCP** gives you an IP, and **NAT** lets you browse the web using the shop's single public address.
7. HTTP vs. HTTPS (The SSL/TLS Handshake)
Technical Definition
**HTTP** is the plain-text protocol for web browsing; **HTTPS** adds a layer of security by encrypting the data using **SSL/TLS**. HTTPS protects against "Man-in-the-Middle" (MITM) attacks.
Core Concept
- SSL/TLS Handshake (TLS 1.2-style): (1) Client Hello, (2) Server Hello + Certificate, (3) Key Exchange, (4) Change Cipher Spec and Finished.
- Encryption: Uses Asymmetric encryption for key exchange and Symmetric for data transfer.
- Port: HTTP (80) vs HTTPS (443).
Interview Spotlight
Focus: Why use asymmetric for the handshake but symmetric for data? Answer: Asymmetric is secure but slow; symmetric is very fast for high-volume data.
Banking websites strictly use **HTTPS** to ensure your password isn't visible on the public network.
8. Routing Protocols (Distance Vector vs. Link State)
Technical Definition
Algorithms used by routers to find the best path for packets. **Distance Vector** protocols exchange distance estimates (e.g., hop count) with their neighbors; **Link State** protocols build a complete map of the network for smarter decisions.
Core Concept
- RIP (Distance Vector): Uses Bellman-Ford; max 15 hops (16 = unreachable); sends its full table periodically.
- OSPF (Link State): Uses Dijkstra; builds topology map; faster convergence.
- BGP (Path Vector): The inter-domain protocol that exchanges routes between autonomous systems, such as ISPs, across the internet.
| Type | Distance Vector | Link State |
|---|---|---|
| Metric | Hop count. | Cost/Bandwidth. |
| Convergence | Slow. | Fast. |
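Link-state routers such as OSPF routers run Dijkstra's shortest-path algorithm over the topology map they have built. Here is a compact sketch using Python's `heapq` on a hypothetical four-router topology with link costs. `dijkstra` returns the lowest total cost from the source to every router.

```python
import heapq

def dijkstra(graph, source):
    """graph: {node: {neighbor: link_cost}}. Returns {node: lowest total cost}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue                          # stale heap entry; a shorter path was found
        for neighbor, cost in graph[node].items():
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

topology = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}
print(dijkstra(topology, "A"))  # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
```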
Interview Spotlight
Focus: The 'Count-to-Infinity' problem in Distance Vector protocols and how split horizon (with poison reverse) mitigates it.
Corporate networks use **OSPF** to automatically redirect traffic if a main fiber line is cut.
9. Error Control (CRC, Checksum, Parity)
Technical Definition
Mechanisms at the Data Link/Transport layers to detect whether bits were altered during transmission. **CRC (Cyclic Redundancy Check)** is the most robust of the three methods below, treating the bit stream as a polynomial.
Core Concept
- Checksum: Sums 16-bit words using 1's complement arithmetic; used in IPv4, TCP, and UDP headers.
- Parity Bit: Simple 1-bit check (Even/Odd parity); detects any odd number of flipped bits but misses 2-bit (even-count) errors.
- CRC: Binary (mod-2) division by a generator polynomial (\(G(x)\)); the remainder is appended to the data.
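The mod-2 division behind CRC can be sketched on bit strings. The generator `10011` (\(G(x) = x^4 + x + 1\)) and the data bits are an illustrative example. `crc_encode` appends the 4-bit remainder, and the receiver accepts a frame only if dividing it by the same generator leaves an all-zero remainder. Real Ethernet uses a 32-bit generator with table-driven or hardware implementations.

```python
def mod2_remainder(bits: str, generator: str) -> str:
    """Remainder of mod-2 (XOR, no carries) long division of bits by generator."""
    degree = len(generator) - 1
    work = list(bits)
    for i in range(len(work) - degree):
        if work[i] == "1":                     # leading bit set: XOR the generator in here
            for j, g in enumerate(generator):
                work[i + j] = "0" if work[i + j] == g else "1"
    return "".join(work[-degree:])

def crc_encode(data: str, generator: str) -> str:
    """Append the CRC: remainder of data shifted left by deg(G) zero bits."""
    degree = len(generator) - 1
    return data + mod2_remainder(data + "0" * degree, generator)

frame = crc_encode("1101011111", "10011")
print(frame)                           # 11010111110010
print(mod2_remainder(frame, "10011"))  # 0000 -> frame accepted
```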
Interview Spotlight
Focus: Calculating CRC given a divisor. Remainder must be all 0s at the receiver for success.
Every Ethernet frame carries a **CRC**-32 trailer (the Frame Check Sequence) so the receiving NIC can discard corrupted frames locally.
10. Flow Control (Stop & Wait, Sliding Window)
Technical Definition
Mechanisms that ensure a fast sender does not overwhelm a slow receiver, by adjusting the rate of data flow between the two nodes.
Core Concept
- Stop and Wait: Sender sends 1 frame, then waits for its ACK. Inefficient on links with a long round-trip time.
- Sliding Window: The sender may have up to \(W\) unacknowledged frames in flight before it must wait for an ACK.
- Selective Repeat: Only lost or corrupted frames are retransmitted.
Interview Spotlight
Focus: Difference between Go-Back-N and Selective Repeat. Answer: GBN resends the erroneous frame and every frame sent after it; SR resends only the erroneous frame.
Streaming video over TCP relies on TCP's **Sliding Window** to keep data flowing while earlier segments are still awaiting acknowledgment.
11. Congestion Control (Leaky Bucket, Token Bucket)
Technical Definition
Network-wide traffic management to prevent node saturation. **Leaky Bucket** ensures a constant output rate (no bursts); **Token Bucket** allows bursts but limits the average rate.
Core Concept
- Leaky Bucket: Traffic shaping; drops overflowing packets; rigid traffic flow.
- Token Bucket: Tokens accumulate in a bucket; allows high-speed bursts if tokens exist.
- TCP Congestion Control: Relies on 'Additive Increase Multiplicative Decrease' (AIMD).
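Here is a minimal token-bucket sketch. It assumes tokens refill continuously at `rate` per second, up to `capacity`. `TokenBucket` is an illustrative class, and the caller passes timestamps explicitly so the behavior is deterministic. A full bucket lets a burst through at once, after which traffic is limited to the refill rate.

```python
class TokenBucket:
    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity      # maximum burst size, in tokens
        self.rate = rate              # tokens added per second (sustained rate)
        self.tokens = capacity        # start full, so an initial burst is allowed
        self.last = 0.0               # timestamp of the previous call

    def allow(self, now: float, cost: float = 1) -> bool:
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost       # spend tokens: the packet may be sent
            return True
        return False                  # not enough tokens: drop or delay the packet

bucket = TokenBucket(capacity=5, rate=1.0)
print([bucket.allow(0.0) for _ in range(7)])  # burst: five True, then two False
print([bucket.allow(2.0) for _ in range(3)])  # 2 s later: [True, True, False]
```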
Interview Spotlight
Focus: Difference between Flow Control (End-to-end) and Congestion Control (Network-wide).
ISPs use **Token Bucket** to let you enjoy high-speed page loads while capping your sustained download speed for a 50GB file.
12. Network Devices (Hub, Switch, Router, Gateway)
Technical Definition
Hardware components that build the network. **Hubs** broadcast, **Switches** filter (MAC-based), **Routers** connect networks (IP-based), and **Gateways** convert between dissimilar protocols.
Core Concept
- Hub (L1): Dumb device; repeats bits out of every port, so all attached devices share one 'Collision Domain'.
- Switch (L2): Intelligent; uses a CAM table to map MAC addresses to ports.
- Router (L3): Uses routing tables to determine cross-network paths.
| Device | Layer | Unit |
|---|---|---|
| Hub | Layer 1 | Bits. |
| Switch | Layer 2 | Frames. |
| Router | Layer 3 | Packets. |
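How a switch fills its CAM table can be sketched as a learning loop. It records which port each source MAC arrived on, forwards to a known destination port, and floods unknown destinations. `LearningSwitch` and the short MAC strings are illustrative, and real switches also age out entries.

```python
class LearningSwitch:
    def __init__(self, num_ports: int):
        self.num_ports = num_ports
        self.cam = {}                                 # MAC address -> port

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> list:
        """Return the list of ports the frame is sent out of."""
        self.cam[src_mac] = in_port                   # learn which port the sender is on
        out = self.cam.get(dst_mac)
        if out is None:
            return [p for p in range(self.num_ports) if p != in_port]  # unknown: flood
        return [] if out == in_port else [out]        # same segment: filter, else forward

sw = LearningSwitch(4)
print(sw.receive(0, "aa", "bb"))  # [1, 2, 3]  ("bb" unknown, so flood)
print(sw.receive(1, "bb", "aa"))  # [0]        ("aa" learned on port 0)
print(sw.receive(0, "aa", "bb"))  # [1]        ("bb" learned on port 1)
```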
Interview Spotlight
Focus: Defining a 'Collision Domain'. Answer: A network segment where simultaneous transmissions can collide. Each switch port is its own collision domain, so full-duplex switched links have no collisions.
A **Switch** allows two people in the same office to send large files simultaneously without slowing each other down.
13. ARP & RARP
Technical Definition
Conversion protocols between Layer 2 and Layer 3 addresses. **Address Resolution Protocol (ARP)** finds a MAC address from a known IP; **Reverse ARP (RARP)** finds an IP from a known MAC.
Core Concept
- ARP Request: Broadcasted to everyone on the LAN.
- ARP Reply: Unicast (direct) to the requester.
- Gratuitous ARP: Used to check for IP conflicts on startup.
Interview Spotlight
Focus: Is ARP a Layer 2 or Layer 3 protocol? Answer: It operates at the boundary, often called Layer 2.5.
When you ping 192.168.1.1, your computer first sends an **ARP** request (if the address is not already in its ARP cache) to find the router's hardware MAC address.
14. ICMP (Ping & Traceroute)
Technical Definition
**Internet Control Message Protocol (ICMP)** is used by network devices to send error messages and operational information. It is the core of diagnostic tools like **Ping** and **Traceroute**.
Core Concept
- Echo Request/Reply: Used by Ping to check reachability.
- TTL Exceeded: Used by Traceroute to map every hop to a destination.
- Destination Unreachable: Sent by a router when it cannot deliver a packet (e.g., no route to the network or host).
Interview Spotlight
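Focus: How does Traceroute work using TTL? Answer: It sends probes with TTL starting at 1 and increasing by one each round, so each router along the path in turn decrements TTL to 0 and sends back an 'ICMP Time Exceeded' message.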
Running `ping google.com` sends **ICMP** Echo packets to check if your internet connection is alive.
15. Network Topologies (Mesh, Star, Bus)
Technical Definition
The geometric arrangement of devices in a network. **Mesh** is the most robust (every device connected to every other); **Star** is the most common (every device connected to a central hub/switch).
Core Concept
- Mesh: Full mesh link count = \(n(n-1)/2\). High cost, high reliability.
- Star: If central node fails, network dies. Easy to scale.
- Bus: All share a single cable; high collision risk.
| Topology | Robustness | Cost |
|---|---|---|
| Mesh | Very High. | High. |
| Star | Medium. | Low/Medium. |
Interview Spotlight
Focus: Full Mesh cable calculation for 10 nodes. Answer: \(10 \times 9 / 2 = 45\).
Home WiFi setups are **Star Topologies**, where your phone and laptop all connect to the central router.
16. MAC Address vs. IP Address
Technical Definition
A **MAC Address** is a 48-bit hardware identifier (Physical) assigned to a network interface; an **IP Address** is a 32/128-bit logical identifier (Logical) that can change.
Core Concept
- MAC (L2): Assigned by the manufacturer; intended to be globally unique; burned into the NIC (though it can be overridden in software).
- IP (L3): Assigned by network admin/DHCP; changes based on location.
| Feature | MAC Address | IP Address |
|---|---|---|
| Layer | Data Link (L2). | Network (L3). |
| Mutability | Fixed (burned-in). | Dynamic. |
Interview Spotlight
Focus: Why do we need both? Answer: IP allows hierarchical grouping for global routing; MAC allows specific device identification on a local wire.
A **MAC address** is like your Social Security Number (permanent); an **IP address** is like your Current Mailing Address (changes when you move).
17. Firewalls & VPNs
Technical Definition
Network security tools. **Firewalls** filter incoming/outgoing traffic based on rules; **VPNs (Virtual Private Networks)** create encrypted tunnels over public networks to protect data and hide location.
Core Concept
- Firewall: Can be Packet Filtering (stateless), Stateful, or Next-Gen (application-aware).
- VPN: Uses protocols like IPsec, OpenVPN, or L2TP/IPsec to encapsulate and encrypt data.
Interview Spotlight
Focus: Difference between stateless and stateful firewalls.
Working from home requires a **VPN** to access the company's internal files securely.
18. Socket Programming basics
Technical Definition
The API used by programmers to create end-to-end network communication. A **Socket** is an endpoint formed by combining an **IP Address** and a **Port Number**.
Core Concept
- Stream Sockets: Use TCP (reliable).
- Datagram Sockets: Use UDP (fast).
- Primitives: socket(), bind(), listen(), accept(), connect().
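The server-side call sequence (socket → bind → listen → accept) and the client-side sequence (socket → connect) can be sketched on loopback with Python's standard `socket` module. A helper thread plays the server. `tcp_echo_once` is an illustrative name, and for simplicity the sketch assumes the short message arrives in a single `recv`.

```python
import socket
import threading

def tcp_echo_once(message: bytes) -> bytes:
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # socket()
    server.bind(("127.0.0.1", 0))                               # bind(): OS picks a free port
    server.listen(1)                                            # listen()
    port = server.getsockname()[1]

    def serve():
        conn, _addr = server.accept()      # accept(): blocks until a client connects
        with conn:
            conn.sendall(conn.recv(1024))  # echo one message back

    worker = threading.Thread(target=serve)
    worker.start()
    # create_connection() wraps socket() + connect(); connect runs TCP's 3-way handshake.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(message)
        reply = client.recv(1024)
    worker.join()
    server.close()
    return reply

print(tcp_echo_once(b"hello"))  # b'hello'
```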
Interview Spotlight
Focus: The sequence of calls for a Server vs. a Client.
A: An endpoint of a two-way communication link. IP:Port.
In a Chat Application, the server's **Socket** is bound to a known port and listens; your phone's client socket connects to it to send and receive messages.
19. 3-Way Handshake (SYN, SYN-ACK, ACK)
Technical Definition
The process used by TCP to establish a reliable connection before data transfer. It ensures both sides can send and receive data.
Core Concept
- Step 1: Client sends **SYN** (Synchronize) with seq=x.
- Step 2: Server replies with **SYN-ACK** with seq=y, ack=x+1.
- Step 3: Client sends **ACK** with ack=y+1.
Interview Spotlight
Focus: Why is it a 3-way handshake and not 2-way? Answer: To prevent 'Delayed Duplicate' connections from previous sessions.
A: A Denial-of-Service attack where an attacker sends many SYNs but never sends the final ACK back.
Before your computer downloads a file, it performs a **3-Way Handshake** with the server to agree on sequence numbers.
20. SMTP, FTP, POP3, IMAP protocols
Technical Definition
Standard application-layer protocols. **SMTP** is for sending emails; **FTP** for file transfers; **POP3/IMAP** for retrieving emails from a mail server.
Core Concept
- SMTP (25): Push protocol; sends mail between MTAs.
- FTP (20/21): Uses two separate connections: control on port 21 and data on port 20 (active mode).
- POP3 (110): Pull protocol; downloads and deletes from server (usually).
- IMAP (143): Pull protocol; syncs multiple clients with the server.
| Feature | POP3 | IMAP |
|---|---|---|
| Sync | Local only. | Synced across devices. |
| Server Copy | Usually deleted. | Kept on server. |
Interview Spotlight
Focus: Why IMAP is better for the modern world. Answer: Allows users to read the same email on a phone, tablet, and laptop.
When you send a CV, you use **SMTP**, and the recruiter reads it via **IMAP** on their phone.
Machine Learning (ML)
1. Supervised vs. Unsupervised vs. Reinforcement Learning
Technical Definition**Supervised Learning** uses labeled data to train models for mapping inputs to known outputs. **Unsupervised Learning** finds hidden patterns in unlabeled data, while **Reinforcement Learning (RL)** trains an 'agent' to make decisions by rewarding positive actions and penalizing negative ones.
Core Concept- Supervised: Regression (Continuous) and Classification (Discrete). Requires Ground Truth.
- Unsupervised: Clustering (K-Means) and Dimensionality Reduction (PCA). Finds structure.
- Reinforcement: State-Action-Reward cycle. Maximizes cumulative reward over time.
| Type | Input Data | Goal |
|---|---|---|
| Supervised | Labeled. | Predict Output. |
| Unsupervised | Unlabeled. | Find Patterns. |
| Reinforcement | Experience/Reward. | Maximize Reward. |
Interview Spotlight
Focus: Identifying which paradigm to use for a new problem (e.g., Robot navigation = RL, Email Spam = Supervised).
A spam filter uses **Supervised Learning**; a customer segmentation tool (grouping shoppers by purchasing behavior) uses **Unsupervised Learning**.
2. Linear & Logistic Regression
Technical Definition**Linear Regression** predicts a continuous numeric value by fitting a straight line to data points. **Logistic Regression** is used for binary classification, predicting the probability (0 to 1) of an instance belonging to a specific class using the Sigmoid function.
Core Concept- Linear: Goal is to minimize MSE (Mean Squared Error). \(y = mx + c\).
- Logistic: Maps \(z\) to \([0, 1]\) using \(\sigma(z) = \frac{1}{1 + e^{-z}}\).
- Assumptions: Linear relationship, no multi-collinearity, independence of errors.
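Both models can be sketched for a single feature. `fit_line` solves the MSE minimization in closed form (ordinary least squares), and `logistic_predict` shows the linear score passing through the Sigmoid before thresholding. The function names, the weights `w`, `b` and the 0.5 threshold are illustrative assumptions; logistic weights would normally be learned, e.g. by gradient descent on cross-entropy.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + c (minimizes MSE in closed form)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    m = cov_xy / var_x
    c = mean_y - m * mean_x
    return m, c

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_predict(w, b, x, threshold=0.5):
    """Linear score z = w*x + b, squashed to a probability, then thresholded."""
    p = sigmoid(w * x + b)
    return p, int(p >= threshold)

print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))    # (2.0, 1.0)
print(logistic_predict(w=2.0, b=-1.0, x=0.5))  # (0.5, 1)
```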
Interview Spotlight
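Focus: Why call it 'Regression' if it's used for Classification? Answer: Because it fits a linear function of the inputs, \(z = wx + b\), just like linear regression, and only then applies the Sigmoid to turn \(z\) into a probability; the class comes from thresholding that probability (e.g., at 0.5).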
Predicting a house price uses **Linear Regression**; deciding if an email is 'Spam' or 'Inbox' uses **Logistic Regression**.
3. Bias-Variance Tradeoff
Technical Definition**Bias** is the error introduced by simplifying a complex real-world problem (Underfitting). **Variance** is the error from being too sensitive to small fluctuations in training data (Overfitting).
Core Concept- High Bias: Model is too simple; misses trends (Underfit).
- High Variance: Model is too complex; captures noise (Overfit).
- Tradeoff: The goal is to find the 'Sweet Spot' in model complexity where Total Error is minimized.
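For squared-error loss, the tradeoff can be written exactly. Assuming data generated as \(y = f(x) + \varepsilon\) with zero-mean noise of variance \(\sigma^2\), and \(\hat{f}\) a model trained on a randomly drawn training set, the expected test error at a point \(x\) splits into three terms:

\[
\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2} + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}} + \underbrace{\sigma^2}_{\text{Irreducible noise}}
\]

Increasing model complexity typically lowers the first term and raises the second; no model can reduce the noise term \(\sigma^2\).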
Interview Spotlight
Focus: Visualizing the 'Bullseye' diagram. Tight cluster away from center = Low Variance, High Bias.
A linear model trying to fit circular data has **High Bias**; a complex polynomial fitting every outlier has **High Variance**.
4. Overfitting & Underfitting (Regularization)
Technical Definition**Overfitting** happens when a model performs very well (often near-perfectly) on training data but poorly on unseen data. **Underfitting** occurs when the model is too simple to learn even the training data. **Regularization** techniques prevent overfitting by penalizing large weights.
Core Concept- Signals: Overfit = Low Train Error, High Test Error. Underfit = High Train Error, High Test Error.
- Causes: Overfit = Too many features, too little data. Underfit = Too simple model.
- Fix: Add a penalty term to the Loss Function, scaled by the regularization strength \(\lambda\).
Interview Spotlight
Focus: Explaining how increasing training data helps solve Overfitting.
Memorizing questions for an exam is **Overfitting**; understanding the concepts is **Generalization**.
5. L1 (Lasso) vs. L2 (Ridge) Regularization
Technical DefinitionTechniques that add a penalty to the loss function to prevent overfitting. **L2 (Ridge)** penalizes the square of weights; **L1 (Lasso)** penalizes the absolute value of weights and can drive them to exactly zero.
Core Concept- L1 (Lasso): Used for **Feature Selection**; creates 'Sparse' models.
- L2 (Ridge): Keeps all features but shrinks their influence.
- Elastic Net: A hybrid of both L1 and L2.
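To see why L1 produces exact zeros while L2 only shrinks, consider the simplest case: penalizing a single unregularized estimate \(w\). This is a sketch under that assumption. The closed forms below match the per-coefficient solutions when features are orthonormal; with correlated features, solvers such as coordinate descent are needed. The \(\frac{1}{2}\) scaling of the penalties is a convention chosen here for clean formulas.

```python
def lasso_shrink(w: float, lam: float) -> float:
    """argmin_x 0.5*(x - w)**2 + lam*|x|  (soft-thresholding)."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0                      # any |w| <= lam lands exactly on zero

def ridge_shrink(w: float, lam: float) -> float:
    """argmin_x 0.5*(x - w)**2 + 0.5*lam*x**2  (proportional shrinkage)."""
    return w / (1.0 + lam)          # never exactly zero unless w == 0

weights = [3.0, 0.4, -0.2, -2.5]
print([lasso_shrink(w, 0.5) for w in weights])   # [2.5, 0.0, 0.0, -2.0]
print([ridge_shrink(w, 0.5) for w in weights])
```

The small weights 0.4 and -0.2 are zeroed by L1 but only scaled down by L2, which is the sparsity effect used for feature selection.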
Interview Spotlight
Focus: Why does Lasso cause sparsity? Answer: The L1 constraint region is a diamond whose 'corners' lie on the axes; the loss contours usually touch it at a corner, where some weights are exactly zero.
If you have 100 features but only 10 matter, use **L1 (Lasso)** to automatically zero-out the useless 90.
6. Decision Trees & Random Forest
Technical DefinitionA **Decision Tree** is a flowchart-like structure that splits data based on feature values. **Random Forest** is an ensemble method that builds multiple trees using bagging and combines their predictions (averaging for regression, majority vote for classification) to reduce variance.
Core Concept- Split Criteria: Entropy (Information Gain) or Gini Impurity.
- Pruning: Trimming branches to prevent overfitting in deep trees.
- Random Forest: Uses 'Feature Randomness' and 'Bootstrap Aggregation' (Bagging).
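Both split criteria can be computed directly from the class labels in a node. The following is a minimal sketch with illustrative toy labels. A tree evaluates candidate splits this way and keeps the one with the highest information gain (or the largest drop in Gini impurity).

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - weighted

parent = ["yes", "yes", "no", "no"]
print(gini(parent), entropy(parent))                           # 0.5 1.0
print(information_gain(parent, ["yes", "yes"], ["no", "no"]))  # 1.0
```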
Interview Spotlight
Focus: Why is Random Forest better than a single Decision Tree? Answer: It reduces overfitting and the 'greedy' bias of individual trees.
A Single Tree helps you decide 'Should I play tennis?'; a **Random Forest** asks 100 people and takes the majority vote.
7. Support Vector Machines (SVM) & Kernels
Technical Definition**SVM** finds the optimal 'Hyperplane' that creates the maximum margin between two classes. The **Kernel Trick** allows SVM to solve non-linear problems by mapping data into a higher-dimensional space where it becomes linearly separable.
Core Concept- Support Vectors: Data points closest to the hyperplane.
- Margin: The distance between support vectors of different classes.
- Kernels: RBF (Gaussian), Polynomial, Sigmoid.
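The kernel trick is easiest to see with the degree-2 polynomial kernel in two dimensions. Evaluating \((x \cdot z)^2\) in the original space gives the same number as mapping both points into a 3-D feature space with `phi` and taking a dot product there, so the higher-dimensional space never has to be built. This is a sketch; the function names and the RBF `gamma` value are illustrative assumptions.

```python
import math

def poly2_kernel(x, z):
    """Degree-2 polynomial kernel k(x, z) = (x . z)^2, computed in the original 2-D space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    """Explicit 3-D feature map whose dot product equals poly2_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def rbf_kernel(x, z, gamma=0.5):
    """RBF (Gaussian) kernel: similarity that decays with squared distance."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

x, z = (1.0, 2.0), (3.0, 4.0)
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))
print(poly2_kernel(x, z), explicit)   # both 121, up to floating-point rounding
```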
Interview Spotlight
Focus: Defining 'Hard Margin' vs 'Soft Margin'. A soft margin allows some points to violate the margin (or be misclassified) in exchange for better generalization.
Image recognition tasks use **SVM** with RBF kernels to distinguish between complex shapes like hand-written digits.
8. K-Nearest Neighbors (KNN)
Technical DefinitionA **Lazy Learning** algorithm that classifies a point by the majority label among its \(K\) closest neighbors. It doesn't 'learn' a model; it simply stores the data and calculates distances at prediction time.
Core Concept- Distance Metrics: Euclidean, Manhattan, Minkowski.
- Hyperparameter \(K\): Small \(K\) = High Variance (Sensitive to noise). Large \(K\) = High Bias.
- Feature Scaling: Mandatory! Since it's distance-based, normalized features are essential.
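A minimal KNN classifier fits in a few lines, because all the work happens at prediction time. This sketch assumes the features are already on comparable scales and uses Euclidean distance via `math.dist`; `knn_predict` and the toy points are illustrative.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (feature_tuple, label). Majority label among the k nearest points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]  # Euclidean
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]
print(knn_predict(train, (2, 2)), knn_predict(train, (8.5, 8.5)))   # A B
```

Sorting every training point on each query is O(n log n). Real libraries use structures such as k-d trees or approximate indexes to speed up the neighbor search.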
Interview Spotlight
Focus: Calculating distance and explaining the effect of outliers on small vs large \(K\) values.
A real-estate app uses **KNN** to suggest a house price by looking at the prices of the 5 nearest similar houses.
9. K-Means & Hierarchical Clustering
Technical Definition**K-Means** is a centroid-based clustering algorithm that partitions data into \(K\) exclusive clusters. **Hierarchical Clustering** builds a tree-like structure (Dendrogram) to represent nested clusters.
Core Concept- K-Means Steps: (1) Pick centroids, (2) Assign points, (3) Update centroids, (4) Repeat.
- Selecting K: Use the **Elbow Method** (plotting Inertia vs K).
- Hierarchical: Agglomerative (Bottom-up) vs Divisive (Top-down).
| Clustering | K-Means | Hierarchical |
|---|---|---|
| Speed | Fast (about O(n·K·d) per iteration, d = features). | Slow (at least O(n^2) time and memory). |
| Number of Clusters | Must be predefined. | Not required; use dendrogram. |
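The four K-Means steps listed above map directly onto Lloyd's algorithm. This is a minimal sketch: `kmeans` is a hypothetical helper that seeds centroids with the first K points for determinism (an assumption; k-means++ seeding is the usual practical choice) and stops when no centroid moves.

```python
import math

def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm. Returns (centroids, clusters)."""
    centroids = [tuple(p) for p in points[:k]]   # (1) pick initial centroids
    clusters = []
    for _ in range(max_iters):
        # (2) Assign: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # (3) Update: move each centroid to the mean of its cluster
        new_centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster
            else centroids[i]                    # keep an empty cluster's centroid
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:           # (4) repeat until nothing moves
            break
        centroids = new_centroids
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```

For the Elbow Method, run this for several values of K and plot the inertia (sum of squared distances from each point to its centroid) against K.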
Interview Spotlight
Focus: The Elbow Method and identifying why K-Means fails on non-spherical clusters.
Netflix uses **Clustering** to group users based on their viewing habits into 'Genre Tribes'.
10. Principal Component Analysis (PCA)
Technical Definition**PCA** is a dimensionality reduction technique that transforms a large set of variables into a smaller one that still contains most of the original information. It finds new axes called **Principal Components** that maximize variance.
Core Concept- Variance: The goal is to retain components with high eigenvalues.
- Steps: Standardize → Covariance Matrix → Eigenvectors/values → Sort & Project.
- Orthogonality: All resulting components are mutually orthogonal (perpendicular), so the projected features are uncorrelated.
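For two features, the PCA steps can be carried out by hand, because a symmetric 2x2 covariance matrix has closed-form eigenvalues. This is a sketch, not a general implementation. `pca_2d` is a hypothetical helper that only centers the data; full standardization would also divide each feature by its standard deviation. Higher dimensions need a numerical eigen-solver or SVD.

```python
import math

def pca_2d(points):
    """Return (first principal direction, explained-variance ratio) for 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    xs = [p[0] - mx for p in points]             # center each feature
    ys = [p[1] - my for p in points]
    # Sample covariance matrix [[a, b], [b, c]]
    a = sum(x * x for x in xs) / (n - 1)
    c = sum(y * y for y in ys) / (n - 1)
    b = sum(x * y for x, y in zip(xs, ys)) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix
    mid = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = mid + radius, mid - radius
    # Eigenvector for the largest eigenvalue (the first principal component)
    if b != 0:
        vx, vy = b, lam1 - a
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm), lam1 / (lam1 + lam2)

direction, ratio = pca_2d([(1, 1), (2, 2), (3, 3)])
print(direction, ratio)   # points on y = x: direction along (1, 1), ratio 1.0
```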
Interview Spotlight
Focus: Ensuring people understand that PCA is UNSUPERVISED. It doesn't look at labels, only variance.
Reducing 100 census variables into 5 'Indices' for easier visualization is done via **PCA**.
11. Evaluation Metrics (Precision, Recall, F1, ROC-AUC)
Technical DefinitionQuantitative measures to assess model performance. **Precision** measures accuracy of positive predictions; **Recall** measures the ability to find all positive instances. **F1 Score** is their harmonic mean.
Core Concept- Precision: \(\frac{TP}{TP + FP}\). High Precision means few 'False Positives'.
- Recall: \(\frac{TP}{TP + FN}\). High Recall means few 'False Negatives'.
- ROC Curve: Plots TPR vs FPR at different thresholds. **AUC** summarizes the curve area.
Interview Spotlight
Focus: When to prioritize Recall over Precision? Answer: Medical diagnosis (Cancer detection) or Disaster prediction.
A spam filter needs high **Precision** (don't block important mail); a COVID-19 test needs high **Recall** (find all infected).
12. Confusion Matrix & Type I/II Errors
Technical DefinitionA **Confusion Matrix** is a table used to describe the performance of a classification model. **Type I Error** is a 'False Positive' (rejecting a true null hypothesis), while **Type II Error** is a 'False Negative' (failing to reject a false null hypothesis).
Core Concept- Type I (False Positive): Alarm goes off when there's no fire.
- Type II (False Negative): No alarm during an actual fire.
- Accuracy: \(\frac{TP + TN}{\text{Total}}\). Beware: Misleading for imbalanced datasets!
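Building the confusion matrix from raw predictions makes the imbalanced-data warning concrete. This sketch uses hypothetical labels: 99 legitimate transactions, 1 fraud, and a "model" that always predicts safe.

```python
def confusion_matrix(y_true, y_pred):
    """Counts for a binary problem where 1 is the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # Type I errors
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # Type II errors
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

# 99 legitimate transactions and 1 fraud; the "model" always predicts safe (0)
y_true = [0] * 99 + [1]
y_pred = [0] * 100
cm = confusion_matrix(y_true, y_pred)
accuracy = (cm["TP"] + cm["TN"]) / len(y_true)
print(cm, accuracy)   # {'TP': 0, 'TN': 99, 'FP': 0, 'FN': 1} 0.99
```

Accuracy is 99%, yet TP is 0, so recall on the fraud class is zero.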
Interview Spotlight
Focus: Why is 'Accuracy' misleading for fraud detection? Answer: If 99% of transactions are safe, a model that says 'Always Safe' is 99% accurate but useless.
In a court trial, convicting an innocent person is a **Type I Error**; letting a guilty person go is a **Type II Error**.
13. Gradient Descent (Batch, SGD, Mini-batch)
Technical DefinitionAn optimization algorithm used to minimize the cost function by iteratively moving in the direction of the steepest descent. The **Learning Rate (\(\eta\))** determines the step size.
Core Concept- Batch GD: Uses the whole dataset; slow but stable.
- SGD (Stochastic): Uses 1 sample per step; fast but noisy (the noise can help escape shallow local minima).
- Mini-batch GD: Uses small chunks (e.g., 32-512); standard in DL.
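All three variants are the same loop with a different batch size. This is a sketch that fits \(y = mx + c\) by minimizing MSE; `minibatch_gd`, the learning rate, and the epoch count are illustrative choices. `batch_size=1` gives SGD, and `batch_size=len(xs)` gives Batch GD.

```python
import random

def minibatch_gd(xs, ys, lr=0.1, batch_size=4, epochs=500, seed=0):
    """Fit y = m*x + c by minimizing MSE with mini-batch gradient descent."""
    rng = random.Random(seed)
    m, c = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)                          # new sample order each epoch
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            # Gradient of mean((m*x + c - y)^2) over this batch only
            errs = [(m * xs[i] + c - ys[i], xs[i]) for i in batch]
            grad_m = 2 * sum(e * x for e, x in errs) / len(batch)
            grad_c = 2 * sum(e for e, _ in errs) / len(batch)
            m -= lr * grad_m                      # step against the gradient
            c -= lr * grad_c
    return m, c

xs = [i / 10 for i in range(20)]
ys = [2 * x + 1 for x in xs]                      # noise-free line y = 2x + 1
for bs in (1, 4, len(xs)):                        # SGD, mini-batch, batch GD
    print(bs, minibatch_gd(xs, ys, batch_size=bs))
```

With noise-free data, all three settings recover roughly m = 2, c = 1. With noisy data, small batches would keep fluctuating around the optimum unless the learning rate is decayed.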
Interview Spotlight
Focus: Explaining 'Vanishing Gradients' or finding the optimal learning rate. Too high = overshoot; too low = slow.
Training a massive model like GPT requires **Mini-batch GD**: the full dataset is far too large to process in a single step, while mini-batches fit in GPU memory and keep training fast.
14. Ensemble Learning (Bagging vs. Boosting)
Technical DefinitionCombining multiple weak learners to create a strong predictive model. **Bagging** reduces variance by parallel training (Random Forest); **Boosting** reduces bias by sequential training (XGBoost, AdaBoost).
Core Concept- Bagging: Bootstrapped Aggregation. Independence between models.
- Boosting: Each new model focuses on correcting the errors of the previous ones.
- Stacking: Using a 'Meta-model' to combine predictions of different algorithms.
| Aspect | Bagging | Boosting |
|---|---|---|
| Training | Parallel. | Sequential. |
| Focus | Reduces Variance. | Reduces Bias. |
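Bagging is "bootstrap, fit independently, then aggregate". This sketch bags 1-D decision stumps on toy data. `fit_stump`, `bagging_fit` and `bagging_predict` are hypothetical helpers. Random Forest adds per-split feature subsampling on top of this, which a one-feature example cannot show.

```python
import random
from collections import Counter

def fit_stump(data):
    """1-D decision stump: threshold t minimizing errors of 'predict 1 if x >= t'."""
    candidates = sorted({x for x, _ in data}) + [float("inf")]  # inf means "always 0"
    return min(candidates, key=lambda t: sum((x >= t) != bool(y) for x, y in data))

def bagging_fit(data, n_models=25, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]   # bootstrap: n draws with replacement
        models.append(fit_stump(sample))            # models are trained independently
    return models

def bagging_predict(models, x):
    votes = Counter(int(x >= t) for t in models)    # aggregate by majority vote
    return votes.most_common(1)[0][0]

data = [(x, int(x >= 5)) for x in range(10)]
models = bagging_fit(data)
print([bagging_predict(models, x) for x in (0, 3, 6, 9)])
```

Boosting would instead train the stumps one after another, reweighting the points the previous stumps got wrong.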
Interview Spotlight
Focus: Identifying when to use Gradient Boosting over Random Forest (Boosting is usually better for tabular competition winning).
Forecasting housing market trends often uses **XGBoost (Boosting)** on engineered tabular features for high predictive accuracy.
15. Naive Bayes (Bayes Theorem)
Technical DefinitionA probabilistic classifier based on **Bayes Theorem** with a 'Naive' assumption that features are conditionally independent given the class. Despite this simplistic assumption, it is highly effective for text data.
Core Concept- Bayes Theorem: \(P(A|B) = \frac{P(B|A)P(A)}{P(B)}\).
- Naive Assumption: Feature A has no effect on Feature B given the class.
- Types: Multinomial (Text), Gaussian (Continuous), Bernoulli (Binary).
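A Multinomial Naive Bayes spam filter can be sketched with word counts. Under the naive assumption, the class score is the log prior plus a sum of per-word log-likelihoods. Add-one (Laplace) smoothing keeps unseen words from zeroing out a class. The training sentences and helper names are illustrative.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (list_of_words, label). Returns (priors, word counts, vocabulary)."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        word_counts[label].update(words)
        vocab.update(words)
    priors = {c: n / len(docs) for c, n in class_counts.items()}
    return priors, word_counts, vocab

def predict_nb(model, words):
    priors, word_counts, vocab = model
    scores = {}
    for c, prior in priors.items():
        total = sum(word_counts[c].values())
        score = math.log(prior)                  # log P(class)
        for w in words:
            # log P(word | class) with add-one smoothing
            score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)

model = train_nb([("win prize now".split(), "spam"),
                  ("urgent prize claim".split(), "spam"),
                  ("meeting at noon".split(), "ham"),
                  ("project meeting notes".split(), "ham")])
print(predict_nb(model, "claim your prize".split()))   # spam
print(predict_nb(model, "meeting notes".split()))      # ham
```

Working in log space avoids numerical underflow when many small probabilities are multiplied.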
Interview Spotlight
Focus: Explaining why Naive Bayes still works well for text classification (word frequency patterns) even though the 'Naive' independence assumption rarely holds for real words.
Your first spam filter in the early 2000s likely used **Naive Bayes** to check for words like 'Prize' or 'Urgent'.
16. Feature Engineering & Selection
Technical Definition**Feature Engineering** is the art of creating new features to help the model learn (e.g., extracting 'Day of Week' from a timestamp). **Feature Selection** is pruning existing features to simplify the model and reduce noise.
Core Concept- Engineering: One-hot encoding, Scaling, Binning, Interaction features.
- Selection: Filter methods (Correlation), Wrapper methods (RFE), Embedded (Lasso).
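The engineering techniques above can be sketched for a single record. The column names (`timestamp`, `income`, `debt`, `channel`) and the `CHANNELS` category list are hypothetical. The sketch derives day-of-week and weekend flags from a timestamp, builds a ratio feature, and one-hot encodes a categorical column.

```python
from datetime import datetime

CHANNELS = ["web", "app", "branch"]   # hypothetical fixed category list

def engineer(row):
    """Turn one raw record into model-ready features."""
    ts = datetime.fromisoformat(row["timestamp"])
    features = {
        "day_of_week": ts.weekday(),              # 0 = Monday ... 6 = Sunday
        "is_weekend": int(ts.weekday() >= 5),
        # Ratio feature built from two raw columns (guard against division by zero)
        "income_to_debt": row["income"] / row["debt"] if row["debt"] else float("inf"),
    }
    for ch in CHANNELS:                           # one-hot encoding
        features[f"channel_{ch}"] = int(row["channel"] == ch)
    return features

print(engineer({"timestamp": "2024-01-06T10:00:00", "income": 60000,
                "debt": 20000, "channel": "app"}))
```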
Interview Spotlight
Focus: Explaining the 'Curse of Dimensionality' and why more features aren't always better.
In a loan model, adding a feature 'Income-to-Debt Ratio' created from raw columns is **Feature Engineering**.
17. Cross-Validation (K-Fold)
Technical DefinitionA resampling technique used to evaluate a model's performance on multiple subsets of data. **K-Fold CV** splits data into \(K\) parts, training on \(K-1\) and testing on 1, repeating this \(K\) times.
Core Concept- Benefit: Reduces the dependence of the performance estimate on a single fixed Train/Test split.
- Standard K: Usually 5 or 10.
- Stratified K-Fold: Keeps class proportions identical in each fold (Used for imbalanced data).
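Generating the K-Fold splits is just index bookkeeping: each sample lands in the test fold exactly once. This is a sketch, and `k_fold_indices` is a hypothetical helper. It uses contiguous folds without shuffling for readability. Real pipelines usually shuffle first, and stratified variants also balance class proportions per fold.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) for K contiguous folds (no shuffling, for clarity)."""
    # Spread the remainder so fold sizes differ by at most one
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

for train_idx, test_idx in k_fold_indices(10, 3):
    print(len(train_idx), test_idx)
```

A model would be trained on `train_idx` and scored on `test_idx` in each round, and the reported metric is the mean (and spread) of the K scores.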
Interview Spotlight
Focus: When is K-Fold better than Hold-out? Answer: When the dataset is small and every scrap of data is needed for both training and validation.
Academic researchers use **K-Fold** to prove their model is truly robust across different samplings of a medical trial.
18. Curse of Dimensionality
Technical DefinitionHigh-dimensional data (too many features) causes data points to become increasingly sparse, making distance-based algorithms like KNN or K-Means highly inaccurate and demanding massive amounts of training data.
Core Concept- Distance Problem: In high dimensions, the distance between any two points becomes nearly identical.
- Overfitting: Models find noise patterns that don't generalize.
- Solution: Dimensionality Reduction (PCA, Feature Selection).
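The distance problem can be checked empirically. This sketch measures the relative contrast \((d_{max} - d_{min}) / d_{min}\) between a random query and random points in the unit hypercube. As the dimension grows, the contrast shrinks, so "nearest" and "farthest" neighbors become hard to tell apart. `relative_contrast` and its sample sizes are illustrative.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(max_dist - min_dist) / min_dist from a random query to random points in [0,1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)])
             for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)

for d in (2, 10, 100, 1000):
    print(d, round(relative_contrast(d), 3))
```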
Interview Spotlight
Focus: Explaining why Euclidean distance breaks in 10,000-dimensional space.
Searching for a specific book in a 1D shelf is easy; searching in a 1000-dimensional library where every book is equidistant is the **Curse**.
19. Hyperparameter Tuning (Grid vs. Random Search)
Technical DefinitionHyperparameters are settings defined before training (e.g., Learning Rate, K in KNN). **Grid Search** tries every combination in a predefined set, while **Random Search** samples combinations randomly for faster results.
Core Concept- Grid Search: Exhaustive and slow; guaranteed to find the 'best' in your provided grid.
- Random Search: Statistically faster; more efficient for large search spaces.
- Bayesian Optimization: Smarter tuning that learns from previous results to pick better parameters.
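The budget argument for Random Search can be seen with a toy objective that depends strongly on the learning rate and barely on depth. This is a sketch: `toy_score`, its optimum, and the search ranges are hypothetical. With the same 9 trials, the grid tries only 3 distinct learning rates, while random search tries 9.

```python
import itertools
import random

def toy_score(lr, depth):
    """Hypothetical objective: sensitive to lr, almost indifferent to depth."""
    return -(lr - 0.037) ** 2 - 1e-6 * depth

def grid_search(lrs, depths):
    trials = list(itertools.product(lrs, depths))         # every combination
    return max(trials, key=lambda t: toy_score(*t)), trials

def random_search(n_trials, seed=0):
    rng = random.Random(seed)
    trials = [(10 ** rng.uniform(-3, 0), rng.randint(2, 10))  # log-uniform lr
              for _ in range(n_trials)]
    return max(trials, key=lambda t: toy_score(*t)), trials

best_g, grid_trials = grid_search([0.001, 0.01, 0.1], [3, 6, 9])
best_r, rand_trials = random_search(9)
print(best_g, len({lr for lr, _ in grid_trials}))   # distinct learning rates: grid
print(best_r, len({lr for lr, _ in rand_trials}))   # distinct learning rates: random
```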
Interview Spotlight
Focus: Why is Random Search often preferred? Answer: Because many hyperparameters don't impact the result, and Random Search explores the 'important' ones better.
Before deploying an XGBoost model, engineers use **Random Search** to find the perfect 'Number of Trees' and 'Depth'.
20. Model Drift & Deployment (MLOps basics)
Technical Definition**Model Drift** is the degradation of model performance over time because real-world data patterns change. **MLOps** comprises the practices of automating the deployment, monitoring, and maintenance of ML models.
Core Concept- Concept Drift: The relationship between the inputs and the target changes (e.g., 'Spam' content evolves).
- Data Drift: The input data distribution changes.
- Monitoring: Tracking drift using metrics like Population Stability Index (PSI).
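PSI compares the binned distribution of a feature at training time ("expected") with what the model sees in production ("actual"). This is a minimal sketch. The bin counts are illustrative, and the small `eps` floor that avoids `log(0)` for empty bins is an implementation choice. Commonly quoted rules of thumb treat PSI below 0.1 as stable and above 0.25 as a significant shift.

```python
import math

def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two binned distributions (counts per bin)."""
    e_total, a_total = sum(expected), sum(actual)
    value = 0.0
    for e, a in zip(expected, actual):
        e_pct = max(e / e_total, eps)     # floor avoids log(0) for empty bins
        a_pct = max(a / a_total, eps)
        value += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return value

print(psi([10, 20, 30, 40], [10, 20, 30, 40]))   # 0.0: no drift
print(psi([25, 25, 25, 25], [10, 20, 30, 40]))   # about 0.23
```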
Interview Spotlight
Focus: How to fix Concept Drift? Answer: Periodic retraining with the most recent labeled data.
A shopping model trained before COVID-19 suffered from massive **Model Drift** when buying habits shifted suddenly in 2020.
Deep Learning (DL)
1. Perceptron & Multi-Layer Perceptron (MLP)
Technical DefinitionA Perceptron is the simplest form of a neural network, consisting of a single layer of weights and a threshold activation function. A Multi-Layer Perceptron (MLP) is a feedforward artificial neural network consisting of at least three layers (input, hidden, and output) with non-linear activation functions.
Core Concept- Simplified Model: Computes a weighted sum of inputs and applies a step function.
- MLP: Uses Backpropagation and Gradient Descent to update weights across multiple hidden layers.
- Linear Separability: A single perceptron can only solve linearly separable problems (like AND/OR gates, not XOR).
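The perceptron learning rule, \(w \leftarrow w + \eta\,(y - \hat{y})\,x\), illustrates both points: it converges on AND, which is linearly separable, and can never fit XOR. This is a sketch, and `train_perceptron` and `accuracy` are hypothetical helpers.

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Classic perceptron rule with a step activation."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0   # step function
            err = target - pred                              # 0 when correct
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

def accuracy(w, b, samples):
    hits = sum((1 if w[0] * x1 + w[1] * x2 + b > 0 else 0) == t
               for (x1, x2), t in samples)
    return hits / len(samples)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
print(accuracy(*train_perceptron(AND), AND))   # 1.0
print(accuracy(*train_perceptron(XOR), XOR))   # below 1.0: no line separates XOR
```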
Interview Spotlight
Focus: Explaining why a single perceptron cannot solve the XOR problem (Minsky and Papert, 1969).
A: To allow the network to learn complex, non-linear mappings by projecting data into higher-dimensional feature spaces.
A simple **Perceptron** can decide if you should wear a coat based on temp > 10°C; an **MLP** can predict the exact temperature based on complex climate variables.
2. Activation Functions (ReLU, Sigmoid, Tanh, Softmax)
Technical Definition**Activation Functions** are mathematical equations that determine the output of a neural network node. They introduce Non-linearity into the network, allowing it to learn complex patterns instead of acting as a simple linear transformation.
Core Concept- Sigmoid: Maps input to \([0, 1]\); prone to 'vanishing gradients'.
- Tanh: Maps input to \([-1, 1]\); zero-centered, often better than Sigmoid.
- ReLU (Rectified Linear Unit): Output is \(\max(0, x)\); its gradient is 1 for \(x > 0\), which mitigates vanishing gradients; standard for hidden layers.
- Softmax: Maps output to probabilities summing to 1; used in the final layer for multi-class classification.
| Function | Range | Best Use Case |
|---|---|---|
| Sigmoid | (0, 1) | Binary Classification output. |
| ReLU | [0, Inf) | Hidden Layers in deep networks. |
| Softmax | (0, 1) | Multi-class classification output. |
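The activations in the table, plus the two derivatives that matter most for training, take only a few lines. This is a sketch; the max-subtraction in `softmax` is the standard numerical-stability trick and does not change the result. The Sigmoid's derivative never exceeds 0.25, which is the root of its vanishing-gradient problem, while ReLU passes gradients through unchanged for positive inputs.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)            # peaks at 0.25 when z = 0

def relu(z):
    return max(0.0, z)

def relu_grad(z):
    return 1.0 if z > 0 else 0.0    # 1 for positive inputs, 0 otherwise ("dead" side)

def softmax(zs):
    m = max(zs)                     # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(0.0), sigmoid_grad(0.0), math.tanh(0.0), relu(-2.0), relu(3.0))
print(softmax([2.0, 1.0, 0.1]))     # probabilities that sum to 1
```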
Interview Spotlight
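Focus: The 'Dead ReLU' problem, where a neuron's pre-activation becomes negative for every input (e.g., after a large update pushes its bias far negative), so it outputs zero, receives zero gradient, and never recovers. Solution: Leaky ReLU.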
**ReLU** is used in almost all deep image recognition models to keep the training speed high and gradients healthy.
3. Forward & Backpropagation (Mathematical intuition)
Technical Definition**Forward Propagation** is the process of passing input through the network to generate a prediction. **Backpropagation** is the method used to calculate the gradient of the loss function with respect to each weight by applying the Chain Rule, essentially moving 'errors' backward to update the model.
Core Concept- Forward Pass: Calculation of outputs layer by layer.
- Error Calculation: Comparison of predicted output with ground truth using a Loss function.
- Backward Pass: Calculation of partial derivatives (gradients) using the Chain Rule: \(\frac{\partial L}{\partial w} = \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial z} \cdot \frac{\partial z}{\partial w}\).
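The chain rule above can be applied by hand to a single sigmoid neuron with squared-error loss and then verified against a finite-difference estimate. This is a sketch, and the weights and the data point are illustrative. Deep-learning frameworks automate exactly this bookkeeping across millions of weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, x, y):
    z = w * x + b            # pre-activation
    a = sigmoid(z)           # activation (prediction)
    loss = (a - y) ** 2      # squared-error loss
    return z, a, loss

def backward(w, b, x, y):
    """dL/dw and dL/db via the chain rule: dL/da * da/dz * dz/dw."""
    _z, a, _loss = forward(w, b, x, y)
    dL_da = 2 * (a - y)
    da_dz = a * (1 - a)      # derivative of the sigmoid
    dz_dw, dz_db = x, 1.0
    return dL_da * da_dz * dz_dw, dL_da * da_dz * dz_db

w, b, x, y = 0.5, -0.2, 1.5, 1.0
grad_w, grad_b = backward(w, b, x, y)
# Sanity check: central finite-difference estimate of dL/dw
h = 1e-6
numeric_w = (forward(w + h, b, x, y)[2] - forward(w - h, b, x, y)[2]) / (2 * h)
print(grad_w, numeric_w)     # should agree to several decimal places
```

An optimizer such as SGD would then apply `w -= lr * grad_w`. Backpropagation itself stops at computing the gradient.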
Interview Spotlight
Focus: Explaining how the Chain Rule is the engine of Deep Learning.
A: No. Backprop only calculates gradients. Optimization algorithms like SGD use those gradients to update weights.
**Backpropagation** is like a student checking the answer key and tracing back their mental mistakes to prepare for the next practice test.
4. Loss Functions (MSE, Cross-Entropy)
Technical Definition**Loss Functions** quantify how 'wrong' the model's predictions are compared to the actual targets. **MSE (Mean Squared Error)** is used for regression, while **Cross-Entropy** is the standard for classification, measuring the difference between two probability distributions.
Core Concept- MSE: Penalizes large errors heavily; \(\frac{1}{n}\sum(y_{true} - y_{pred})^2\).
- Cross-Entropy: Heavily penalizes confident wrong predictions (the loss grows without bound as the predicted probability of the true class approaches 0); \(- \sum y_{true} \log(y_{pred})\).
- Binary Cross-Entropy (Log Loss): Used for 0/1 classification tasks.
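The three losses can be written out directly. This is a sketch; the `eps` clipping that keeps `log(0)` out of the computation is an implementation choice, and frameworks typically compute cross-entropy from logits for the same reason. The last line shows how much harder a confident wrong prediction is punished than a confident right one.

```python
import math

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(y_true, y_pred, eps=1e-12):
    """y_true: one-hot list; y_pred: predicted probabilities for each class."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

def binary_cross_entropy(y, p, eps=1e-12):
    p = min(max(p, eps), 1 - eps)   # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

print(mse([1.0, 2.0], [1.0, 4.0]))                     # 2.0
print(cross_entropy([0, 1, 0], [0.2, 0.7, 0.1]))       # -ln(0.7), about 0.357
print(binary_cross_entropy(1, 0.99), binary_cross_entropy(1, 0.01))  # small vs. large
```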
Interview Spotlight
Focus: Why use Cross-Entropy instead of Accuracy for training? Answer: Accuracy is not differentiable; you need a smooth gradient to optimize.
Predicting stock prices uses **MSE**; identifying if an image is a dog or cat uses **Cross-Entropy**.
5. Vanishing & Exploding Gradient Problems
Technical DefinitionInstabilities in deep neural networks during training. **Vanishing Gradients** occur when gradients become extremely small, causing early layers to stop learning. **Exploding Gradients** occur when gradients grow uncontrollably, leading to unstable weight updates and model divergence.
Core Concept- Vanishing Cause: Repeated multiplication of small values (e.g., Sigmoid derivatives).
- Exploding Cause: Multiplication of large weight values in deep layers.
- Solutions: ReLU activation, Batch Normalization, Gradient Clipping, and Residual (Skip) Connections.
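Two of the points above can be made numeric. Because the Sigmoid's derivative is at most 0.25, a gradient passing back through many sigmoid layers shrinks at least geometrically, assuming weight magnitudes of at most 1. Gradient clipping by norm is the standard guard against the opposite, exploding case. `sigmoid_chain_bound` and `clip_by_norm` are hypothetical helper names.

```python
import math

def sigmoid_chain_bound(depth):
    """Upper bound on the gradient factor after `depth` sigmoid layers (|weights| <= 1)."""
    return 0.25 ** depth            # sigmoid'(z) <= 0.25 everywhere

def clip_by_norm(grad, max_norm):
    """Rescale the gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm         # keep the direction, shrink the length
    return [g * scale for g in grad]

for depth in (1, 5, 10, 20):
    print(depth, sigmoid_chain_bound(depth))
print(clip_by_norm([3.0, 4.0], max_norm=1.0))   # about [0.6, 0.8]
```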
Interview Spotlight
Focus: Identifying how LSTM 'gates' specifically solve the vanishing gradient problem in sequence data.
Training a 100-layer network would be impossible without **Residual Connections (ResNet)** to bypass the vanishing gradient bottleneck.
6. CNN: Architecture (Convolution, Pooling, Padding)
Technical Definition**Convolutional Neural Networks (CNNs)** are specialized for spatial data like images. They use **Filters (Kernels)** to detect features like edges and textures via mathematical convolution, followed by **Pooling** to reduce dimensionality and provide a degree of translation invariance.
Core Concept- Convolution: Slide a filter across the input; at each position, multiply it element-wise with the underlying patch and sum the results into one output value.
- Padding: Adding zero-pixels to the border to maintain image size after convolution.
- Pooling (Max/Average): Down-sampling to retain only the most prominent features (e.g., 2x2 Max Pool).
- Flattening: Converting 2D feature maps into a 1D vector for final classification.
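Convolution, padding and pooling can be written with plain loops. This is a sketch assuming a single-channel image and a square kernel. Like most deep-learning frameworks, it computes cross-correlation, meaning the kernel is not flipped. The output size follows \((n + 2p - k)/s + 1\) for input size \(n\), padding \(p\), kernel size \(k\) and stride \(s\).

```python
def conv2d(image, kernel, padding=0, stride=1):
    """Single-channel 2-D convolution (cross-correlation) with zero padding."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    padded = [[0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for i in range(h):
        for j in range(w):
            padded[i + padding][j + padding] = image[i][j]
    out_h = (h + 2 * padding - k) // stride + 1
    out_w = (w + 2 * padding - k) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            r, c = i * stride, j * stride
            # element-wise multiply the patch by the kernel, then sum
            row.append(sum(padded[r + a][c + b] * kernel[a][b]
                           for a in range(k) for b in range(k)))
        out.append(row)
    return out

def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size window."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

image = [[0, 0, 1, 1] for _ in range(4)]          # dark-to-bright vertical edge
edge_kernel = [[-1, 0, 1] for _ in range(3)]      # responds to left-to-right increase
print(conv2d(image, edge_kernel))                 # [[3, 3], [3, 3]]
print(len(conv2d(image, edge_kernel, padding=1))) # 4: padding keeps the 4x4 size
print(max_pool2d([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]))
```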
Interview Spotlight
Focus: Why use CNNs instead of MLPs for images? Answer: Parameter efficiency. CNNs share weights across spatial regions (local receptive fields).
**CNNs** are the core technology behind Face ID on your iPhone and self-driving car vision systems.
7. RNN: Sequential data & Vanishing Gradients
Technical Definition**Recurrent Neural Networks (RNNs)** are designed for sequential data (time-series, text) where inputs are dependent on previous states. They maintain a 'hidden state' or memory that updates at each time step, allowing them to process sequences of variable length.
Core Concept- Hidden State: \(h_t = f(W \cdot h_{t-1} + U \cdot x_t)\).
- BPTT (Backpropagation Through Time): The training method for RNNs.
- Failure: Standard RNNs cannot remember long-range dependencies due to vanishing gradients across steps.
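Unrolling a scalar RNN shows where the gradient goes. By the chain rule, \(\partial h_T / \partial h_0\) is a product of per-step factors \(w \cdot \tanh'(\cdot)\). With \(|w| < 1\), that product shrinks at least as fast as \(|w|^T\). This is a sketch; the scalar weights `w`, `u` and the constant input sequence are illustrative assumptions.

```python
import math

def rnn_forward(xs, w=0.5, u=1.0, h0=0.0):
    """Scalar RNN: h_t = tanh(w*h_{t-1} + u*x_t). Returns all hidden states."""
    hs = [h0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs

def grad_last_wrt_first(hs, w=0.5):
    """d h_T / d h_0 via the chain rule across the unrolled steps."""
    g = 1.0
    for h in hs[1:]:
        g *= w * (1 - h * h)   # d h_t / d h_{t-1} = w * tanh'(.) = w * (1 - h_t^2)
    return g

hs = rnn_forward([1.0] * 20)
print(grad_last_wrt_first(hs))   # tiny: early steps barely influence learning
```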
Interview Spotlight
Focus: Explaining why RNNs are 'unrolled' in time and how this leads to gradient instability.
Predicting the next word in a sentence (Auto-complete) used to be the primary job of **RNNs** before Transformers.
8. LSTM & GRU (Gating mechanisms)
Technical Definition**Long Short-Term Memory (LSTM)** and **Gated Recurrent Units (GRU)** are advanced RNN architectures designed to solve the long-term dependency problem. They use **Gating Mechanisms** to selectively forget or remember information over long sequences.
Core Concept- LSTM Gates: (1) Forget Gate, (2) Input Gate, (3) Output Gate. Maintains a separate 'Cell State'.
- GRU: Simplified version; combines forget and input gates into an 'Update Gate'.
- Benefit: Gradient flow is preserved through the additive cell-state update, allowing memory over far longer sequences than a standard RNN.
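One LSTM step with scalar state shows the gates at work. The cell state is updated additively, \(c_t = f \cdot c_{t-1} + i \cdot g\). When the forget gate is near 1 and the input gate near 0, \(c\) is carried almost unchanged across many steps, and so is its gradient, since \(\partial c_t / \partial c_{t-1} = f\). This is a sketch: the `params` layout of (input weight, recurrent weight, bias) per gate is a hypothetical simplification of the usual weight matrices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step with scalar input/state; p maps gate name -> (w_x, w_h, bias)."""
    def gate(name, act):
        wx, wh, b = p[name]
        return act(wx * x + wh * h_prev + b)
    f = gate("forget", sigmoid)       # how much old cell state to keep
    i = gate("input", sigmoid)        # how much of the candidate to write
    o = gate("output", sigmoid)       # how much cell state to expose
    g = gate("candidate", math.tanh)  # candidate values
    c = f * c_prev + i * g            # additive cell-state update
    h = o * math.tanh(c)
    return h, c

params = {"forget": (0.0, 0.0, 10.0), "input": (0.0, 0.0, -10.0),
          "output": (0.0, 0.0, 0.0), "candidate": (0.0, 0.0, 0.0)}
h, c = 0.0, 1.0
for _ in range(50):
    h, c = lstm_step(1.0, h, c, params)
print(c)   # stays close to 1.0 after 50 steps
```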
Interview Spotlight
Focus: Comparing LSTM and GRU. Answer: GRU is faster and slightly simpler; LSTM is more flexible for very complex sequences.
Google Translate's 2016 neural system (GNMT) was built on **LSTMs** to handle long-distance grammatical agreement in translation.
9. Weight Initialization (Xavier, He)
Technical Definition**Weight Initialization** is the strategy for choosing starting values for neural network weights. Proper initialization ensures that the variance of activations and gradients stays stable across layers, preventing signals from disappearing or exploding instantly.
Core Concept- Xavier (Glorot) Init: Best for Sigmoid/Tanh. Balances variance based on fan_in and fan_out.
- He Initialization: Best for **ReLU**. Uses a standard deviation of \(\sqrt{2/\text{fan\_in}}\) to account for ReLU zeroing out roughly half of its inputs.
- Zero Init: A fatal mistake! Causes all neurons in a layer to perform the same calculation (Symmetry).
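Both schemes draw random weights with a layer-dependent standard deviation. This sketch uses the normal-distribution variants: Xavier uses \(\sqrt{2/(\text{fan\_in} + \text{fan\_out})}\) and He uses \(\sqrt{2/\text{fan\_in}}\). Uniform variants also exist. The matrix layout (fan_in rows by fan_out columns) and the helper names are assumptions.

```python
import math
import random
import statistics

def xavier_normal(fan_in, fan_out, rng):
    std = math.sqrt(2.0 / (fan_in + fan_out))   # balances forward and backward variance
    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]

def he_normal(fan_in, fan_out, rng):
    std = math.sqrt(2.0 / fan_in)   # factor 2 compensates for ReLU zeroing ~half the inputs
    return [[rng.gauss(0.0, std) for _ in range(fan_out)] for _ in range(fan_in)]

rng = random.Random(0)
W = he_normal(256, 128, rng)
flat = [v for row in W for v in row]
print(len(W), len(W[0]), statistics.pstdev(flat), math.sqrt(2.0 / 256))
```

Because every weight is drawn independently, neurons start out different, which is exactly the symmetry breaking that zero initialization lacks.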
Interview Spotlight
Focus: Why not initialize all weights to zero? Answer: Random initialization provides **Symmetry Breaking**; with identical weights, every neuron in a layer receives the same gradient and learns the same feature.
Using **He Initialization** is a day-one best practice for any modern Deep Learning engineer building multi-layer CNNs.
10. Batch Normalization & Dropout
Technical DefinitionTechniques to improve training stability and prevent overfitting. **Batch Normalization** re-scales the outputs of a layer to have zero mean and unit variance. **Dropout** randomly 'turns off' neurons during training, forcing the network to learn redundant representations.
Core Concept- Batch Norm: Reduces Internal Covariate Shift; allows higher learning rates.
- Dropout: Prevents 'Co-adaptation' between neurons; only used during Training, not Inference.
- Effect: Batch Norm accelerates training; Dropout generalizes the model better.
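Both techniques are short to write for a single feature or layer. `batch_norm` normalizes one feature across a mini-batch and then applies the learnable scale `gamma` and shift `beta`. It omits the running statistics that real layers keep for inference. `dropout` implements the common "inverted" variant, which rescales surviving activations during training so that inference needs no change. The function names and defaults are illustrative.

```python
import math
import random

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

def dropout(activations, p, training, rng):
    """Inverted dropout: zero each unit with probability p, rescale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(activations)          # inference: all units active, no scaling
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

print(batch_norm([1.0, 2.0, 3.0, 4.0]))   # roughly zero mean, unit variance
rng = random.Random(0)
out = dropout([1.0] * 10, p=0.5, training=True, rng=rng)
print(out)                                # each entry is 0.0 or 2.0
print(dropout([1.0] * 3, p=0.5, training=False, rng=rng))   # unchanged at inference
```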
Interview Spotlight
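Focus: Difference in Dropout behavior between Training and Testing. Answer: In testing, all neurons are active and outputs are scaled so their expected value matches training; with the common 'inverted dropout' variant, this \(1/(1-p)\) scaling is applied during training instead, so inference needs no change.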
**Dropout** is like training a basketball team where someone is randomly pulled off the court—the remaining players must learn to play all positions to win.
11. Optimizers (Adam, RMSProp, Adagrad)
Technical Definition**Optimizers** are algorithms that update weights to minimize loss. **Adam (Adaptive Moment Estimation)** is a widely used default, combining the benefits of Momentum (speed through flat areas) and RMSProp (adapting learning rates for each parameter).
Core Concept- Adagrad: Divides each parameter's learning rate by its accumulated squared gradients, so frequently updated parameters take smaller steps; good for sparse data, but the learning rate decays too fast.
- RMSProp: Fixes Adagrad's decay by using a moving average of squared gradients.
- Adam: Most robust; uses both first and second moments of gradients to adjust steps.
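The Adam update combines a momentum-style first moment with an RMSProp-style second moment, plus bias correction for their zero initialization. This sketch minimizes the toy function \((x - 3)^2\) in one dimension. It uses the default \(\beta_1 = 0.9\), \(\beta_2 = 0.999\), \(\epsilon = 10^{-8}\) from the Adam paper. The learning rate, step count and `adam_minimize` name are illustrative choices.

```python
import math

def adam_minimize(grad_fn, x0, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g          # first moment: running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g      # second moment: running mean of g^2
        m_hat = m / (1 - beta1 ** t)             # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)   # per-parameter adaptive step
    return x

x_star = adam_minimize(lambda x: 2 * (x - 3.0), x0=0.0)   # gradient of (x - 3)^2
print(x_star)   # close to 3.0
```

Dropping the `m` terms (using `g` directly) gives RMSProp. Replacing the exponential average in `v` with a running sum gives Adagrad, which explains that method's ever-shrinking step size.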
Interview Spotlight
Focus: Why is Adam favored? Answer: It requires little hyperparameter tuning and handles noise/sparsity exceptionally well.
If you're unsure which optimizer to use, **Adam** is a sensible first choice for most deep learning projects.
12. Transfer Learning
Technical Definition**Transfer Learning** is a technique where a model developed for one task is reused as the starting point for a model on a second related task. It is highly effective when training data for the second task is limited.
Core Concept- Logic: Early layers of a CNN learn general features (edges, shapes) which are useful for almost all vision tasks.
- Fine-tuning: Typically starts by keeping early weights 'frozen' and training only the final classification layers, then optionally unfreezes more layers at a low learning rate.
- Data Efficiency: Allows training high-quality models with as few as 100 images.
Interview Spotlight
Focus: Identifying when to 'unfreeze' more layers. Answer: If the new dataset is huge and very different from the original source.
Taking a model pre-trained on ImageNet (natural images) and using it to identify X-ray fractures is a classic **Transfer Learning** use case.
13. Autoencoders & GANs
Technical DefinitionUnsupervised neural architectures. **Autoencoders** learn to compress input into a lower-dimensional latent space and then reconstruct it. **GANs (Generative Adversarial Networks)** use a 'Generator' and a 'Discriminator' in a zero-sum game to create hyper-realistic synthetic data.
Core Concept- Autoencoder: Encoder (Compression) + Decoder (Reconstruction). Used for Denoising/Anomaly Detection.
- GAN Generator: Tries to create 'fake' data that fools the Discriminator.
- GAN Discriminator: Tries to distinguish between real data and data from the Generator.
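The zero-sum game between the two networks is usually written as a single minimax objective, as in the original GAN paper. Here \(D(x)\) is the Discriminator's probability that \(x\) is real, and \(G(z)\) maps random noise \(z\) to a synthetic sample:

\[
\min_G \max_D \; V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
\]

The Discriminator raises \(V\) by scoring real data near 1 and generated data near 0. The Generator lowers it by pushing \(D(G(z))\) toward 1. An Autoencoder, by contrast, needs no adversary: it is trained on a reconstruction loss such as the MSE between the input and the Decoder's output.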
Interview Spotlight
Focus: Explaining 'Mode Collapse' in GANs—where the generator only learns to produce a single type of result instead of diversity.
**GANs** are responsible for creating Deepfakes, while **Autoencoders** are used by astronomers to find glitches in telescope data.
14. Attention Mechanism & Transformers
Technical Definition**Attention** is a mechanism that allows a model to weigh the importance of different parts of input data relative to a specific context. **Transformers** use **Self-Attention** to process entire sequences in parallel, replacing sequential RNNs entirely.
Core Concept- Self-Attention: Calculates Query (What I want), Key (What I have), and Value (What I give).
- Multi-Head Attention: Allows the model to focus on various aspects of context simultaneously.
- Positional Encoding: Adds info about word order since Transformers process all words at once.
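The Query/Key/Value idea reduces to scaled dot-product attention, \(\text{softmax}(QK^\top / \sqrt{d})\,V\). This sketch uses plain lists with one row per token and a single head. Multi-head attention runs several of these in parallel on learned projections, which are omitted here. The toy matrices are illustrative.

```python
import math

def softmax(xs):
    m = max(xs)                                  # subtract the max for stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Q: n_q x d, K: n_k x d, V: n_k x d_v (lists of lists). Returns (output, weights)."""
    d = len(K[0])
    output, weights = [], []
    for q in Q:
        # How well this query matches every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)                      # attention weights sum to 1
        weights.append(w)
        # Output is the attention-weighted mix of the value vectors
        output.append([sum(w[j] * V[j][c] for j in range(len(V)))
                       for c in range(len(V[0]))])
    return output, weights

K = [[10.0, 0.0], [0.0, 10.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
out, w = scaled_dot_product_attention([[10.0, 0.0], [0.0, 0.0]], K, V)
print(w)    # first query locks onto key 0; second query spreads 50/50
print(out)
```

Every query row is computed independently of the others, which is why the whole sequence can be processed in parallel.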
Interview Spotlight
Focus: Why are Transformers faster to train than RNNs? Answer: Parallelization. RNNs require step-by-step processing; Transformers do not.
The 'Attention' mechanism is the breakthrough that allowed AI like **ChatGPT** to understand complex, long-distance relationships in text.
15. BERT & GPT (High-level architecture)
Technical DefinitionModern Large Language Models (LLMs) based on Transformers. **BERT** is Encoder-only and 'bidirectional' (looks at both sides of a word). **GPT** is Decoder-only and 'unidirectional' (predicts the next word from left-to-right).
Core Concept- BERT: Pre-trained using Masked Language Modeling (MLM). Excellent for Understanding tasks (Search, Q&A).
- GPT: Pre-trained using Causal Language Modeling. Excellent for Generation tasks.
- Paradigm: Both rely on 'Pre-train → Fine-tune' workflows.
| Model | Direction | Primary Strength |
|---|---|---|
| BERT | Bi-directional. | Classification & Information Retrieval. |
| GPT | Uni-directional. | Content Generation & Creative Writing. |
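The directional difference comes down to the attention mask. In a decoder like GPT, position \(i\) may only attend to positions \(j \le i\). In practice, masked scores are set to \(-\infty\) before the softmax. An encoder like BERT lets every token attend to every other token. This is a sketch, and `attention_mask` is a hypothetical helper.

```python
def attention_mask(seq_len, causal):
    """mask[i][j] = 1 if position i may attend to position j."""
    return [[1 if (not causal or j <= i) else 0 for j in range(seq_len)]
            for i in range(seq_len)]

for row in attention_mask(4, causal=True):    # GPT-style: left context only
    print(row)
print(attention_mask(4, causal=False)[0])     # BERT-style: every token sees every token
```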
Interview Spotlight
Focus: Explaining why BERT is better for sentiment analysis than GPT. Answer: It has full context of the words surrounding the adjectives.
Google Search uses **BERT** to understand your query intent, while **GPT** powers assistants that write code or essays for you.
16. Tensors & Frameworks (PyTorch vs. TensorFlow)
Technical Definition**Tensors** are n-dimensional arrays (multi-dimensional matrices) which are the fundamental data structures in deep learning. **PyTorch** and **TensorFlow** are the primary frameworks used to build and train models on these tensors.
Core Concept- Tensor: Scalar (0D), Vector (1D), Matrix (2D), Tensor (3D+).
- PyTorch: Dynamic computational graphs; very popular in Research (Pythonic).
- TensorFlow: High scalability; robust for production deployment (Keras integration).
| Feature | PyTorch | TensorFlow |
|---|---|---|
| Graph Type | Dynamic (imperative). | Static (traditionally). |
| Debugging | Easy/Pythonic. | Historically difficult (improving). |
Interview Spotlight
Focus: Defining what a Tensor is in the context of DL. Answer: An n-dimensional data container that, in frameworks like PyTorch, can also record the operations applied to it (autograd) so gradients can be computed.
Most AI papers today use **PyTorch** for its flexibility, while large tech companies often use **TensorFlow** for massive data pipelines.
17. Object Detection (YOLO, R-CNN)
Technical Definition**Object Detection** involves identifying 'What' is in an image and 'Where' it is located (Bounding Boxes). **R-CNN** uses region proposals (slow but accurate); **YOLO (You Only Look Once)** treats detection as a single regression problem (much faster, suited to real time).
Core Concept- Bounding Box: Represented as \((x, y, w, h)\).
- IoU (Intersection over Union): Metric to evaluate box overlap accuracy.
- Non-Max Suppression (NMS): Removes redundant duplicate boxes for the same object.
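IoU and greedy NMS are short enough to write out. This sketch assumes boxes as \((x, y, w, h)\) with \((x, y)\) the top-left corner. YOLO itself encodes box centers, so convert before reusing this. The 0.5 IoU threshold is a common but arbitrary default.

```python
def iou(a, b):
    """Intersection over Union for boxes (x, y, w, h) with (x, y) the top-left corner."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop others that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 5, 5)]
print(iou(boxes[0], boxes[1]))            # 81 / 119, about 0.68
print(nms(boxes, [0.9, 0.8, 0.7]))        # [0, 2]: the near-duplicate box 1 is removed
```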
Interview Spotlight
Focus: Trade-off between inference speed and accuracy. Answer: YOLO is for real-time; Faster R-CNN is for high-precision medical/scientific images.
Self-driving cars use **YOLOv8** to detect pedestrians and street signs in milliseconds while driving.
18. NLP: Tokenization, Word Embeddings (Word2Vec)
Technical Definition**Tokenization** is splitting text into units (words/sub-words). **Word Embeddings** map these tokens to continuous vectors where similar-meaning words are spatially close in high-dimensional space.
Core Concept- Word2Vec: Uses Skip-gram or CBOW to learn context.
- Embedding Properties: Captures semantics (e.g., King - Man + Woman ≈ Queen).
- Sub-word Tokenization: Used by models like BERT (WordPiece) to handle out-of-vocabulary words.
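A minimal sketch of WordPiece-style sub-word splitting: greedy longest-match-first against a vocabulary, with continuation pieces prefixed by `##`. The tiny vocabulary is hypothetical, and real BERT tokenizers also normalize text and split on punctuation before this step.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    tokens, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate   # pieces after the first carry a ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1                           # try a shorter substring
        if piece is None:
            return [unk]                       # no valid split: the whole word is unknown
        tokens.append(piece)
        start = end
    return tokens

vocab = {"play", "##ing", "##ed", "un", "##play", "##able"}   # hypothetical toy vocabulary
print(wordpiece("playing", vocab))     # ['play', '##ing']
print(wordpiece("unplayable", vocab))  # ['un', '##play', '##able']
print(wordpiece("xyz", vocab))         # ['[UNK]']
```

Even though "unplayable" never appears in the vocabulary, it is still represented through known pieces instead of collapsing to `[UNK]`.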
Interview Spotlight
Focus: Explaining why we use vectors instead of simple integer IDs for words. Answer: Integers imply a false numerical relationship (e.g., Cat=1, Dog=2); vectors define semantic similarity.
**Word Embeddings** are why your phone's keyboard knows that if you type 'Happy', it should suggest 'Birthday' next.
19. Fine-tuning vs. RAG (Retrieval Augmented Generation)
Technical DefinitionTwo techniques for adapting LLMs to specific knowledge. **Fine-tuning** updates the actual model weights on new data. **RAG** retrieves relevant documents (much like a search engine) and adds them to the prompt, so the model can 'read' the correct information before answering, without any weight updates.
Core Concept- Fine-tuning: For style and domain adaptation. Expensive and static.
- RAG: For factual accuracy and proprietary data. Dynamic and verifiable (provides citations).
- Vector Database: Stores the external knowledge used by RAG.
| Method | Knowledge Type | Cost |
|---|---|---|
| Fine-tuning | Internalized skills/style. | High (GPU intensive). |
| RAG | External facts/documents. | Low (Search intensive). |
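The retrieval half of RAG can be sketched as: embed the query, rank stored documents by cosine similarity, and paste the top matches into the prompt. Here a bag-of-words counter stands in for a real embedding model and a Python list stands in for the vector database; both are simplifying assumptions, and the documents are made up.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query (a list stands in for a vector database)."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, docs, k=2):
    """Augment the prompt with retrieved context; no model weights are touched."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Item 42 has 17 units in stock",
    "The office is closed on public holidays",
    "Item 42 ships from the Berlin warehouse",
]
print(build_prompt("How many units of item 42 are in stock", docs))
```

Updating the knowledge only means editing `docs`, which is why RAG suits frequently changing data.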
Interview Spotlight
Focus: When is RAG better than fine-tuning? Answer: When the data changes frequently (e.g., daily news or real-time inventory).
A legal AI uses **RAG** to look up specific case law in a 5000-page PDF before summarizing it for a lawyer.
20. Hardware Acceleration (GPU/TPU for DL)
Technical Definition**Hardware Accelerators** are specialized chips designed for the high-volume matrix multiplication involved in deep learning. **GPUs** are versatile parallel processors; **TPUs (Tensor Processing Units)** are ASICs custom-built by Google for tensor math.
Core Concept- Parallelism: A CPU has few powerful cores; a GPU has thousands of simple cores for simultaneous math.
- VRAM: Memory on the GPU where tensors are stored during training.
- CUDA (NVIDIA): The primary software layer that connects code to GPU hardware.
Interview Spotlight
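Focus: Why are GPUs faster for Deep Learning? Answer: A matrix multiplication consists of many independent multiply-add operations that a GPU's thousands of cores can run in parallel; a CPU has only a few cores, so it processes far fewer of them at once.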
Training a modern LLM without **H100 GPUs** or **TPUs** would take decades instead of weeks.
Artificial Intelligence (AI)
1. Weak AI vs. Strong AI vs. GenAI
Technical Definition**Weak AI (Narrow AI)** is designed to perform a single specific task (e.g., Siri). **Strong AI (AGI)** is a hypothetical machine with consciousness and human-level intelligence across all domains. **GenAI** is AI capable of creating original content (text, images, audio).
Core Concept- Narrow AI: Rule-based or pattern-matching; no general understanding.
- AGI: Self-aware; can learn any mental task a human can.
- Generative: Uses probabilistic models (LLMs, Diffusion) to synthesize new data.
Interview Spotlight
Focus: Defining where current models like GPT-4 sit. Answer: They are advanced Narrow AI/Generative AI, not yet fully AGI.
**Weak AI** plays chess; **GenAI** writes a poem about why it lost at chess.
2. AI Agents & Environment Types
Technical DefinitionAn **AI Agent** is an autonomous entity that perceives its environment through sensors and acts upon it through actuators to achieve goals. Environments are classified as **Deterministic vs. Stochastic**, **Static vs. Dynamic**, and **Observable vs. Partially Observable**.
Core Concept- Simple Reflex: Acts based on current percept (If-Then rules).
- Goal-Based: Considers future consequences of actions.
- Utility-Based: Acts to maximize a performance measure (Happiness/Profit).
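A simple reflex agent can be sketched with the classic two-square vacuum world (squares `A` and `B`, an illustrative setup): the agent maps the current percept straight to an action through If-Then rules, with no memory or look-ahead.

```python
def reflex_vacuum_agent(percept):
    """Simple reflex agent: the action depends only on the current (location, status) percept."""
    location, status = percept
    if status == "Dirty":
        return "Suck"
    return "Right" if location == "A" else "Left"

# Illustrative two-square world; both squares start dirty
world = {"A": "Dirty", "B": "Dirty"}
location = "A"
for step in range(4):
    action = reflex_vacuum_agent((location, world[location]))
    print(step, location, action)
    if action == "Suck":
        world[location] = "Clean"
    else:
        location = "B" if action == "Right" else "A"
```

Because the agent has no memory, it keeps shuttling between the squares even after both are clean; a goal- or utility-based agent would reason about whether further moves are worthwhile.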
Interview Spotlight
Focus: Defining a 'Partially Observable' environment. Answer: Where the agent doesn't see the full state (e.g., Poker or Self-driving in fog).
A **Vacuum Robot** is a reflex agent; a **Trading Bot** is a utility-based agent.
3. Uninformed Search (BFS, DFS, Uniform Cost)
Technical Definition**Uninformed Search (Blind Search)** algorithms have no information about the distance to the goal. **BFS (Breadth-First)** explores layer by layer; **DFS (Depth-First)** dives deep into one branch before backtracking.
Core Concept- BFS: Guaranteed optimal for unit-cost; high memory (Space \(O(b^d)\)).
- DFS: Not optimal; memory-efficient (Space \(O(bm)\), where \(m\) is the maximum depth).
- Uniform Cost: Expands lowest cumulative cost node; equivalent to Dijkstra's.
| Algorithm | Completeness | Optimality |
|---|---|---|
| BFS | Complete. | Optimal (unit cost). |
| DFS | Complete (finite). | Not Optimal. |
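The BFS row can be illustrated with the 'degrees of separation' idea: on an unweighted follow graph every hop costs 1, so the first time BFS reaches the target it has found the fewest hops. The account names are hypothetical.

```python
from collections import deque

def degrees_of_separation(graph, start, target):
    """BFS over an unweighted follow graph; returns the fewest hops, or None if unreachable."""
    if start == target:
        return 0
    visited = {start}
    queue = deque([(start, 0)])          # FIFO queue: explore layer by layer
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt == target:
                return dist + 1
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, dist + 1))
    return None

follows = {
    "ana": ["ben", "cho"],
    "ben": ["dev"],
    "cho": ["dev", "eli"],
    "dev": ["fay"],
    "eli": [],
}
print(degrees_of_separation(follows, "ana", "fay"))  # 3
```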
Interview Spotlight
Focus: Why choose DFS? Answer: When memory is tight and paths are very deep but finite.
Mapping followers of followers on Instagram uses **BFS** to find 'Degrees of Separation'.
4. Informed Search (A*, Greedy Best First)
Technical Definition**Informed Search** algorithms use a **Heuristic (\(h\))** to estimate the cost to reach the goal. **A* Search** is the most widely used, combining actual cost from start (\(g\)) and heuristic estimate to goal (\(h\)).
Core Concept- A* Formula: \(f(n) = g(n) + h(n)\).
- Admissibility: A heuristic must never overestimate the cost for A* to be optimal.
- Greedy BFS: Only considers \(h(n)\); fast but can get stuck in loops and isn't optimal.
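A compact A* sketch on a 4-connected grid, assuming unit step costs and the Manhattan distance as \(h(n)\), which never overestimates on such a grid and is therefore admissible. The priority queue is ordered by \(f(n) = g(n) + h(n)\); the grid itself is illustrative.

```python
import heapq

def astar(grid, start, goal):
    """A* on a grid of strings ('#' = wall), unit step cost, Manhattan heuristic; returns path cost."""
    def h(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    rows, cols = len(grid), len(grid[0])
    open_heap = [(h(start), 0, start)]          # entries are (f, g, cell)
    best_g = {start: 0}
    while open_heap:
        f, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            return g                            # first time the goal is popped, g is optimal
        if g > best_g.get(cell, float("inf")):
            continue                            # stale entry: a cheaper route was found later
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None                                 # goal unreachable

grid = [
    "....",
    ".##.",
    "....",
]
print(astar(grid, (0, 0), (2, 3)))  # 5
```

Dropping `g` from the priority (ordering by `h` alone) would turn this into Greedy Best-First Search, which is faster but loses the optimality guarantee.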
Interview Spotlight
Focus: Defining an 'Admissible Heuristic' (e.g., Straight-line distance is always admissible for road paths).
Google Maps uses **A*** variants to find the fastest route to your office while avoiding traffic.
5. Adversarial Search (Minimax, Alpha-Beta Pruning)
Technical DefinitionAlgorithms used in competitive multi-agent games. **Minimax** assumes the opponent will play optimally to minimize your score. **Alpha-Beta Pruning** optimizes Minimax by skipping branches that cannot possibly affect the final decision.
Core Concept- Max Node: Player trying to maximize the score.
- Min Node: Opponent trying to minimize the score.
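- Pruning: Never changes the final result; with good move ordering it cuts the nodes examined from about \(O(b^d)\) to \(O(b^{d/2})\), roughly doubling the depth that can be searched in the same time.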
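Minimax with alpha-beta pruning can be sketched over a small game tree written as nested lists, where leaves are scores from MAX's point of view (an illustrative encoding). The optional `leaves` list records which leaves are actually evaluated, showing that pruning skips some of them without changing the root value.

```python
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, leaves=None):
    """Minimax with alpha-beta pruning over a tree of nested lists (leaves are scores)."""
    if not isinstance(node, list):
        if leaves is not None:
            leaves.append(node)              # record which leaves were actually evaluated
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, leaves))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                        # beta cut-off: MIN will never allow this branch
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, leaves))
        beta = min(beta, value)
        if alpha >= beta:
            break                            # alpha cut-off: MAX already has a better option
    return value

tree = [[3, 5], [2, 9], [0, 1]]              # MAX root with three MIN children
seen = []
print(alphabeta(tree, True, leaves=seen))    # 3
print(seen)                                  # [3, 5, 2, 0]
```

Plain minimax gives the same answer, max(min(3, 5), min(2, 9), min(0, 1)) = 3, but alpha-beta never looks at the leaves 9 and 1: once a MIN child is known to be at most 2 (or 0), it cannot beat the 3 MAX already has.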
Interview Spotlight
Focus: Manual tracing of a game tree to identify where pruning occurs (Alpha vs Beta thresholds).
Chess engines like Deep Blue used **Minimax with Alpha-Beta** pruning, backed by specialized hardware, to search many moves ahead within seconds.
6. Knowledge Representation (FOPL, Semantic Nets)
Technical DefinitionThe way AI stores information about the world. **FOPL (First Order Predicate Logic)** uses symbols and quantifiers to represent facts (\(\forall, \exists\)). **Semantic Networks** represent knowledge as a graph of nodes (concepts) and edges (relationships).
Core Concept- FOPL: Precise and logical; powerful for inference engines.
- Semantic Nets: Intuitive; captures 'ISA' (is-a) and 'HAS' (has-a) relationships.
- Frames: Advanced structure where objects have slots/attributes.
Interview Spotlight
Focus: Translating English sentences (e.g., 'Every student likes AI') into FOPL notation (\(\forall x [\text{Student}(x) \implies \text{Likes}(x, AI)]\)).
**Semantic Nets** are the ancestor technology of modern Knowledge Graphs used by Google to show 'Related People'.
7. Forward vs. Backward Chaining
Technical DefinitionInference methods in expert systems. **Forward Chaining** starts with known facts and uses rules to derive new conclusions (Data-driven). **Backward Chaining** starts with a goal/hypothesis and works back to check if facts support it (Goal-driven).
Core Concept- Forward: If facts A and B are true, find what rules can fire. Good for monitoring, control, and prediction.
- Backward: If we want to reach goal Z, find what premises (A, B) must be true. Good for diagnosis (e.g., MYCIN).
| Feature | Forward Chaining | Backward Chaining |
|---|---|---|
| Logic | Data-driven. | Goal-driven. |
| Search | Breadth-first. | Depth-first. |
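Both inference directions from the table can be sketched over the same propositional rule base, where each rule is a (premises, conclusion) pair. The rule and fact names are hypothetical; real expert-system shells add variables, conflict resolution, and explanations.

```python
def forward_chain(facts, rules):
    """Data-driven: keep firing rules whose premises are all known until nothing new is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known

def backward_chain(goal, facts, rules, seen=frozenset()):
    """Goal-driven: the goal holds if it is a known fact, or if some rule concludes it
    and every premise of that rule can itself be proven."""
    if goal in facts:
        return True
    if goal in seen:
        return False                      # guard against cyclic rules
    return any(
        all(backward_chain(p, facts, rules, seen | {goal}) for p in premises)
        for premises, conclusion in rules
        if conclusion == goal
    )

# Hypothetical rule base: (set of premises, conclusion)
rules = [
    ({"smoke_detected"}, "fire_possible"),
    ({"fire_possible", "heat_high"}, "fire_alarm"),
    ({"fire_alarm"}, "call_fire_service"),
]
facts = {"smoke_detected", "heat_high"}
print(sorted(forward_chain(facts, rules)))
print(backward_chain("call_fire_service", facts, rules))  # True
```

Forward chaining derives everything it can from the data, while backward chaining only explores the rules relevant to the single goal it was asked about.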
Interview Spotlight
Focus: Identifying which method a specific AI system uses based on available data.
An automation system checking 'Is the door locked?' (Goal) uses **Backward Chaining** to verify sensor inputs.
8. Expert Systems
Technical DefinitionAI systems that mimic the decision-making ability of a human expert in a specific domain. They consist of a **Knowledge Base** (facts/rules) and an **Inference Engine** (reasoning logic).
Core Concept- MYCIN: Historical medical expert system for blood infections.
- Logic: Usually relies on If-Then rules.
- Limitation: Can't learn from experience; difficult to update 'Common Sense' knowledge.
Interview Spotlight
Focus: Explaining why Expert Systems were the dominant AI paradigm in the 1980s before Machine Learning took over.
Tax software that asks you 20 questions to determine your refund is a modern **Expert System**.
9. Turing Test & Chinese Room Argument
Technical DefinitionPhilosophical foundations of intelligence. The **Turing Test** assesses whether a machine's conversational behavior is indistinguishable from a human's. The **Chinese Room Argument (Searle)** posits that internal symbol manipulation doesn't equal genuine understanding.
Core Concept- Turing: Operational definition—if it acts smart, it is smart.
- Chinese Room: A person in a room can produce correct Chinese replies by following rulebooks without understanding any Chinese (Syntax \(\neq\) Semantics).
Interview Spotlight
Focus: Does an LLM understand code, or is it just a 'Stochastic Parrot'? This relates directly to the Chinese Room.
Passing a 'Captcha' is a miniature, inverse **Turing Test** used to prove you are NOT an AI.
10. Fuzzy Logic
Technical DefinitionA form of logic that deals with approximate reasoning rather than fixed truth. It uses **Degrees of Truth** (values between 0 and 1) instead of absolute binary (True/False).
Core Concept- Linguistic Variables: Terms like 'Warm', 'Slightly Cold', 'Very Hot'.
- Fuzzification: Converting crisp inputs into fuzzy values.
- Defuzzification: Converting fuzzy results back into actionable crisp outputs.
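Fuzzification and defuzzification can be sketched for a hypothetical fan controller: triangular membership functions turn a crisp temperature into degrees of 'Cold', 'Warm', and 'Hot', and each rule maps a label to a fixed fan speed. The crisp output here is a membership-weighted average of those speeds, a simplified (Sugeno-style) alternative to centroid defuzzification; all breakpoints are made-up assumptions for temperatures of roughly 0 to 40 °C.

```python
def triangular(x, a, b, c):
    """Degree of membership in a triangle that rises from a, peaks at b, and falls to zero at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical linguistic variables for temperature (°C): (start, peak, end)
SETS = {"cold": (-20, 0, 20), "warm": (10, 20, 30), "hot": (20, 40, 60)}
# Hypothetical rule outputs: fan speed in percent for each label
FAN_SPEED = {"cold": 0.0, "warm": 50.0, "hot": 100.0}

def fuzzify(temp):
    """Crisp input -> degree of truth (0..1) for each linguistic label."""
    return {label: triangular(temp, *abc) for label, abc in SETS.items()}

def fan_speed(temp):
    """Fuzzy rule outputs -> one crisp speed via a membership-weighted average."""
    degrees = fuzzify(temp)
    total = sum(degrees.values())
    if total == 0:
        return 0.0   # input outside every set: no rule fires
    return sum(degrees[label] * FAN_SPEED[label] for label in degrees) / total

print(fuzzify(25))    # {'cold': 0.0, 'warm': 0.5, 'hot': 0.25}
print(fan_speed(25))
```

At 25 °C the input is partly 'Warm' and partly 'Hot' at the same time, so the output lands between the two rule speeds instead of jumping abruptly at a threshold.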
Interview Spotlight
Focus: Why use Fuzzy Logic for hardware control? Answer: Because physical sensors produce continuous, noisy values better handled with ranges.
Modern **Washing Machines** use Fuzzy Logic to determine exact water levels based on 'Small, Medium, or Large' load estimates.
11. Genetic Algorithms
Technical DefinitionMetaheuristic search algorithms inspired by the process of natural selection. They evolve a population of solutions using **Selection, Crossover (Recombination), and Mutation** to find optimal results.
Core Concept- Fitness Function: Measures how 'good' a solution is.
- Survival of the Fittest: The best solutions are selected to reproduce and form the next generation.
- Mutation: Prevents premature convergence by introducing random diversity.
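A minimal genetic algorithm on the classic 'OneMax' toy problem (maximize the number of 1s in a bit string) shows selection, crossover, and mutation working together. Tournament selection, single-point crossover, elitism, and all parameter values are illustrative choices, not the only way to build a GA.

```python
import random

def fitness(bits):
    """OneMax fitness: the number of 1s (the optimum is all ones)."""
    return sum(bits)

def evolve(n_bits=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []                                   # best fitness in each generation
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        next_pop = pop[:2]                         # elitism: the two best survive unchanged
        while len(next_pop) < pop_size:
            # Selection: each parent is the fitter of two random individuals (tournament)
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            # Crossover: splice the parents at a random cut point
            cut = rng.randrange(1, n_bits)
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with a small probability to keep diversity
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness), history

best, history = evolve()
print(fitness(best), history[0], history[-1])
```

Elitism guarantees the best fitness never decreases between generations, while mutation keeps injecting new bit patterns so the population does not collapse onto one early solution.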
Interview Spotlight
Focus: Explaining why GAs are used for 'Global Optimization' where gradient descent might get stuck in local minima.
Designing the most aerodynamic shape for an airplane wing often uses **Genetic Algorithms** to evolve a design over millions of simulated iterations.
12. Natural Language Processing (NLP) pipeline
Technical DefinitionThe series of steps used to transform raw text into a machine-readable format. It involves cleaning data and extracting meaningful semantic features.
Core Concept- Lowercasing & Stopword Removal: Removing 'the', 'is', etc.
- Stemming/Lemmatization: Reducing words to roots (e.g., 'Running' -> 'Run').
- Named Entity Recognition (NER): Identifying names, dates, and places.
- POS Tagging: Identifying Nouns, Verbs, and Adjectives.
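A toy pipeline contrasting the two root-finding styles: lowercasing, punctuation stripping, and stopword removal, followed by a crude suffix-chopping stemmer versus a dictionary-lookup lemmatizer. The suffix rules and lemma table are hypothetical stand-ins; real tools such as NLTK's `PorterStemmer` and `WordNetLemmatizer` use far richer rules and dictionaries.

```python
STOPWORDS = {"the", "is", "are", "a", "an", "and", "of"}
# Hypothetical mini lemma dictionary; real lemmatizers use a full lexicon plus POS tags
LEMMAS = {"running": "run", "ran": "run", "studies": "study", "better": "good", "mice": "mouse"}
SUFFIXES = ("ing", "ies", "es", "s", "ed")

def preprocess(text):
    """Lowercase, split on whitespace, strip punctuation, and drop stopwords."""
    tokens = [t.strip(".,!?") for t in text.lower().split()]
    return [t for t in tokens if t and t not in STOPWORDS]

def stem(word):
    """Rule-based chopping: remove the first matching suffix, even if the result is not a word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def lemmatize(word):
    """Dictionary-based: look up the base form, leaving unknown words unchanged."""
    return LEMMAS.get(word, word)

tokens = preprocess("The mice are running and the student studies better.")
print([stem(t) for t in tokens])       # ['mice', 'runn', 'student', 'stud', 'better']
print([lemmatize(t) for t in tokens])  # ['mouse', 'run', 'student', 'study', 'good']
```

The stemmer is fast but produces non-words ('runn', 'stud') and misses irregular forms, while the lemmatizer returns real dictionary words at the cost of needing a lexicon.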
Interview Spotlight
Focus: Difference between Stemming (Rule-based chopping) and Lemmatization (Dictionary-based root finding).
Email services use **NLP Pipelines** to extract key details (dates, amounts, merchants) from your receipts and turn them into structured entries in your calendar.
13. Constraint Satisfaction Problems (CSP)
Technical DefinitionProblems defined by a set of variables, their domains, and a set of constraints that the solution must satisfy. **Backtracking** is the standard algorithmic approach used to solve CSPs.
Core Concept- Components: Variables, Domains (Possible values), Constraints (Rules).
- Constraint Propagation: Pruning the domains of variables based on other assignments (e.g., arc consistency via AC-3).
- Efficient Solving: Combining Backtracking with heuristics like 'Minimum Remaining Values' (MRV).
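Map coloring can be sketched as a CSP solved by backtracking with the MRV heuristic and forward checking: after each assignment, the chosen color is removed from unassigned neighbors' domains, and a branch is abandoned as soon as some neighbor has no colors left. The map (the Australian states and territories, a standard textbook example) and the three-color domain are illustrative.

```python
def solve_csp(variables, domains, neighbors):
    """Backtracking + MRV + forward checking for 'adjacent variables must differ' constraints."""
    def backtrack(assignment, domains):
        if len(assignment) == len(variables):
            return assignment
        # MRV: choose the unassigned variable with the fewest remaining values
        var = min((v for v in variables if v not in assignment), key=lambda v: len(domains[v]))
        for value in domains[var]:
            # Forward checking: remove `value` from each unassigned neighbor's domain
            pruned = {v: list(vals) for v, vals in domains.items()}
            pruned[var] = [value]
            dead_end = False
            for n in neighbors[var]:
                if n not in assignment:
                    pruned[n] = [x for x in pruned[n] if x != value]
                    if not pruned[n]:
                        dead_end = True   # a neighbor has no legal color left
                        break
            if not dead_end:
                result = backtrack({**assignment, var: value}, pruned)
                if result is not None:
                    return result
        return None                       # every value failed: backtrack

    return backtrack({}, domains)

neighbors = {
    "WA": ["NT", "SA"],
    "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"],
    "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"],
    "V": ["SA", "NSW"],
    "T": [],
}
colors = ["red", "green", "blue"]
regions = list(neighbors)
solution = solve_csp(regions, {r: colors for r in regions}, neighbors)
print(solution)
```

Forward checking catches a dead end (an empty neighbor domain) immediately, instead of discovering it several assignments later.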
Interview Spotlight
Focus: Identifying map-coloring or Sudoku as CSPs and explaining how 'Forward Checking' prevents dead-end paths.
University **Course Scheduling** is a massive CSP that ensures no two classes are assigned the same room, or the same professor, at the same time.
14. Markov Decision Processes (MDP)
Technical DefinitionA mathematical framework for modeling decision-making where outcomes are partly random and partly under the control of a decision-maker. It is the foundation of **Reinforcement Learning**.
Core Concept- Tuple \((S, A, P, R, \gamma)\): States, Actions, Transition Probabilities, Rewards, Discount Factor.
- Markov Property: The future is independent of the past, given the present.
- Policy (\(\pi\)): A mapping from states to actions; the optimal policy \(\pi^*\) maximizes the expected total (discounted) reward.
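A minimal value-iteration sketch shows the Bellman optimality update on a hypothetical 'slippery corridor', in the spirit of the slippery-floor robot example in this section: states 0 to 3, state 3 is a terminal goal worth +1 on arrival, and each move succeeds with probability 0.8 or leaves the robot in place otherwise. The layout, slip probability, and \(\gamma = 0.9\) are assumptions.

```python
def value_iteration(n_states=4, slip=0.2, gamma=0.9, tol=1e-9):
    """Value iteration on a 1-D slippery corridor; the last state is a terminal goal."""
    goal = n_states - 1
    moves = {"left": -1, "right": 1}

    def outcomes(s, a):
        """(probability, next_state, reward): the move succeeds w.p. 1 - slip, else the robot stays put."""
        target = min(max(s + moves[a], 0), goal)
        return [(p, s2, 1.0 if s2 == goal else 0.0) for p, s2 in ((1 - slip, target), (slip, s))]

    def q(s, a, V):
        # One Bellman backup: expected immediate reward plus discounted value of the next state
        return sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes(s, a))

    V = [0.0] * n_states                  # V[goal] stays 0 because the goal is terminal
    while True:
        delta = 0.0
        for s in range(goal):
            best = max(q(s, a, V) for a in moves)   # V(s) <- max_a sum_s' P(s'|s,a)[R + gamma V(s')]
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = [max(moves, key=lambda a: q(s, a, V)) for s in range(goal)]
    return V, policy

V, policy = value_iteration()
print(policy)                        # ['right', 'right', 'right']
print([round(v, 3) for v in V])
```

Next to the goal, the fixed point satisfies \(V(2) = 0.8 \cdot 1 + 0.2 \cdot 0.9\,V(2)\), so \(V(2) = 0.8 / 0.82 \approx 0.976\); values shrink with distance because of slipping and discounting.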
Interview Spotlight
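Focus: Defining the 'Bellman Equation', which expresses the value of a state recursively in terms of the immediate reward and the discounted value of successor states: \(V^*(s) = \max_a \sum_{s'} P(s' \mid s, a)\,[R(s, a, s') + \gamma V^*(s')]\).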
A robot navigating a grid with slippery floors (where move action might fail) is modeled as an **MDP**.
15. Heuristic Functions
Technical DefinitionAn estimation function used to guide search algorithms toward the goal node. It represents a 'rule of thumb' or 'common sense' shortcut that helps avoid exploring useless paths.
Core Concept- Admissible: Never overestimates (required for A* optimality).
- Consistent: Follows the triangle inequality \(h(n) \le c(n, n') + h(n')\) (also called monotonicity).
- Examples: Manhattan Distance (Grid), Euclidean (Direct line), Hamming Distance (Puzzles).
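Two of the heuristics listed above, Hamming distance (misplaced tiles) and Manhattan distance, can be sketched for the 8-puzzle with the board stored as a 9-tuple read row by row and 0 as the blank. The goal layout is an assumption; both functions skip the blank, which keeps them admissible.

```python
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)   # assumed goal layout; 0 marks the blank

def misplaced_tiles(state, goal=GOAL):
    """Hamming distance: number of tiles (not the blank) out of place."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def manhattan(state, goal=GOAL):
    """Sum over tiles of row distance + column distance to the tile's goal square."""
    where = {tile: divmod(i, 3) for i, tile in enumerate(goal)}
    total = 0
    for i, tile in enumerate(state):
        if tile != 0:
            r, c = divmod(i, 3)
            gr, gc = where[tile]
            total += abs(r - gr) + abs(c - gc)
    return total

state = (1, 2, 3, 4, 5, 6, 0, 7, 8)   # tiles 7 and 8 each one square from home
print(misplaced_tiles(state), manhattan(state))  # 2 2
```

Manhattan distance is never smaller than the misplaced-tile count (for the board `(0, 2, 3, 4, 5, 6, 7, 8, 1)` it gives 4 versus 1), so it is the more informed of the two and typically lets A* expand fewer nodes.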
Interview Spotlight
Focus: Designing a heuristic for the '8-Puzzle' problem. Answer: Number of misplaced tiles, or the more informed Manhattan distance; both are admissible.
A delivery driver uses a **Heuristic** of 'Nearest Unvisited Customer' to plan their afternoon route.
16. Hill Climbing & Simulated Annealing
Technical DefinitionLocal search optimization algorithms. **Hill Climbing** iteratively moves toward better neighbors but often gets stuck in Local Maxima. **Simulated Annealing** allows occasional 'bad' moves to escape local peaks, inspired by physics-based cooling.
Core Concept- Hill Climbing: Greedy; fast; risk of 'Plateaus' and 'Ridges'.
- Simulated Annealing: High temperature = more exploration (worse moves are often accepted); low temperature = almost only better moves are accepted.
- Convergence: Slow to find the global peak but much more reliable than simple Hill Climbing.
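A minimal simulated-annealing sketch: a worse candidate is accepted with probability \(e^{-\Delta / T}\), so high temperatures explore and low temperatures behave almost like hill climbing. The bumpy test function, neighbor move, and geometric cooling schedule are illustrative assumptions, and the result is not guaranteed to be the global minimum.

```python
import math
import random

def simulated_annealing(cost, start, neighbor, t0=10.0, cooling=0.995, steps=5000, seed=0):
    """Minimize `cost`; accept worse moves with probability exp(-delta / T) while T cools."""
    rng = random.Random(seed)
    current = best = start
    t = t0
    for _ in range(steps):
        candidate = neighbor(current, rng)
        delta = cost(candidate) - cost(current)
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current = candidate
            if cost(current) < cost(best):
                best = current               # remember the best state ever visited
        t = max(t * cooling, 1e-6)           # geometric cooling, floored to avoid dividing by zero
    return best

def bumpy(x):
    """Illustrative 1-D landscape: a parabola with many sine-wave local minima."""
    return (x - 70) ** 2 / 100 + 5 * math.sin(x)

def step(x, rng):
    """Move up to 3 units left or right, staying inside 0..100."""
    return min(100, max(0, x + rng.choice([-3, -2, -1, 1, 2, 3])))

best = simulated_annealing(bumpy, start=0, neighbor=step)
print(best, round(bumpy(best), 2))
```

Plain hill climbing from `x = 0` would stop at the first sine-wave dip; the early high-temperature phase lets the search climb out of such dips before cooling down.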
Interview Spotlight
Focus: Explaining why we allow 'worse' steps in Simulated Annealing. Answer: To escape local optima and find the true Global Maximum.
Optimizing the layout of components on a circuit board uses **Simulated Annealing** to minimize wire length.
17. Robotics in AI
Technical DefinitionThe intersection of AI and physical hardware. It involves the integration of **Perception** (Computer Vision, SLAM), **Planning** (A*, Pathfinding), and **Control** (PID, RL) to allow machines to interact with the real world.
Core Concept- SLAM (Simultaneous Localization and Mapping): Building a map of an unknown environment while simultaneously tracking the robot's own position within it.
- Kinematics: Mathematical modeling of joint movements (Forward vs Inverse).
- Sensors: LiDAR, Ultrasound, Cameras (providing 'Percepts').
Interview Spotlight
Focus: Explaining the 'Sim-to-Real' gap in Reinforcement Learning for robots.
Warehouse robots like Amazon's **Kiva** use AI to coordinate paths for thousands of units without colliding.
18. AI Ethics, Bias, and Safety
Technical DefinitionThe field focused on ensuring AI systems are developed responsibly. It addresses **Algorithmic Bias** (unfair treatment), **Safety** (preventing harmful actions), and **Transparency** (Explainable AI - XAI).
Core Concept- Dataset Bias: If training data is flawed (e.g., historical sexism), the AI will replicate it.
- Alignment Problem: Ensuring AI goals match human values.
- Adversarial Attacks: Tricking AI with subtle, invisible perturbations to input data.
Interview Spotlight
Focus: Defining 'Explainable AI'. Why is it critical for healthcare AI? Answer: Doctors need to know *why* a model diagnosed cancer to trust it.
Social media algorithms constantly undergo **Ethics Audits** to prevent the accidental promotion of harmful misinformation.
19. Agentic AI (Multi-agent orchestration)
Technical DefinitionAn advanced framework where multiple specialized AI agents collaborate to solve a complex task. Unlike a single model, **Agentic AI** involves reasoning, tool-use, and task-delegation between an 'Orchestrator' and 'Workers'.
Core Concept- Reasoning: Chains of thought (CoT) where agents plan before acting.
- Tool-Use: Agents calling APIs (Calculators, Search, Document Readers) to get facts.
- Multi-Agent: E.g., One agent researches, one writes, and one fact-checks the first two.
Interview Spotlight
Focus: Defining the difference between a simple Chatbot and an **Autonomous Agent**. Answer: Autonomy and step-by-step goal execution.
A system that reads your email, checks your calendar, books a flight, and confirms your hotel autonomously is **Agentic AI**.
20. Prompt Engineering as an AI tool
Technical DefinitionThe practice of optimizing input text (prompts) to guide Generative AI models toward more accurate and useful outputs. It involves techniques like **Few-shot Prompting**, **Chain-of-Thought**, and **System Persona** definition.
Core Concept- Few-shot: Providing 2-3 examples within the prompt.
- Zero-shot: Asking a question directly without examples.
- CoT (Chain of Thought): Asking the model to "Think step-by-step" to improve logical reasoning.
- Technique: "Act as a [Persona]" significantly changes model tone and constraints.
Interview Spotlight
Focus: Explaining why 'Prompt Engineering' is important for steering model behavior and reducing hallucinations.
A developer using **Chain-of-Thought** prompting to fix a bug gets a detailed explanation instead of just a code snippet.