Parallel and Distributed Computing
Help Questions
AP Computer Science Principles › Parallel and Distributed Computing
Which of the following is a primary difference between parallel and distributed computing?
Parallel computing uses multiple processors, while distributed computing uses only a single processor.
Parallel computing solutions run faster than distributed computing solutions, regardless of the problem.
Parallel computing is used for mathematical problems, while distributed computing is used for text processing.
In parallel computing, processors often share memory, while in distributed computing, machines have their own private memory and communicate over a network.
Explanation
A key architectural difference is that parallel computing often involves multiple cores or processors within a single machine sharing a common memory space, while in distributed computing the computation is spread across separate machines, each with its own private memory, that communicate by sending messages over a network. (A) is incorrect: both models use multiple processing units. (B) is incorrect: relative speed depends on the problem, so neither model is faster in every case. (C) is incorrect: both models are used for a wide variety of problem types.
What is the primary benefit of using distributed computing for solving extremely large-scale scientific problems, such as global climate modeling?
It ensures the security of the data by encrypting each piece of information before processing it on a separate device.
It guarantees a perfectly optimal solution by exploring every possibility in a sequential order.
It reduces the complexity of the program's code by automatically converting it to a simpler language.
It allows problems to be solved that would be impractical for a single computer due to processing time or storage needs.
Explanation
Distributed computing's main advantage is its ability to harness the collective processing power and storage of many computers, allowing it to tackle problems that are too large or too time-consuming for even the most powerful single computer. (A) describes a security measure, not the primary computational benefit. (B) is incorrect: distributed computing does not guarantee an optimal solution, nor does it work sequentially. (C) is incorrect: distributing a computation does not simplify the program's code.
A program consists of a sequential setup phase that takes 5 minutes, 10 independent processing tasks that take 10 minutes each, and a sequential cleanup phase that takes 5 minutes. If this program is run on a parallel system with 5 processors, what is the minimum total execution time?
30 minutes
25 minutes
110 minutes
22 minutes
Explanation
The program has two sequential parts (setup and cleanup) and one parallel part (10 processing tasks). First, the setup runs sequentially for 5 minutes. Next, the 10 processing tasks run in parallel on 5 processors, so each processor handles 10/5 = 2 tasks; at 10 minutes per task, the parallel phase takes 2 × 10 = 20 minutes. Finally, the cleanup runs sequentially for 5 minutes. The total is 5 (setup) + 20 (parallel) + 5 (cleanup) = 30 minutes. (C) is the time on a single processor: 5 + 10 × 10 + 5 = 110 minutes. (D) comes from dividing that sequential time across all 5 processors (110 / 5 = 22), which wrongly parallelizes the setup and cleanup. (B) is likewise an incorrect calculation.
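The arithmetic above can be checked with a short sketch (the function name and structure are illustrative; the values are taken directly from the explanation):

```python
import math

def min_parallel_time(setup, num_tasks, task_time, cleanup, processors):
    """Sequential setup + parallel middle phase + sequential cleanup."""
    # Tasks are divided as evenly as possible; the busiest processor
    # determines the length of the parallel phase.
    tasks_per_processor = math.ceil(num_tasks / processors)
    return setup + tasks_per_processor * task_time + cleanup

# 5-minute setup, ten 10-minute tasks, 5-minute cleanup:
print(min_parallel_time(5, 10, 10, 5, 5))  # 30 minutes on 5 processors
print(min_parallel_time(5, 10, 10, 5, 1))  # 110 minutes on 1 processor
```

Note that the sequential setup and cleanup contribute their full 10 minutes no matter how many processors are added.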
A programmer is trying to improve the performance of a data analysis program by using parallel computing. After rewriting the code to use 8 processors instead of 1, they notice the program is only 3 times faster, not 8 times faster. Which of the following is the most likely explanation for this?
The program contains a sequential portion that cannot be parallelized, limiting the overall speedup.
Distributed computing would have been a better model for this problem than parallel computing.
The processors used in the parallel solution are slower than the single processor used in the sequential solution.
The program requires more data storage when run in parallel, which slows down the execution time.
Explanation
The efficiency of a parallel solution is limited by the portion of the program that must be executed sequentially. This sequential part creates a bottleneck, preventing the speedup from being directly proportional to the number of processors (a relationship formalized by Amdahl's Law). (B) does not explain the observation; switching models would not remove a sequential bottleneck. (C) is not supported; the scenario implies processors of identical speed. (D) is not a primary cause of limited parallel speedup.
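This bottleneck can be quantified with Amdahl's Law: with sequential fraction s and N processors, speedup = 1 / (s + (1 − s)/N). A minimal sketch; if the 8-processor run is exactly 3 times faster, the law implies a sequential fraction of 5/21 (about 24%):

```python
def amdahl_speedup(sequential_fraction, processors):
    """Speedup predicted by Amdahl's Law: 1 / (s + (1 - s) / N)."""
    s = sequential_fraction
    return 1 / (s + (1 - s) / processors)

# A sequential fraction of 5/21 (~24%) caps 8 processors at 3x speedup:
print(round(amdahl_speedup(5 / 21, 8), 2))      # 3.0
# Even with an enormous number of processors, speedup approaches 1/s:
print(round(amdahl_speedup(5 / 21, 10**9), 2))  # 4.2
```

This is why the programmer sees 3x rather than 8x: no amount of extra hardware parallelizes the sequential portion.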
Refer to the text: Genome sequencing pipelines split read alignment into many independent chunks, then merge partial matches; processors must synchronize to avoid duplicate counting. How does parallel computing improve processing speed?
By ensuring node failures never occur, so no time is lost to recovery procedures.
By moving data to distant nodes over the internet, which always accelerates computation.
By dividing alignment work into chunks processed simultaneously, then merging results efficiently.
By replacing algorithmic steps with manual verification to increase accuracy and speed.
Explanation
This question tests how parallel computing achieves speed improvements in genome sequencing pipelines. Parallel computing divides a single task into smaller subtasks that multiple processors handle simultaneously, then combines the partial results to finish the original task faster. In this passage, the pipeline splits read alignment into many independent chunks processed at the same time, with processors synchronizing during the merge to avoid duplicate counting. Choice C is correct because it describes this fundamental speedup mechanism: dividing alignment work into chunks processed simultaneously, then merging the results efficiently. Choice B is incorrect because moving data to distant nodes over the internet describes distributed computing and introduces network latency rather than guaranteeing faster computation.
To help students: use the analogy of several workers assembling parts of a product simultaneously versus one worker doing everything sequentially, and demonstrate speedup calculations showing how parallel processing reduces total time.
Watch for: confusion between parallel speedup (simultaneous processing within a system) and the characteristics of distributed computing.
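The chunk-and-merge pattern the passage describes can be sketched in Python. This is a toy pattern-counting stand-in for read alignment, not an actual sequencing tool, and it uses threads for simplicity; real pipelines would use separate processes or cluster nodes for true simultaneous execution:

```python
from concurrent.futures import ThreadPoolExecutor

def count_matches(chunk, pattern):
    # Toy stand-in for aligning one chunk of reads: count pattern hits.
    return sum(read.count(pattern) for read in chunk)

def parallel_count(reads, pattern, workers=4):
    # Split the reads into independent chunks, one per worker.
    chunks = [reads[i::workers] for i in range(workers)]
    # Each worker handles its own chunk concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(count_matches, chunks, [pattern] * workers)
    # Merge step: each read lives in exactly one chunk, so summing the
    # partial counts never double-counts a match.
    return sum(partials)

reads = ["ACGTACGT", "TTACGTAA", "GGGGACGT", "ACGTACGA"] * 100
print(parallel_count(reads, "ACGT"))  # 500
```

The merge step is where the passage's synchronization requirement shows up: workers must finish and hand over their partial counts before the totals can be combined.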
Refer to the text: In genome sequencing, distributed computing stores reads across networked nodes and uses message passing; if one node fails, other nodes continue and data can be re-copied. How does distributed computing enhance fault tolerance?
By guaranteeing perfect accuracy in every alignment, eliminating the impact of hardware faults.
By requiring constant shared-memory access, preventing any single component from failing.
By keeping computation on one machine so failures cannot spread across a network.
By replicating data and rerouting tasks so other nodes continue when one node fails.
Explanation
This question tests how distributed computing provides fault tolerance in genome sequencing applications. Distributed computing uses multiple independent systems connected by a network, so the overall computation can continue even when individual nodes fail, unlike parallel computing, which typically operates within a single system. In this passage, reads are stored across networked nodes using message passing, and when a node fails the data can be re-copied and the other nodes continue. Choice D is correct because replicating data and rerouting tasks so other nodes continue when one node fails is the fundamental fault-tolerance mechanism of distributed systems. Choice B is incorrect because shared-memory access is characteristic of parallel computing within a single system, not of distributed computing across networked nodes.
To help students: emphasize that fault tolerance in distributed systems comes from redundancy and the independence of nodes; use real-world examples like cloud storage services that keep working even when individual servers fail.
Watch for: confusion between parallel computing's shared memory and distributed computing's message-passing architecture.
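The replicate-and-reroute idea can be illustrated with a small simulation. All names here are hypothetical, and real distributed storage systems handle replication and failover automatically:

```python
import random

class Node:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.shards = {}  # shard_id -> data held on this node

def store_with_replication(nodes, shard_id, data, copies=2):
    """Write each shard to `copies` distinct nodes (redundancy)."""
    for node in random.sample(nodes, copies):
        node.shards[shard_id] = data

def read_shard(nodes, shard_id):
    """Reroute the read to any surviving replica."""
    for node in nodes:
        if node.alive and shard_id in node.shards:
            return node.shards[shard_id]
    raise RuntimeError("all replicas lost")

nodes = [Node(f"node{i}") for i in range(4)]
store_with_replication(nodes, "reads-chunk-7", "ACGTTGCA")
# Simulate a failure on one replica: the data is still readable,
# because a second copy lives on another node.
for node in nodes:
    if "reads-chunk-7" in node.shards:
        node.alive = False
        break
print(read_shard(nodes, "reads-chunk-7"))  # ACGTTGCA
```

With only one copy (`copies=1`), the same failure would make the shard unreadable, which is exactly the risk replication removes.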
Refer to the text: Parallel genome alignment requires frequent synchronization to combine partial results; distributed computing communicates through messages and tolerates higher communication delays. What is a key difference between parallel and distributed computing?
Parallel applies only to storage; distributed applies only to computation.
Parallel is inherently fault tolerant; distributed cannot recover from node failures.
Parallel typically uses lower-latency internal communication; distributed typically uses higher-latency network messaging.
Parallel forbids task splitting; distributed requires every task to be identical.
Explanation
This question tests the communication characteristics that distinguish the two computing paradigms. Parallel computing typically relies on fast, low-latency communication within a single system, while distributed computing must handle higher-latency network communication between separate nodes. In this passage, parallel genome alignment requires frequent synchronization to combine partial results (implying fast, local communication), whereas distributed computing communicates through messages and tolerates higher delays. Choice C is correct because it identifies this fundamental architectural difference: lower-latency internal communication for parallel computing versus higher-latency network messaging for distributed computing. Choice B is incorrect because it reverses the fault-tolerance characteristics: distributed computing is generally more fault tolerant because its nodes are independent, while parallel computing within one machine is vulnerable to system-wide failures.
To help students: demonstrate the difference in communication speeds with examples such as CPU cache access (nanoseconds) versus network packets (milliseconds), and explain how these differences shape algorithm design.
Watch for: students confusing which model has better fault tolerance, or underestimating the impact of communication latency.
Refer to the text: genome sequencing pipelines may distribute data across nodes to handle massive read volumes. Which scenario best exemplifies distributed computing?
Multiple lab servers share read alignment tasks and exchange results over a network.
A spreadsheet sorts a small list using one thread to avoid coordination.
One laptop increases speed by overclocking its single processor.
A single GPU renders one image faster by using many cores on one card.
Explanation
This question tests recognition of distributed computing through example identification. Distributed computing involves multiple independent computer systems working together over a network to solve a problem or provide a service. In this passage, genome sequencing pipelines distribute data across nodes to handle massive read volumes. Choice A is correct because it clearly exemplifies distributed computing: multiple lab servers (separate systems) sharing tasks and exchanging results over a network. Choice D is incorrect because a single GPU with many cores represents parallel computing within one device, not distributed computing across multiple systems.
To help students: emphasize the key identifier of distributed computing, multiple separate computer systems connected by a network, and practice categorizing scenarios by whether they involve one system or several networked systems.
Watch for: confusion between many processors in one system (parallel) and many systems working together (distributed).
Refer to the text: In parallel genome alignment, processors must coordinate when merging partial matches; excessive coordination can reduce speed gains. Which statement best captures a limitation implied by the passage?
Fault tolerance is irrelevant in genomics because hardware failures never occur in practice.
Parallel computing cannot run genome sequencing because it never allows task division.
Communication and synchronization overhead can constrain parallel speedup as processors increase.
Distributed computing eliminates all communication delays by using shared memory across nodes.
Explanation
This question tests the limitations that coordination overhead imposes on parallel computing. A parallel program's speedup is limited by the need for processors to communicate and synchronize, and this overhead becomes more significant as the number of processors grows, consistent with Amdahl's Law. In this passage, processors must coordinate when merging partial matches, and excessive coordination can reduce speed gains. Choice C is correct because communication and synchronization overhead constraining parallel speedup as processors increase is a fundamental limitation of parallel computing. Choice B is incorrect because it claims parallel computing never allows task division, which contradicts the passage's explicit description of parallel genome alignment through task division.
To help students: introduce Amdahl's Law mathematically and show how even small sequential portions cap the achievable speedup; use examples where adding more processors yields diminishing returns.
Watch for: students assuming parallel computing has no limitations, or missing that coordination overhead grows with processor count.
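One way to see the diminishing returns is a simple cost model in which the ideal parallel time shrinks with more processors while coordination overhead grows with them. This is an illustrative model with made-up numbers, not a measured benchmark:

```python
def time_with_overhead(work, processors, sync_cost):
    """Ideal parallel time (work / N) plus a coordination cost that
    grows linearly with the number of processors."""
    return work / processors + sync_cost * processors

work, sync_cost = 1000.0, 1.0
for n in (1, 8, 32, 64, 128):
    speedup = work / time_with_overhead(work, n, sync_cost)
    print(n, round(speedup, 1))
# Speedup rises (7.5x at 8 processors), peaks near 32 processors
# (15.8x), then degrades (7.4x at 128) as coordination dominates.
```

In this model adding processors past the peak actively hurts, which is the "excessive coordination can reduce speed gains" point the passage makes.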
Refer to the text: genome sequencing uses parallel computing for rapid per-read analysis and distributed computing for cluster-wide throughput. What is a key difference between parallel and distributed computing?
Parallel computing depends on geographic separation, while distributed computing requires one shared cache.
Parallel computing cannot be used in science, while distributed computing is only for science.
Parallel computing coordinates processors within a single system, while distributed computing coordinates multiple nodes over a network.
Parallel computing is inherently fault tolerant, while distributed computing fails whenever one node fails.
Explanation
This question tests the fundamental architectural distinction between the two paradigms. Parallel computing coordinates multiple processors within a single computer system, typically sharing memory or connected by fast internal links, while distributed computing coordinates multiple independent systems (nodes) connected over a network. In this passage, genome sequencing uses parallel computing for rapid per-read analysis on one system and distributed computing for cluster-wide throughput across many systems. Choice C is correct because it states this key difference: parallel computing works within a single system, while distributed computing spans multiple networked nodes. Choice D is incorrect because it reverses the fault-tolerance characteristics: distributed computing is generally more fault tolerant than parallel computing because its nodes are independent.
To help students: start with the physical architecture distinction, one system versus multiple systems, and use clear visual representations showing the boundary of a single system versus multiple networked systems.
Watch for: misconceptions about which architecture provides better fault tolerance, or unfounded assumptions about application domains.