导航菜单

External Sorting

Idea

External sorting handles data too large for memory, using external storage (disk). Split the big file into memory-fit chunks, sort each, then multiway-merge.

外部排序演示

块1:
84
块2:
57
块3:
13
块4:
62
当前阶段:分块排序

Common approach

Multiway merge sort

  • Split into chunks, load each into memory and sort (e.g., merge sort).
  • Merge sorted chunks (“runs”) via multiway merge.
  • Use 2-way, 4-way, or multiway merging to reduce passes.

Scenarios

  • Large DBs, search logs, big-data analytics
  • Any sort where the whole dataset cannot fit in memory

Complexity

  • Sorting phase: O(nlogn)O(n\log n) (per chunk)
  • Merge phase: O(nlogkn)O(n\log_k n), kk = merge ways
  • Overall: O(nlogn)O(n\log n)
  • Space: dominated by disk I/O and buffer sizes

Exercises

Exercise 1

Describe the basic flow and use cases of external sorting.

参考答案

思路:分块排序 + 多路归并,适合大数据。

步骤

  1. 分块排序,生成归并段
  2. 多路归并成大文件
  3. 适用于内存放不下的场景

答案:分块+归并,适合大数据排序。

Exercise 2

Common merge ways and their complexity?

参考答案

思路:二路/多路归并与复杂度。

步骤

  1. 二路、四路、多路归并
  2. 复杂度 O(nlogkn)O(n\log_k n)kk 为路数

答案:常用多路归并,复杂度 O(nlogkn)O(n\log_k n)

搜索