External Sorting
Idea
External sorting handles data too large for memory, using external storage (disk). Split the big file into memory-fit chunks, sort each, then multiway-merge.
外部排序演示
块1:
84
块2:
57
块3:
13
块4:
62
当前阶段:分块排序
Common approach
Multiway merge sort
- Split into chunks, load each into memory and sort (e.g., merge sort).
- Merge sorted chunks (“runs”) via multiway merge.
- Use 2-way, 4-way, or multiway merging to reduce passes.
Scenarios
- Large DBs, search logs, big-data analytics
- Any sort where the whole dataset cannot fit in memory
Complexity
- Sorting phase: (per chunk)
- Merge phase: , = merge ways
- Overall:
- Space: dominated by disk I/O and buffer sizes
Exercises
Exercise 1
Describe the basic flow and use cases of external sorting.
参考答案
思路:分块排序 + 多路归并,适合大数据。
步骤:
- 分块排序,生成归并段
- 多路归并成大文件
- 适用于内存放不下的场景
答案:分块+归并,适合大数据排序。
Exercise 2
Common merge ways and their complexity?
参考答案
思路:二路/多路归并与复杂度。
步骤:
- 二路、四路、多路归并
- 复杂度 , 为路数
答案:常用多路归并,复杂度 。