MapReduce

keywords: Functional programming, programming model

Map: a user-specified function that generates intermediate key/value pairs from the original input key/value pairs. Reduce: a user-specified function that merges all intermediate values associated with the same intermediate key. This functional style lets users exploit the resources of a large distributed system naturally. (The key/value data structure exposes parallelism from the data-splitting perspective.)
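To make this concrete, here is the paper's canonical word-count example as a minimal Python sketch. The names `map_fn` and `reduce_fn` are illustrative, not the library's actual API.

```python
def map_fn(key, value):
    """Map: (document name, document contents) -> [(word, 1), ...]."""
    return [(word, 1) for word in value.split()]

def reduce_fn(key, values):
    """Reduce: (word, list of counts) -> total count for that word."""
    return sum(values)
```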

1. Execution overview

Map invocations are distributed across multiple machines by partitioning the input data into a set of M splits; the splits can be processed in parallel by different machines. Reduce invocations are distributed by partitioning the intermediate key space into R pieces using a partitioning function, e.g. hash(key) mod R. (Recall consistent hashing.)
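A sketch of such a partitioning function; the paper's default is hash(key) mod R, which this emulates with a hash that is stable across workers:

```python
import hashlib

def partition(key: str, R: int) -> int:
    """Route an intermediate key to one of R reduce tasks (hash(key) mod R)."""
    # md5 is stable across machines and Python runs, so every map worker
    # sends the same key to the same reduce partition.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % R
```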

Step 1: The MapReduce library in the user program first splits the input files into M pieces (typically 16-64 MB per piece, controllable by the user). It then starts up many copies of the program on a cluster of machines.
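A rough sketch of the splitting step, assuming a single input file and plain byte-range splits (a real implementation respects record boundaries):

```python
import os

def split_input(path: str, M: int):
    """Divide one input file into M (path, start, end) byte ranges."""
    size = os.path.getsize(path)
    chunk = (size + M - 1) // M  # ceiling division
    return [(path, i * chunk, min((i + 1) * chunk, size)) for i in range(M)]
```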

Step 2: One of the copies is special: the master, which assigns work to the workers. Recall that there are M map tasks and R reduce tasks. Each task is in one of three states: idle, in-progress, or completed. The master picks idle workers and assigns each one a map task or a reduce task.
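One way to picture the master's bookkeeping; the class and function names here are assumptions for illustration:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Task:
    kind: str                     # "map" or "reduce"
    state: str = "idle"           # "idle" | "in-progress" | "completed"
    worker: Optional[str] = None  # identity of the assigned worker machine

def assign_task(tasks: list, worker_id: str) -> Optional[Task]:
    """Hand the first idle task to an idle worker."""
    for task in tasks:
        if task.state == "idle":
            task.state, task.worker = "in-progress", worker_id
            return task
    return None
```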

Step 3: A worker assigned a map task reads the contents of the corresponding input split. It parses key/value pairs out of the input data and passes each pair to the user-defined Map function. The intermediate key/value pairs produced by the Map function are buffered in memory. Periodically, the buffered pairs are written to local disk, partitioned into R regions by the partitioning function. (Recall the differences between Spark and Hadoop.) The locations of these R regions on local disk are passed back to the master, which is responsible for forwarding them to the reduce workers.
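Putting Steps 1-3 together, a toy map worker might look like the following; it reuses `map_fn` and `partition` from the sketches above, and the intermediate file naming is made up:

```python
import json
from collections import defaultdict

def run_map_task(records, map_fn, R: int, task_id: int):
    """Apply map_fn to each record, buffer pairs, spill into R region files."""
    buffers = defaultdict(list)                  # region index -> buffered pairs
    for key, value in records:
        for ikey, ivalue in map_fn(key, value):
            buffers[partition(ikey, R)].append((ikey, ivalue))
    locations = []
    for r, pairs in sorted(buffers.items()):
        path = f"mr-{task_id}-{r}"               # illustrative local file name
        with open(path, "w") as f:
            json.dump(pairs, f)
        locations.append((r, path))
    return locations                             # reported back to the master
```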

Step 4: When a reduce worker is notified by the master about these locations, it uses remote procedure calls to read the buffered data from the local disks of the map workers. (Scheduling this feels like it could get messy.) When a reduce worker has read all intermediate data, it sorts the data by the intermediate keys so that all occurrences of the same key are grouped together. (Virtual memory, or an external sort, may be needed here; a combined sketch of Steps 4-5 follows Step 5 below.)

Step 5: The reduce worker iterates over the sorted intermediate data and, for each unique intermediate key encountered, passes the key and the corresponding set of intermediate values to the user's Reduce function. (Implicitly, each reduce worker's job is to process a given subset of the intermediate keys.)
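A combined sketch of Steps 4-5 under the same assumptions as above (plain file reads stand in for the remote procedure calls, and an in-memory sort stands in for the external sort used when data does not fit in memory):

```python
import json
from itertools import groupby
from operator import itemgetter

def run_reduce_task(intermediate_paths, reduce_fn, out_path: str):
    """Fetch, sort, group by intermediate key, apply reduce_fn, write output."""
    pairs = []
    for path in intermediate_paths:              # one region file per map task
        with open(path) as f:
            pairs.extend(json.load(f))
    pairs.sort(key=itemgetter(0))                # group occurrences of each key
    with open(out_path, "w") as out:
        for key, group in groupby(pairs, key=itemgetter(0)):
            out.write(f"{key}\t{reduce_fn(key, [v for _, v in group])}\n")
```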

Step 6: When all map tasks and reduce tasks have been completed, the master wakes up the user program.

Remark: The master's data structures are worth noting. For each task it stores the state (idle, in-progress, or completed) and the identity of the worker machine. The master also stores the locations of the R partition regions produced by each completed map task. (Exercise: why is there no need to store the locations of the M data splits? Convincing yourself is not hard.) This location information is pushed incrementally to workers that have in-progress reduce tasks. (Keep an eye on the granularity!)
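So the location table is O(M × R) entries, one per (map task, region) pair; a hypothetical dictionary sketch:

```python
# (map_task_id, region) -> (worker host, local path). Filled in as map
# tasks complete, and pushed incrementally to in-progress reduce workers.
locations = {}

def on_map_completed(map_task_id: int, worker_host: str, region_paths):
    for r, path in region_paths:
        locations[(map_task_id, r)] = (worker_host, path)
```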

2. Fault tolerance

The library must tolerate machine failures gracefully.

2.1 Identifying worker failures

The master pings every worker periodically. If no response is received from a worker in a certain amount of time, the master marks the worker as failed. Any map tasks completed by the worker are reset back to their initial idle state and therefore become eligible for scheduling on other workers. Similarly, any map task or reduce task in progress on a failed worker is reset to idle and becomes eligible for rescheduling. Note that completed map tasks are re-executed on failure because their output is stored on the local disks of the failed machine and is therefore inaccessible, while completed reduce tasks do not need to be re-executed since their output is stored in a global file system. (This again highlights the asymmetry between the map and reduce steps.)
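In terms of the `Task` sketch above, the reset logic reads:

```python
def on_worker_failed(tasks, failed_worker: str):
    """Reset tasks owned by a dead worker, per the rules above."""
    for task in tasks:
        if task.worker != failed_worker:
            continue
        if task.state == "in-progress":
            task.state, task.worker = "idle", None   # reschedule either kind
        elif task.state == "completed" and task.kind == "map":
            task.state, task.worker = "idle", None   # local output is lost
        # Completed reduce tasks keep their state: output lives in the
        # global file system, not on the failed machine's local disk.
```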

2.2 Master failure

Write periodic checkpoints!
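A minimal sketch of what checkpointing could look like here, assuming the master's state is just the task table and location table from the sketches above (the paper only notes that a new copy can be started from the last checkpointed state):

```python
import pickle

def checkpoint(tasks, locations, path: str = "master.ckpt"):
    """Serialize the master's task table and location table to disk."""
    with open(path, "wb") as f:
        pickle.dump((tasks, locations), f)

def recover(path: str = "master.ckpt"):
    """Start a new master copy from the last checkpointed state."""
    with open(path, "rb") as f:
        return pickle.load(f)
```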
