
Notes on [MapReduce: simplified data …] by Dean & Ghemawat

1. So! What is MapReduce?

MapReduce is a two-step mechanism for processing large-scale data that is distributed across many machines. In the ‘map’ step, a programmer-defined function visits each input record and emits intermediate key/value pairs. In the ‘reduce’ step, a second programmer-defined function collects all the intermediate values that share the same key and processes them to produce the final result.
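To make the two steps concrete, here is a minimal single-machine sketch in Python. It is not Google's implementation: the names `run_mapreduce`, `parity_map`, and `sum_reduce` are made up for illustration, and the real system spreads the map and reduce calls over many machines instead of running them in one loop.

```python
from collections import defaultdict

def run_mapreduce(records, map_fn, reduce_fn):
    """Toy, in-memory stand-in for the map -> group -> reduce flow."""
    # Map step: each record may emit any number of (key, value) pairs.
    intermediate = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            # Group the emitted values by key as they arrive.
            intermediate[key].append(value)
    # Reduce step: one call per key, over all values emitted for that key.
    return {key: reduce_fn(key, values) for key, values in intermediate.items()}

# Hypothetical example: sum the odd and the even numbers separately.
def parity_map(n):
    yield ("even" if n % 2 == 0 else "odd", n)

def sum_reduce(key, values):
    return sum(values)

result = run_mapreduce([1, 2, 3, 4, 5], parity_map, sum_reduce)
print(result)  # {'odd': 9, 'even': 6}
```

The programmer only writes the two small functions; the grouping in the middle is the framework's job.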

2. So! Why do we need MapReduce?

Because the data Google handles is very large and spread across many machines. The conventional approach of loading all the necessary data into one machine's memory before processing starts simply does not work at that scale.

3. So! Give me an example of how MapReduce works.

Say you are counting how many times a word occurs in millions of web pages. The ‘map’ step goes through these pages and fires a signal (for example, the pair (word, 1)) every time it finds the word. The ‘reduce’ step then collects the list of signals for that word and adds them up into a single count.
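The word-count example could look roughly like this in Python. This is only a sketch on three tiny in-memory "pages": `map_page`, `reduce_counts`, and the sample strings are assumptions for illustration, and matching is done on lowercased, whitespace-separated tokens, which is much cruder than real text processing.

```python
def map_page(page_text, target):
    """Map: fire a (word, 1) signal each time the target word appears."""
    for word in page_text.lower().split():
        if word == target:
            yield (target, 1)

def reduce_counts(word, signals):
    """Reduce: turn the collected signals for a word into one number."""
    return sum(signals)

# Made-up stand-ins for web pages.
pages = ["the cat sat", "The dog and the cat", "no match here"]

# In a real run, each page could be mapped on a different machine.
signals = [value for page in pages for _, value in map_page(page, "the")]
total = reduce_counts("the", signals)
print(total)  # 3
```

Note that the map calls never need to see each other's pages, which is what lets the work be split across machines.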


About Xiang 'Anthony' Chen

Making an Impact in Your Life

