程式扎記

Source From Here
Preface
If you capture packets with tcpdump directly on the server, the capture file may contain bad checksums. This happens when the OS is configured to use the NIC's hardware checksum offloading feature. With offloading enabled, the OS expects the NIC to compute the checksums, so it does not fill in (or reset) the checksum fields itself. tcpdump captures the packets before the NIC has rewritten the checksums, so it records whatever happened to be in those fields.

How-To
Use the following command to turn off checksum offloading before running tcpdump (on Ubuntu).

$ sudo ethtool -K eth0 rx off tx off

If you already have a capture file that is unusable because of wrong checksums, use one of the following commands to repair it.

$ sudo tcpreplay -i eth0 -F -w output.cap input.cap

Or

$ sudo tcprewrite -i input.cap -o output.cap -C

Preface
In this exercise you will import data from a relational database using Sqoop. The data you load here will be used in subsequent exercises.

Lab Experiment
Consider the MySQL database movielens, derived from the MovieLens project from the University of Minnesota. (See note at the end of this exercise.) The database consists of several related tables, but we will import only two of these: movie, which contains about 3,900 movies; and movierating, which has about 1,000,000 ratings of those movies.

Review the Database Tables
First, review the database tables to be loaded into Hadoop:
1. Log on to MySQL:

$ mysql --user=training --password=training movielens
...
mysql> # Now we are in MySQL interactive console

2. Review the structure and contents of the movie table:

3. Note the column names for the table
4. Review the structure and contents of the movierating table:

5. Note these column names
6. Exit mysql

mysql> quit

Import with Sqoop
You invoke Sqoop on the command line to perform several commands. With it you can connect to your database server to list the databases (schemas) to which you have access, and list the tables available for loading. For database access, you provide a connect string to identify the server and, if required, your username and password.

1. Show the commands available in Sqoop

$ sqoop help

2. List the databases (schemas) in your database server:

$ sqoop list-databases \
--connect jdbc:mysql://localhost \
--username training --password training
14/11/30 00:06:54 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/11/30 00:06:54 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
dualcore
hue
metastore
movielens
mysql
test
training

Note: Instead of entering --password training on your command line, you may prefer to enter -P, and let Sqoop prompt you for the password, which is then not visible when you type it.

3. List the tables in the movielens database:

$ sqoop list-tables \
--connect jdbc:mysql://localhost/movielens \
--username training --password training
...
genre
movie
moviegenre
movierating
occupation
user

4. Import the movie table into Hadoop:

$ sqoop import \
--connect jdbc:mysql://localhost/movielens \
--username training --password training \
--fields-terminated-by '\t' --table movie
...
14/11/30 00:23:11 INFO mapreduce.ImportJobBase: Transferred 99.6602 KB in 18.9969 seconds (5.2461 KB/sec)
14/11/30 00:23:11 INFO mapreduce.ImportJobBase: Retrieved 3881 records.

5. Verify that the command has worked:

$ hadoop fs -ls movie
$ hadoop fs -tail movie/part-m-00000

6. Import the movierating table into Hadoop
- Step6.sh

#!/bin/bash
# bash is required: the [[ ... =~ ... ]] test below is not POSIX sh
Table="movierating"
echo "Start importing table $Table..."
sqoop import \
--connect jdbc:mysql://localhost/movielens \
--username training --password training \
--fields-terminated-by '\t' --table "$Table"

echo "Check importing result..."
Result=$(hadoop fs -ls "$Table" 2>&1)
if [[ $Result =~ "No such file or directory" ]]; then
    echo "Importing Table=$Table fail!"
else
    echo "Check import content:"
    hadoop fs -tail "$Table/part-m-00000"
fi

Supplement
* Apache Sqoop Home Page

Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.

* Sqoop 1.4.3 User Guide

Source From Here
Preface
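As discussed in "Iterators and Blocks" (迭代器與程式區塊), every method call actually involves four parts:

* The message receiver
* The . operator
* The message name
* The block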

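Blocks and Proc
If the last parameter of a method starts with &, Ruby creates a Proc object from the block and passes it in. Proc has a call method that runs the code held by the Proc object.

You can create a Proc object with Proc.new. For example: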

>> p = Proc.new { |param| puts param }
=> #<Proc:...>
>> p.call(123)
123
=> nil

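Proc.new creates an object from the given block, and calling call runs the flow defined by that block. Note that a Proc object is a Proc object and a block is a block; the two are fundamentally different. A block is a piece of method syntax that you supply when calling a method. If the method's last parameter is an & parameter, the Ruby interpreter uses the block given at the call to create a Proc object. For example: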

foreach([1, 2, 3]) { |element| puts element }  

To illustrate this as program flow, Ruby creates the Proc object like this:

p = Proc.new { |element| puts element }  

and then calls the method:

foreach([1, 2, 3], &p)  

Note the final &: it passes p to the method's last & parameter. Without the &, p would just be an ordinary method-call argument.

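So for any method that accepts a block, add & whenever you want to pass in a Proc object you created yourself. For example, if you have a routine you want to reuse, you can use a Proc instead of a block: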

>> puts_proc = Proc.new { |element| puts element }
=> #<Proc:...>
>> [1, 2, 3].each(&puts_proc)
1
2
3
=> [1, 2, 3]
>> "abc".each_char(&puts_proc)
a
b
c
=> "abc"

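However, passing a Proc with & and supplying a block in the same call is an error, because the method's last parameter cannot tell whether to use the Proc that was passed in or the Proc created by capturing the block.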

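In fact, & calls the object's to_proc method and tries to assign the result to the & parameter. You can define a to_proc method on any object and then use & to trigger it. For example: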

class Ball  
    attr_reader :radius  
  
    def initialize(radius)  
        @radius = radius  
    end  
  
    def self.to_proc  
        Proc.new { |ball| ball.radius }  
    end  
end  
  
# Collect the radii of the balls
print [Ball.new(10), Ball.new(20), Ball.new(30)].collect(&Ball) # [10, 20, 30]  

Symbol, for example, defines a to_proc method. Given a program like this:

>> ["justin", "monica"].each { |name| name.capitalize! }
=> ["Justin", "Monica"]

you can write this instead:

>> :capitalize.to_proc.call("orz")
=> "Orz"
>> ["justin", "monica"].each(&:capitalize!)
=> ["Justin", "Monica"]
>> :not_exist.to_proc.call("test")
NoMethodError: undefined method `not_exist' for "test":String # String objects have no method named "not_exist"

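Methods that accept a Symbol directly work on a similar principle. The array's reduce method, for example, is even designed so that the & can be omitted for convenience: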

>> [1, 2, 3].reduce { |sum, element| sum += element }
=> 6
>> [1, 2, 3].reduce(&:+)
=> 6
>> [1, 2, 3].reduce(:+)
=> 6

In fact, Symbol's to_proc is designed roughly like this:

class Symbol  
    def to_proc  
        Proc.new { |o| o.send(self) }  
    end  
end  

so the right method is always found and invoked on the receiving object.

A Proc's call method accepts any number of arguments, but how many you actually receive depends on how many block parameters you defined.

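Proc, as its name suggests, is a small procedure, a small piece of flow, so pay attention to what happens when the block used to create the Proc executes return. Suppose a method some prints "some 1", then creates and calls a Proc whose block prints "執行 Proc" and returns, and finally prints "some 2".

Note that "some 2" is never displayed, because that method is equivalent to: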

def some  
    puts "some 1"  
    puts "執行 Proc"  
    return  
    puts "some 2"  
end  

Here the Proc is created within the scope of some. If the Proc was not created within that scope and the block used to create it contains return, a LocalJumpError is raised.

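When designing an API, you do not want a return to interrupt the API's own execution flow, so when Ruby encounters such a return at run time it treats it as an error, even if the purpose of the return is to finish normally and return a value. If you really want to return a value, you can simply omit return, because the last object evaluated in a Ruby flow is used as the return value.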

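Because a Proc behaves like a flow of execution rather than a method, besides return you also need to watch what happens when break, next, or redo is written in the block, as mentioned in "Iterators and Blocks".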

Source From Here
Preface
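Today I'd like to introduce Monad, an intimidating word. Don't worry, though: like you, I don't understand the complicated mathematics behind it, such as category theory and lambda calculus. Even so, I can use it in everyday development and get a lot out of it. This article explains Monad through implementation, and also explains how it actually helps Java developers.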

Example 1: Optional

// Get the name of the city of residence from an Account
String getCityName(Account account) {  
  if (account != null) {  
    if (account.getAddress() != null) {  
      if (account.getAddress().getCity() != null) {  
        return account.getAddress().getCity().getName();  
      }  
    }  
  }  
  return "Unknown";  
}  

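This program is simple: it walks from Account down to the city name, and to avoid a NullPointerException we have to check for null with an if at every level. The code has several potential problems:

* Repetition: not only the if != null checks, but getAddress() also appears three times
* Null checks are easy to forget. It is manageable now, but mistakes become likely as the code grows more complex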

Is there a way to avoid these problems? Let's try extract method first:

// Extract the null check into its own method.
// map() calls an external transform function; if value is not null,
// it converts value into an R.
<T, R> R map(T value, Function<T, R> transform) {
  if (value != null) {  
    return transform.apply(value);  
  }  
  return null;  
}  
  
// Rewritten with map()
String getCityName(Account input) {  
  Address address = map(input, account -> account.getAddress());  
  City city = map(address, a -> a.getCity());  
  String name = map(city, c -> c.getName());  
  if (name != null) {  
    return name;  
  }  
  return "Unknown";  
}  

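Here the null check is extracted into a separate method, map(), which accepts an external transform Function. In this example the transform Function is used to fetch a value: in the rewritten getCityName, each call to map is followed by a lambda that fetches a value.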

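Hmm... the rewrite does remove some duplicated code, but it is not much better, because it still does not solve the problem that a null check is easily missed. If we really want to guarantee that nothing is missed, the omission should be caught at compile time. We cannot change the Java language, but object orientation lets us define our own types: if the type is wrong, the code will not compile!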

Let's keep refactoring. This time we abstract a special container, Optional, to wrap this recurring null check:

class Optional {
    // The container holds a value, which is sometimes null
    private final Object value;  
  
    Optional(Object value) {  
        this.value = value;  
    }  
  
    // map() calls an external transform function; if value is not null,
    // it converts value into an R and wraps the result in a new container.
    Optional map(transfer) {  
        if (value != null) {  
            return new Optional(transfer(value));  
        }  
        return new Optional(null);  
    }  
  
    // A convenience method
    Object orElse(defaultValue) {  
        return value != null ? value : defaultValue;  
    }  
}  

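Above is a simple Optional implementation that holds a single value. It also provides a map() method that safely transforms the inner value into another value. With this container, the refactored program looks like this: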

String getCityName2(Account inputAccount) {  
    Optional optAccount = new Optional(inputAccount);  
    Optional optAddress = optAccount.map({account -> account.address});  
    Optional optCity = optAddress.map({address -> address.city});  
    Optional optName = optCity.map({city -> city.getName()});  
    return optName.orElse("Unknown");  
}  
  
account = new Account(address=new Address(city=new City(name="Taipei")))  
account2 = new Account(address=new Address(city=null))  
printf("City name=%s\n", getCityName2(account))  
printf("City name=%s\n", getCityName2(account2))  

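Now every intermediate object is wrapped in an Optional. Anyone who uses these objects must go through protected methods such as map() or orElse() to reach the value inside, which avoids NullPointerException. Optional.map() is designed to return an Optional, so calls can be chained and we can write it even more concisely: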

String getCityName3(Account inputAccount) {  
    return new Optional(inputAccount)  
        .map({account -> account.address})  
        .map({address -> address.city})  
        .map({city -> city.name})  
        .orElse("Unknown");  
}  

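Optional, this carefully designed container, solves the problems mentioned above:

* Repetition is removed: the repeated if null checks are sealed inside map()
* Type rules prevent a missed null check at compile time
* Because it is abstracted into a separate class, we get the chance to add handy methods such as orElse()

At this point, a container like Optional already looks a lot like the Monad we are discussing today. Let's look at more examples to understand it further.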
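To make the compile-time claim concrete, here is a minimal typed sketch of the same container in plain Java. It is an illustration under assumptions, not the article's code: the class is named Opt to avoid clashing with java.util.Optional, and City, Address, and Account are hypothetical classes invented for the example.

```java
import java.util.function.Function;

// Hypothetical domain classes, invented for this sketch.
final class City { final String name; City(String name) { this.name = name; } }
final class Address { final City city; Address(City city) { this.city = city; } }
final class Account { final Address address; Account(Address address) { this.address = address; } }

// A typed take on the article's container (name Opt is an assumption).
final class Opt<T> {
    private final T value; // may be null

    Opt(T value) { this.value = value; }

    // Applies transform only when a value is present, then rewraps the result.
    <R> Opt<R> map(Function<? super T, ? extends R> transform) {
        if (value == null) {
            return new Opt<R>(null);
        }
        return new Opt<R>(transform.apply(value));
    }

    T orElse(T defaultValue) { return value != null ? value : defaultValue; }
}

final class OptDemo {
    static String cityName(Account input) {
        return new Opt<>(input)
                .map(account -> account.address)
                .map(address -> address.city)
                .map(city -> city.name)
                .orElse("Unknown");
    }

    public static void main(String[] args) {
        System.out.println(cityName(new Account(new Address(new City("Taipei")))));
        System.out.println(cityName(new Account(new Address(null))));
    }
}
```

Because the value field is private, callers can only reach it through map() or orElse(), so the null check cannot be skipped by accident.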

Example 2: Transactional

def transfer(Account account1, Account account2, int m) {
    database.beginTransaction(); // open a database transaction

    try {
        account1.withdraw(m); // withdraw the money
        try {
            account2.deposit(m); // deposit the money
        } catch (Exception e) {
            database.rollback(); // give up and restore the database
            return;
        }
    } catch (InsufficientBalanceException e) {
        System.err.printf("\t[Error] %s\n", e)
        database.rollback(); // give up and restore the database
        return;
    }
    if (!database.isRollback()) {
        database.commit(); // written to the database only if nothing went wrong
    }
}

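The second example is a typical bank transfer: m dollars are withdrawn from account1 and deposited into account2. The program above works with a database: it rolls back when there is not enough money to withdraw, and also when the deposit fails. This program has problems similar to the first example:

* Duplicated code: try catch (exception) {rollback} appears twice, and the code is ugly
* A caught exception must always be followed by a rollback, which is far too easy to forget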

Let's apply the solution we learned from the Optional example:

class Transactional {
    // Begin a database transaction
    static Transactional begin() {  
        database.beginTransaction();  
        return new Transactional(TxState.BEGIN);  
    }  
  
    private final TxState txState;  
    static Database database=new Database()  
  
    Transactional(TxState txState) {  
        this.txState = txState;  
    }  
  
    // Performs different database operations depending on
    // how the supplied transform Function behaves
    Transactional map(transform) {
        // If the transaction is not in the BEGIN state, skip the step
        if (txState != TxState.BEGIN)   
        {  
            return this;  
        }  
        try {  
            // Run the external logic
            TxState result = transform(txState);  
            return new Transactional(result);  
        } catch (TransactionException e) {  
            System.err.printf("\t[Error] %s\n", e)  
            database.rollback(); // if transform fails, abandon the transaction
            return new Transactional(TxState.ROLLBACK);  
        }  
    }  
  
    // If the transaction has begun, commit to the database.
    // Otherwise skip and do nothing.
    Transactional commit() {  
        return map({state ->   
            database.commit();  
            return TxState.COMMIT;  
        });  
    }  
}  

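Here we designed a container, Transactional, that records the current state of the database transaction, TxState. As the program runs, txState keeps changing and the database is operated on along the way. The Transactional.map() method contains the logic that rolls back after an exception is caught. With the help of this new container, the refactored program becomes: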

void transfer2(Account account1, Account account2, int m) {  
    Transactional.begin()  
            .map({txState ->  
                account1.withdraw(m);  
                return txState;  
            })  
            .map({txState ->  
                account2.deposit(m);  
                return txState;  
            })  
            .commit();  
}  

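The refactored program looks quite different: withdraw and deposit are both written inside map() lambdas, and the nested try catch is gone. If withdraw(m) throws an exception, the lambda containing deposit(m) is skipped and commit() does nothing either. Take a moment to run through it in your head and get a feel for the design. The return txState; in the lambdas can be ignored for now.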
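The short-circuit behavior described here can be traced with a small runnable Java sketch. Everything in it is an assumption made for illustration: FakeDatabase is a hypothetical in-memory stand-in that only records calls, Tx is a simplified counterpart of Transactional, and the withdrawal is reduced to a balance check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

enum TxState { BEGIN, COMMIT, ROLLBACK }

// Hypothetical stand-in for a database; it only records the calls it receives.
final class FakeDatabase {
    final List<String> log = new ArrayList<>();
    void beginTransaction() { log.add("begin"); }
    void commit() { log.add("commit"); }
    void rollback() { log.add("rollback"); }
}

final class Tx {
    private final FakeDatabase db;
    private final TxState state;

    private Tx(FakeDatabase db, TxState state) { this.db = db; this.state = state; }

    static Tx begin(FakeDatabase db) {
        db.beginTransaction();
        return new Tx(db, TxState.BEGIN);
    }

    // Runs step only while the transaction is open; a runtime exception
    // rolls back once and makes every later step a no-op.
    Tx map(UnaryOperator<TxState> step) {
        if (state != TxState.BEGIN) {
            return this;
        }
        try {
            return new Tx(db, step.apply(state));
        } catch (RuntimeException e) {
            db.rollback();
            return new Tx(db, TxState.ROLLBACK);
        }
    }

    Tx commit() {
        return map(s -> { db.commit(); return TxState.COMMIT; });
    }
}

final class TxDemo {
    // Returns the database call log for moving amount out of balance.
    static List<String> transfer(int balance, int amount) {
        FakeDatabase db = new FakeDatabase();
        Tx.begin(db)
          .map(s -> {
              if (amount > balance) throw new IllegalStateException("insufficient balance");
              return s;
          })
          .map(s -> { db.log.add("deposit"); return s; })
          .commit();
        return db.log;
    }

    public static void main(String[] args) {
        System.out.println(transfer(100, 30));  // [begin, deposit, commit]
        System.out.println(transfer(100, 300)); // [begin, rollback]
    }
}
```

In the failing case the log contains neither "deposit" nor "commit": the first step threw, so both later steps were skipped.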

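The new Transactional container gives us:

* Repetition removed: the repeated try catch { rollback } is sealed inside map()
* More concise code, much better than the original ugly try catch blocks
* Type rules prevent a missed rollback at compile time
* The TxState held in the container changes in a strictly defined order (this small example already introduces the idea of a state machine)
* A handy commit() method is added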

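The Transactional example is more complex, but correspondingly, applying the container concept gains us even more. The harder the problem, the more it pays off, which is rare in programming. Before digging deeper, let's summarize what the containers in the two examples have in common:

* Each holds a state that changes as map() is applied
* Each has a constructor that directly takes an initial state
* Each has a map(transform) method that runs an externally supplied operation; the method itself encapsulates the computation logic and removes the duplicated code for us
* map() also returns a container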

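Clearly map() is a key part of the design. We now know it can remove repeated structures such as if and try catch, but to remove more complex structures we need the more powerful flatMap.

flatMap: Flattening Transformations
Back to the Optional example. We saw that fetching the city name can be done with chained map() calls, but as the program grows, we will inevitably end up with a value wrapped in two layers of Optional: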

class Account
{
    // Getting the city is so common that we wrote a reusable method
    public Optional city() {  
        return new Optional(address).map({o->o.city});  
    }  
}  
  
String getCityName4(Account inputAccount) {  
    Optional optAccount = new Optional(inputAccount);  
      
    // We want to reuse account.city(), but end up with a double-wrapped Optional
    Optional optOptCity = optAccount.map({account -> account.city()});

    // So we have to map twice in a row to force the Optional open. Yuck...
    Optional optName = optOptCity.map({optCity->optCity.map({o->o.name})})                          
    
    return optName.orElse("Unknown");  
}  

A double-wrapped Optional is crazy. We need something to flatten one of the layers for us, so let's implement flatMap():

class Optional {  
    ...  
    Optional flatMap(transform) {  
        if (value == null) return new Optional(null);  
        return transform(value);  
    }  
    ...  
}  
  
String getCityName5(Account inputAccount) {  
    // The generic type of the Function on the flatMap line is:
    //   Function<Account, Optional<City>>
    return new Optional(inputAccount)  
        .flatMap({account -> account.city()})  
        .map({city -> city.getName()})  
        .orElse("Unknown");  
}  

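Much better! With flatMap(), we can compose freely whether or not the result returned by transform is wrapped in a container. flatMap is quite powerful, and our containers can now handle more complex structures such as for loops. Let's look at the next example.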
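For comparison, the JDK's own java.util.Optional offers the same pair of operations. The sketch below uses only Optional.ofNullable, map, flatMap, and orElse from the JDK; the City, Address, and Account classes are hypothetical and exist only to make the example runnable.

```java
import java.util.Optional;

// Hypothetical domain classes, invented for this sketch.
final class City {
    private final String name;
    City(String name) { this.name = name; }
    String getName() { return name; }
}

final class Address {
    private final City city; // may be null
    Address(City city) { this.city = city; }
    City getCity() { return city; }
}

final class Account {
    private final Address address; // may be null
    Account(Address address) { this.address = address; }

    // Reusable helper that already returns an Optional.
    Optional<City> city() {
        return Optional.ofNullable(address).map(Address::getCity);
    }
}

final class FlatMapDemo {
    static String cityName(Account input) {
        return Optional.ofNullable(input)
                .flatMap(Account::city) // Optional<City>, not Optional<Optional<City>>
                .map(City::getName)
                .orElse("Unknown");
    }

    public static void main(String[] args) {
        System.out.println(cityName(new Account(new Address(new City("Taipei")))));
        System.out.println(cityName(new Account(null)));
    }
}
```

Using map(Account::city) on the first step would produce an Optional<Optional<City>>, which is exactly the double wrapping that flatMap removes.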

Example 3: Stream
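The third example collects all Taiwanese phone numbers from a group of accounts. Because each Account has several phones, a nested for loop is used to collect them: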

List<String> taiwanPhoneNumbers(List<Account> accounts) {
    List<String> numbers = new ArrayList<>();
    for (Account account : accounts) {  
        for (Phone phone : account.getPhones()) {  
            if (phone.getNumber().startsWith("+886")) {  
                numbers.add(phone.getNumber());  
            }  
        }  
    }  
    return numbers;  
}  

Let's design a new container, Stream, to handle this repeated computation structure:

class Stream  
{  
    private List values  
    public Stream(List vals){values = vals}  
  
    Stream flatMap(transform)  
    {  
        def results = []  
        for(def value:values)  
        {  
            Stream transformed = transform(value)  
            for(def result:transformed.values)  
            {  
                results.add(result)  
            }  
        }  
        return new Stream(results)  
    }  
  
    Stream map(transform) {  
        // Note: this is just a combination of flatMap and the constructor
        return flatMap({value ->  
            new Stream(asList((transform(value))))});  
    }  
  
    // filter tests each value; only values that test true remain in the Stream
    Stream filter(predicate) {
        // Again, just a combination of flatMap and the constructor
        return flatMap({value ->  
            if (predicate(value)) {  
                return new Stream(asList(value));  
            } else {  
                return new Stream(Collections.emptyList());  
            }  
        });  
    }  
  
    List toList(){return new ArrayList(values)}  
}  

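This bare-bones Stream container provides four features: a constructor, flatMap(), map(), and filter(). Unlike the earlier examples, this time the main computation logic lives in flatMap; you can see that map() and filter() are merely derived from flatMap and the constructor. Refactoring the original program with the new Stream gives: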

List taiwanPhoneNumbers2(List accounts) {  
    return new Stream(accounts)  
        .flatMap({account -> new Stream(account.getPhones())})  
        .map({phone -> phone.getNumber()})  
        .filter({number -> number.startsWith("+886")})  
        .toList();  
}  
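For readers who want to trace this in statically typed Java, here is a hedged sketch of the same container. The name MiniStream is an assumption (chosen to avoid clashing with java.util.stream.Stream), and the demo simplifies accounts to plain lists of phone-number strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

final class MiniStream<T> {
    private final List<T> values;

    MiniStream(List<T> values) { this.values = values; }

    // The only method with real logic: transform every value into a
    // stream and concatenate the results.
    <R> MiniStream<R> flatMap(Function<T, MiniStream<R>> transform) {
        List<R> results = new ArrayList<>();
        for (T value : values) {
            results.addAll(transform.apply(value).values);
        }
        return new MiniStream<R>(results);
    }

    // map is flatMap plus the constructor: wrap each result in a one-element stream.
    <R> MiniStream<R> map(Function<T, R> transform) {
        return flatMap(value ->
                new MiniStream<R>(Collections.singletonList(transform.apply(value))));
    }

    // filter is flatMap plus the constructor: keep one element or none.
    MiniStream<T> filter(Predicate<T> predicate) {
        return flatMap(value -> {
            List<T> kept = predicate.test(value)
                    ? Collections.singletonList(value)
                    : Collections.<T>emptyList();
            return new MiniStream<T>(kept);
        });
    }

    List<T> toList() { return new ArrayList<>(values); }
}

final class MiniStreamDemo {
    // Each inner list holds the raw phone numbers of one account.
    static List<String> taiwanNumbers(List<List<String>> phonesPerAccount) {
        return new MiniStream<>(phonesPerAccount)
                .flatMap(phones -> new MiniStream<String>(phones))
                .map(String::trim)
                .filter(number -> number.startsWith("+886"))
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(taiwanNumbers(Arrays.asList(
                Arrays.asList(" +886-2-1234", "+1-555-0100"),
                Arrays.asList("+886-9-5678 "))));
    }
}
```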

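After refactoring, the structure of the program has changed a lot. If the use of flatMap here seems puzzling, try tracing the execution of the program above. The simplified Stream implementation is used here on purpose, to help you understand how flatMap works. After refactoring with Stream, we observe:

* The program becomes declarative: we only declare that we want the numbers and want to keep the Taiwanese ones, which makes the intent stand out
* The nested for loops and the if, imperative constructs that get in the way of reading, are gone
* The Stream container lets us add higher-level behavior such as filter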

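Declarative code is not only easier to read, it also opens up room for optimization (according to the original article, the real Stream in Java 8 performs on par with a for loop; it refers to a paper for details).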
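The same pipeline can be written with the real java.util.stream API. In this sketch the Stream calls (stream, flatMap, map, filter, collect) are the JDK's; the Phone and Account classes are hypothetical and only as detailed as the example needs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical domain classes, invented for this sketch.
final class Phone {
    private final String number;
    Phone(String number) { this.number = number; }
    String getNumber() { return number; }
}

final class Account {
    private final List<Phone> phones;
    Account(List<Phone> phones) { this.phones = phones; }
    List<Phone> getPhones() { return phones; }
}

final class StreamDemo {
    static List<String> taiwanPhoneNumbers(List<Account> accounts) {
        return accounts.stream()
                .flatMap(account -> account.getPhones().stream()) // one stream of all phones
                .map(Phone::getNumber)
                .filter(number -> number.startsWith("+886"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Account> accounts = Arrays.asList(
                new Account(Arrays.asList(new Phone("+886-2-1234"), new Phone("+1-555-0100"))),
                new Account(Arrays.asList(new Phone("+886-9-5678"))));
        System.out.println(taiwanPhoneNumbers(accounts));
    }
}
```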

Monad Design Pattern
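We have gone through three examples, each solving a different computation problem, but every solution designs a container that holds a state and adds a flatMap() method that accepts a transform function. A container with these characteristics is what we call a Monad: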

class Monad<T> {
  // The constructor supplies the initial state; it could equally be
  // a factory method, as long as the meaning is the same
  Monad(T state) {...}

  // flatMap() changes the state and gives Monad
  // its composability
  <R> Monad<R> flatMap(Function<T, Monad<R>> transform) {
     // Encapsulates the recurring computation
     // ...
  }

  // map() is just the constructor combined with flatMap(), a shortcut for flatMap
  <R> Monad<R> map(Function<T, R> transform) {
     return flatMap(state ->
          new Monad<>(transform.apply(state)));
  }
}

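If I had handed you this Monad definition straight away, not understanding it would have been normal; after working through the examples above, I believe it makes more sense now. Monad certainly has its mathematical meaning, but for us Java developers it acts more as a design pattern, a technique we can borrow at any time during development.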

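When to Use Monad
When is Monad a suitable solution? You probably have some idea from the examples above; let's restate it as more general rules:

* You notice a computation that recurs throughout the program
* The computation often produces nested structures
* The computation is easy to get wrong, and you would like to catch mistakes at compile time
* The state of some value changes during the computation

If several of these apply, Monad is worth a try.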

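Advantages of Monad
Again, the examples have already shown the benefits Monad brings. To summarize:

* Removes repetitive, redundant code
* Lifts the computation structure to the type level, which brings many benefits
--- Mistakes are caught at compile time
--- Computation scattered throughout the code is gathered in one place and made prominent
--- The underlying implementation is encapsulated
* Side effects are delegated to the Monad, leaving only the important logic in the main program
* A Monad's flatMap is composable and can be chained, so the code reads well
* Allows adding higher-level methods that are meaningful in the domain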

That is a lot of advantages! Reorganizing the examples makes this clearer:

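Disadvantages of Monad
Monad brings many benefits, but where there is light there is shadow. Its biggest drawback is that it is invasive: once you adopt it, your API is forced to change: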

// The original API was clean:
City getCity() {...}

// With Monad protection the API has to change, which is sometimes not what you want
Optional<City> city() {...}

// And if you ever need to mix different Monads, you are in trouble.
// Just look at this API:
Transactional<Optional<City>> tryCreateCity(String cityName) {...}

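Of course, if the language itself supports Monad, this drawback may be avoidable. But in OOP languages such as Java, C#, and JavaScript the drawback remains, so when applying the Monad pattern, steer clear of it wherever you can.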

Example 4: Promise
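I hope you now have a good feel for Monad and will no longer be lost when someone mentions it. To close the article, let's take on one last example with a very complex computation structure: the notorious callback hell. crawlPage() below is a page-crawling program: it fetches an HTML page, saves it to a file, and then sends an email notification once the file is saved: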

// Fetch a page, save it, then send an email notification, in chained-callback style
void crawlPage(String url) {  
  httpClient.getHtml(url, (String html) -> {  
    String fileName = "Page1.html";  
  
    fileUtil.writeFile(fileName, html, (Boolean success) -> {  
      String email = "my@gmail.com";  
  
      mailClient.sendEmail(email, (Boolean sent) -> {
        System.out.println("result: " + sent);  
      });  
    });  
  });  
}  
  
// Three asynchronous tools; each method takes a callback:
class HttpClient {
  void getHtml(String url, Consumer<String> callback) {
    //download html text... then invoke callback  
  }  
}  
class FileUtil {  
  void writeFile(String fileName,   
                 String data,   
                 Consumer<Boolean> callback) {
    //writing data to file... then invoke callback  
  }  
}  
class MailClient {  
  void sendEmail(String email, Consumer<Boolean> callback) {
    //sending... then invoke callback  
  }  
}  

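What is special about this example is that getHtml(), writeFile(), and sendEmail() are all asynchronous: each invokes the callback passed to it only after its work is done. Nested callbacks like this are common in JavaScript and on Android. Three levels are bearable, but five or more will drive you mad. In the earlier examples Monad took care of if, for, and try catch; can it also untangle repeated callbacks?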

Promise Monad

class Promise<T> {
  
  private T value;  
  private Function pendingTransform;  
  private Promise chainPromise;  
  
  Promise(T value) {  
    this.value = value;  
  }  
  
  public <R> Promise<R> map(Function<T, R> transform) {
    return flatMap(value -> new Promise<>(transform.apply(value)));  
  }  
  
  public <R> Promise<R> flatMap(Function<T, Promise<R>> transform) {
    if (value != null) {  
      return transform.apply(value);  
    }  
    pendingTransform = transform;  
  
    Promise chainPromiseR = new Promise<>(null);  
    this.chainPromise = chainPromiseR;  
    return chainPromiseR;  
  }  
  
  public void complete(T value) {  
    if (pendingTransform == null) {  
       this.value = value;  
       return;  
    }  
    Promise promiseR =   
        (Promise) pendingTransform.apply(value);  
    promiseR.flatMap(nextValue -> {  
      chainPromise.complete(nextValue);  
      return null; //end of promise chain  
    });  
  }  
}  

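A Promise represents a commitment about the future. In use, you first get a Promise whose content may not exist yet, and you trust that it will be delivered later. So you call flatMap() or map() on the Promise first to attach transform Functions in advance; only when the value arrives are the transform Functions actually called.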

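To meet these requirements, the Promise implementation is a bit more complex than the earlier Monads. In flatMap, if the value does not exist yet, we store the transform Function in the pendingTransform field and return an intermediate chainPromise right away. Later, when the value arrives, complete() is called; only then is the transform Function actually invoked, and its result is handed over to chainPromise.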

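I have given a brief explanation, but the way chainPromise weaves through the middle is not easy to follow, so I recommend tracing the whole flow carefully in your head. Let's see what refactoring with the Promise Monad looks like: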

class HttpClient {  
  void getHtml(String url, Consumer callback) {  
    //download html text... then invoke callback  
  }  
  // A method with the same function, but returning a Promise
  Promise getHtml(String url) {  
    Promise promise = new Promise<>(null);  
    // Hand the result delivered to the callback over to promise.complete
    getHtml(url, html -> promise.complete(html));  
    return promise;  
  }  
}  
  
class FileUtil {  
  void writeFile(String name,   
                 String data,   
                 Consumer callback) {  
    //writing data to file... then invoke callback  
  }  
  Promise writeFile(String name, String data) {  
    Promise promise = new Promise<>(null);  
    // Same as above; a method reference is more concise
    writeFile(name, data, promise::complete);  
    return promise;  
  }  
}  
  
class MailClient {  
  void sendEmail(String email, Consumer callback) {  
    //sending... then invoke callback  
  }  
  Promise sendEmail(String email) {  
    Promise promise = new Promise<>(null);  
    sendEmail(email, promise::complete);  
    return promise;  
  }  
}  

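We added a Promise-returning version to each of the three tool methods. The changes are nearly identical: create an empty Promise, call the method that does the real work, and when that method's callback fires, call the promise's complete() directly to hand the callback's value to the promise.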

Once the tools have all been Promise-ified, the crawler can chain the returned Monads directly, flattening the whole thing:

void crawlPage(String url) {  
  httpClient.getHtml(url)  
    .flatMap(html -> fileUtil.writeFile("Page1.html", html))  
    .flatMap(success -> mailClient.sendEmail("my@gmail.com"))  
    .map(emailSent -> {  
      System.out.println("result: " + emailSent);  
      return null;  
    });  
}  

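The refactored program behaves the same as the original. The transform Functions passed to flatMap and map do just what the earlier callbacks did, and they are likewise called only afterwards. It is simply that Promise's clever design lets you avoid nested callbacks. In this example we also see Monad's intrusion into the API: all three tools had to be Promise-ified before the crawler could benefit! Below is the complete Promise example rewritten in Groovy: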

class Promise  
{  
    def value = null  
    def pendingCallback = null  
    Promise chainPromise = null  
      
    Promise(v){value=v}  
      
    Promise map(callback)  
    {  
        return flatMap({value -> new Promise(callback(value))})  
    }  
      
    /** 
     * Register callback and return next Promise whose complete will be called after callback. 
     * @param callback 
     * @return 
     */  
    Promise flatMap(callback)  
    {  
        if (value != null)   
        {  
            return callback(value)  
        }  
        pendingCallback = callback  
  
        Promise chainPromiseR = new Promise(null)  
        this.chainPromise = chainPromiseR  
        return chainPromiseR  
    }  
      
    public void complete(value)  
    {  
        Promise promiseR = (Promise) pendingCallback(value)  
        if(promiseR!=null)  
        {  
            promiseR.flatMap({nextValue ->  
                chainPromise.complete(nextValue);  
                return null; //end of promise chain  
            })  
        }  
    }  
}  
  
class HttpClient {  
    void getHtml(String url, callback) {  
        //download html text... then invoke callback  
        Thread.start {  
            printf("Download html text... then invoke callback\n")  
            sleep(2000)  
            callback("html body")  
        }  
    }  
      
    // A method with the same function, but returning a Promise
    Promise getHtml(String url) {  
      Promise promise = new Promise(null);  
      // Hand the result delivered to the callback over to promise.complete
      getHtml(url, {html -> promise.complete(html)});  
      return promise  
    }  
}  
  
class FileUtil {  
    void writeFile(String name,  
                   String data,  
                   callback) {  
      Thread.start{  
          printf("Write file=${name} with data=${data}...\n")  
          sleep(1000)  
          callback("write file done!")  
      }  
    }  
                     
    Promise writeFile(String name, String data) {  
      Promise promise = new Promise(null);  
      // Same as above: the closure hands the result to promise.complete
      writeFile(name, data, {result->promise.complete(result)});  
      return promise;  
    }  
}  
  
class MailClient {  
    void sendEmail(String email, callback) {  
        //sending... then invoke callback  
        Thread.start{  
            printf("Sending to ${email}... then invoke callback\n")  
            sleep(1000)  
            callback("Done!")  
        }  
    }  
      
    Promise sendEmail(String email) {  
        Promise promise = new Promise(null);  
        sendEmail(email, {result->promise.complete(result)});  
        return promise;  
    }  
}  
  
httpClient = new HttpClient()  
fileUtil = new FileUtil()  
mailClient = new MailClient()  
  
void crawlPage(String url) {  
    httpClient.getHtml(url) // Will return promise object   
      .flatMap({html -> fileUtil.writeFile("Page1.html", html)})  
      .flatMap({success -> mailClient.sendEmail("my@gmail.com")})  
      .map({emailSent ->           
          System.out.println("Result: " + emailSent);  
        return null;  
      });  
}  
  
crawlPage("www.google.com.tw")  

The output is:

Download html text... then invoke callback
Write file=Page1.html with data=html body...
Sending to my@gmail.com... then invoke callback
Result: Done!

Conclusion
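Promise, the last example, is complex, but working through a real implementation helps you better understand what Monad can do. If Promise can conquer even callback hell, I think there is hardly any structure that can defeat Monad. I hope that after reading this you can start using the word Monad when talking with other developers, start recognizing when a piece of code applies a Monad, and finally go a step further and solve problems with Monad yourself. Monad is nothing mystical; it simply hides repeated computation structures and strengthens the program through prominent types.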

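The implementations in this article work, but none of them meets production standards; they are for teaching only, so do not naively copy them into your code. JDK 8 already ships Optional, Stream, and CompletableFuture, so there is no need to reinvent the wheel. Reference: Monadic Java by Mario Fusco, from which all the concepts in this article were learned.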
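As a pointer to the ready-made JDK option, the crawler can be sketched with java.util.concurrent.CompletableFuture, where thenCompose plays the role of flatMap and thenApply the role of map. The three tool methods below are hypothetical stand-ins that complete on a background thread with canned values; they are assumptions for illustration and do no real network, file, or mail work.

```java
import java.util.concurrent.CompletableFuture;

final class CrawlDemo {
    // Hypothetical stand-in for an asynchronous HTTP download.
    static CompletableFuture<String> getHtml(String url) {
        return CompletableFuture.supplyAsync(() -> "<html>" + url + "</html>");
    }

    // Hypothetical stand-in for an asynchronous file write.
    static CompletableFuture<Boolean> writeFile(String name, String data) {
        return CompletableFuture.supplyAsync(() -> !data.isEmpty());
    }

    // Hypothetical stand-in for an asynchronous mail client.
    static CompletableFuture<Boolean> sendEmail(String email) {
        return CompletableFuture.supplyAsync(() -> email.contains("@"));
    }

    static CompletableFuture<String> crawlPage(String url) {
        return getHtml(url)
                .thenCompose(html -> writeFile("Page1.html", html)) // like flatMap
                .thenCompose(success -> sendEmail("my@gmail.com"))  // like flatMap
                .thenApply(sent -> "result: " + sent);              // like map
    }

    public static void main(String[] args) {
        // join() blocks until the whole chain has completed.
        System.out.println(crawlPage("www.google.com.tw").join());
    }
}
```

Unlike the teaching Promise above, CompletableFuture also carries failures through the chain, so a real crawler would typically add an exceptionally or handle step.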


Sunday, November 30, 2014

[Linux article collection] Tcpdump - how to fix the bad checksum problem

Saturday, November 29, 2014

[CCDH] Exercise17 - Importing Data With Sqoop (P60)

[ Ruby Gossip ] Advance : Callable objects - Blocks and Proc

Friday, November 28, 2014

[ Java article collection ] Monad Design Pattern in Java

[Git FAQ] error: The following untracked working tree files would be overwritten by merge
