程式扎記: [ In Action ] The collective Groovy datatypes

Preface:
For the time being, assume that you can work on a large string. You have numerous ways of splitting this string into words. But how do you count and store the word frequencies? You cannot have a distinct variable for each possible word you encounter. Finding a way of storing frequencies in a list is possible but inconvenient, more suitable for a brain teaser than for good code. Maps come to the rescue.

Some pseudocode to solve the problem could look like this:

for each word {  
    if (frequency of word is not known)  
        frequency[word] = 0  
    frequency[word] += 1  
}  

This looks like the list syntax, but with strings as indexes rather than integers. In fact, Groovy maps appear like lists, allowing any arbitrary object to be used for indexing. In order to describe the map datatype, we show how maps can be specified, what operations and methods are available for maps, some surprisingly convenient features of maps, and, of course, a map-based solution for the word-frequency exercise.

Specifying maps:
The specification of maps is analogous to the list specification that you saw in the previous section. Just like lists, maps make use of the subscript operator to retrieve and assign values. The difference is that maps can use any arbitrary type as an argument to the subscript operator, while lists are bound to integer indexes. Whereas lists are aware of the sequence of their entries, maps are generally not. Specialized maps like java.util.TreeMap may have a sequence to their keys, though.

Simple maps are specified with square brackets around a sequence of items, delimited with commas. The key feature of maps is that the items are key-value pairs that are delimited by colons:

def map = [key1:value1, key2:value2, key3:value3]  

In principle, any arbitrary type can be used for keys or values. When using exotic types for keys, you need to obey the rules as outlined in the Javadoc for java.util.Map.
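
One rule from that Javadoc that is easy to trip over is the warning about mutable keys. The following is a minimal sketch rather than part of the original text; the names `key` and `lookup` are made up for illustration, and it assumes the default hash-based map.

```groovy
// A list is a legal map key because it defines equals() and hashCode()
def key = [1, 2]
def lookup = [(key): 'pair']
assert lookup.get([1, 2]) == 'pair'

// Mutating the key after insertion changes its hash code, so the entry
// can no longer be found under either the old or the new content
key << 3
assert lookup.get([1, 2]) == null
assert lookup.get([1, 2, 3]) == null
assert lookup.size() == 1   // the entry is still stored, just unreachable
```

Immutable keys such as strings and numbers avoid this problem entirely.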

The character sequence [:] declares an empty map. Maps are by default of type java.util.HashMap (current Groovy versions create a java.util.LinkedHashMap, which is a HashMap subclass, so the instanceof check in the listing still holds) and can also be declared explicitly by calling the respective constructor. The resulting map can still be used with the subscript operator. In fact, this works with any type of map, as you see in listing 4.11 with type java.util.TreeMap.
- Listing 4.11 Specifying maps

def myMap = [a:1, b:2, c:3]  
assert myMap instanceof HashMap  
assert myMap.size() == 3  
assert myMap['a']   == 1  
def emptyMap = [:]  
assert emptyMap.size() == 0  
def explicitMap = new TreeMap()  
explicitMap.putAll(myMap)  
assert explicitMap['a'] == 1  
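
To make the remark about key sequence concrete, here is a small sketch comparing a map literal with a TreeMap built from it. The variable names are illustrative, and the LinkedHashMap assertion is an assumption about current Groovy versions rather than something stated in the text above.

```groovy
def literal = [c:3, a:1, b:2]

// Assumption: current Groovy versions back a map literal with a
// LinkedHashMap, which remembers insertion order
assert literal instanceof LinkedHashMap
assert literal.keySet().toList() == ['c', 'a', 'b']

// A TreeMap keeps its keys in their natural (here alphabetical) order
def sorted = new TreeMap(literal)
assert sorted.keySet().toList() == ['a', 'b', 'c']
assert sorted['b'] == 2
assert sorted.firstKey() == 'a'
```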

For the common case of having keys of type String, you can leave out the string markers (single or double quotes) in a map declaration:

assert ['a':1] == [a:1]  

Such a convenience declaration is allowed only if the key contains no special characters (it needs to follow the rules for valid identifiers) and is not a Groovy keyword.

This notation can also get in the way when, for example, the content of a local variable is used as a key. Suppose you have local variable x with content 'a'. Because [x:1] is equal to ['x':1], how can you make it equal to ['a':1]? The trick is that you can force Groovy to recognize a symbol as an expression by putting it inside parentheses:

def x = 'a'  
assert ['x':1] == [x:1]  
assert ['a':1] == [(x):1]  

It’s rare to require this functionality, but when you need keys that are derived from local symbols (local variables, fields, properties), forgetting the parentheses is a likely source of errors.
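
A related pitfall, not covered in the original text, is trying to get the variable's content into the key with string interpolation instead of parentheses. The sketch below assumes a dynamically typed Groovy script; the variable names are illustrative.

```groovy
def x = 'a'

def viaParens  = [(x): 1]       // key is the String 'a'
def viaGString = ["${x}": 1]    // key is a GString, not a String

assert viaParens.keySet().every  { it instanceof String }
assert viaGString.keySet().every { it instanceof GString }

// A GString key is not found when looking up with a plain String
assert viaParens['a']  == 1
assert viaGString['a'] == null

// Converting explicitly inside parentheses gives a String key again
def converted = [("${x}".toString()): 1]
assert converted['a'] == 1
```

The parenthesized form is therefore the safer way to derive keys from variables.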

Using map operators:
The simplest operations with maps are storing objects in the map with a key and retrieving them back using that key. Listing 4.12 demonstrates how to do that. One option for retrieving is using the subscript operator. As you have probably guessed, this is implemented with map’s getAt method. A second option is to use the key like a property with a simple dot-syntax. You will learn more about properties in chapter 7. A third option is the get method, which additionally allows you to pass a default value to be returned if the key is not yet in the map. If no default is given, null will be used as the default. If on a get(key,default) call the key is not found and the default is returned, the key:default pair is added to the map.
- Listing 4.12 Accessing maps (GDK map methods)

def myMap = [a:1, b:2, c:3]  
// Retrieve existing elements  
assert myMap['a']       == 1     
assert myMap.a          == 1     
assert myMap.get('a')   == 1     
assert myMap.get('a',0) == 1  
  
// Attempt to retrieve missing elements     
assert myMap['d']       == null     
assert myMap.d          == null     
assert myMap.get('d')   == null  
// Supply a default value     
assert myMap.get('d',0) == 0     
assert myMap.d          == 0  
  
// Simple assignments in the map     
myMap['d'] = 1          
assert myMap.d == 1     
myMap.d = 2             
assert myMap.d == 2  
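
The side effect of get(key,default) is easy to overlook, so here is a short sketch that contrasts it with the JDK's own Map.getOrDefault. It assumes Java 8 or later for getOrDefault; the map name `counts` is illustrative.

```groovy
def counts = [a:1]

// JDK Map.getOrDefault (Java 8+) returns the fallback without storing it
assert counts.getOrDefault('x', 0) == 0
assert !counts.containsKey('x')

// GDK get(key, default) returns the fallback and also stores it
assert counts.get('x', 0) == 0
assert counts.containsKey('x')
assert counts == [a:1, x:0]
```

Which one fits depends on whether the map should grow as a result of lookups.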

Assignments to maps can be done using the subscript operator or via the dot-key syntax. If the key in the dot-key syntax contains special characters, it can be put into string markers, like so:

myMap = ['a.b':1]  
assert myMap.'a.b' == 1  

Just writing myMap.a.b would not work here; that would be the equivalent of calling myMap.getA().getB(). Listing 4.13 shows how information can easily be gleaned from maps, largely using core JDK methods from java.util.Map.
- Listing 4.13 Query methods on maps

def myMap = [a:1, b:2, c:3]  
def other = [b:2, c:3, a:1]  
  
// Normal JDK methods  
assert myMap == other   // Call to equals  
assert myMap.isEmpty()  == false                         
assert myMap.size()     == 3                             
assert myMap.containsKey('a')                            
assert myMap.containsValue(1)                            
assert myMap.keySet()        == toSet(['a','b','c'])     
assert toSet(myMap.values()) == toSet([1,2,3])           
assert myMap.entrySet() instanceof Collection  
  
// 1) Methods added by GDK            
assert myMap.any   {entry -> entry.value > 2  }     
assert myMap.every {entry -> entry.key   < 'd'}  
  
// Utility method used for assertions     
def toSet(list){                    
    new java.util.HashSet(list)     
}    
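
As a possible variation on the listing, the helper method can be replaced by Groovy's `as Set` coercion, and the closures passed to any and every can take the key and value as two parameters. This is a sketch with illustrative variable names, not the book's code.

```groovy
def myMap = [a:1, b:2, c:3]

// Coerce lists to sets instead of using a toSet helper
def expectedKeys   = ['a', 'b', 'c'] as Set
def expectedValues = [1, 2, 3] as Set
def actualValues   = myMap.values() as Set

assert myMap.keySet() == expectedKeys
assert actualValues   == expectedValues

// Two-parameter closures receive key and value separately
assert myMap.any   { key, value -> value > 2 }
assert myMap.every { key, value -> key < 'd' }
```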

With the information about the map, we can iterate over it in a number of ways: over the entries, or over keys and values separately. Because the sets that are returned from keySet and entrySet are collections, we can use them with the for-in-collection type loops. Listing 4.14 goes through some of the possible combinations.
- Listing 4.14 Iterating over maps (GDK)

def myMap = [a:1, b:2, c:3]  
  
// Iterate over entries  
def store = ''  
myMap.each {entry ->         
    store += entry.key       
    store += entry.value     
}                            
assert store.contains('a1')         
assert store.contains('b2')  
assert store.contains('c3')  
  
// Iterate over keys/values  
store = ''  
myMap.each {key, value ->     
    store += key              
    store += value            
}                             
assert store.contains('a1')  
assert store.contains('b2')  
assert store.contains('c3')  
  
// Iterate over just the keys  
store = ''  
for (key in myMap.keySet()) {     
    store += key                  
}                                 
assert store.contains('a')  
assert store.contains('b')  
assert store.contains('c')  
  
// Iterate over just the values  
store = ''  
for (value in myMap.values()) {     
    store += value                  
}                                   
assert store.contains('1')  
assert store.contains('2')  
assert store.contains('3')  
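
The listing uses for-in loops over keys and values only. A for-in loop over the entries works the same way; the following sketch shows it with entrySet and with the map itself, using made-up variable names.

```groovy
def myMap = [a:1, b:2, c:3]

// for-in over the entry set: each element is a Map.Entry
def pairs = []
for (entry in myMap.entrySet()) {
    pairs << (entry.key + '=' + entry.value)
}
assert pairs.sort() == ['a=1', 'b=2', 'c=3']

// for-in over the map itself also yields entries
def total = 0
for (entry in myMap) {
    total += entry.value
}
assert total == 6
```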

Finally, map content can be changed in various ways, as shown in listing 4.15. Removing elements works with the original JDK methods. New capabilities that the GDK introduces are as follows:

■ Creating a subMap of all entries with keys from a given collection
■ findAll entries in a map that satisfy a given closure
■ find one entry that satisfies a given closure, where unlike lists there is no notion of a first entry, because there is no ordering in maps
■ collect in a list whatever a closure returns for each entry, optionally adding to a given collection

- Listing 4.15 Changing map content and building new objects from it

def myMap = [a:1, b:2, c:3]  
myMap.clear()  
assert myMap.isEmpty()  
myMap = [a:1, b:2, c:3]  
myMap.remove('a')  
assert myMap.size() == 2            
myMap = [a:1, b:2, c:3]  
def abMap = myMap.subMap(['a','b'])  // 1) Build a new map from the entries with the given keys
assert abMap.size() == 2  
abMap = myMap.findAll   { entry -> entry.value < 3}  
assert abMap.size() == 2  
assert abMap.a      == 1  
def found = myMap.find  { entry -> entry.value < 2}  
assert found.key   == 'a'  
assert found.value == 1  
  
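// collect builds a new list; the closure leaves the map's values untouched
def doubled = myMap.collect { entry -> entry.value * 2 }
assert doubled instanceof List
assert doubled.every    { item -> item % 2 == 0 }

// collect can also append its results to an existing collection
def addTo = []
myMap.collect(addTo)    { entry -> entry.value * 2 }
assert addTo instanceof List
assert addTo.every      { item -> item % 2 == 0 }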
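
One detail worth checking is whether the results of subMap and findAll stay connected to the map they came from. The sketch below assumes the GDK subMap variant that takes a collection of keys, which in current Groovy versions copies the selected entries; the variable names are illustrative.

```groovy
def original = [a:1, b:2, c:3]

// subMap with a key collection copies the matching entries
def ab = original.subMap(['a', 'b'])
assert ab == [a:1, b:2]

// Later changes to the original do not show up in the copy
original.a = 100
original.remove('b')
assert original == [a:100, c:3]
assert ab == [a:1, b:2]

// findAll likewise returns a separate map of the matching entries
def small = original.findAll { entry -> entry.value < 50 }
assert small instanceof Map
assert small == [c:3]
```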

From the list of available methods that you have seen for other datatypes, you may miss our dearly beloved isCase for use with grep and switch. Don’t we want to classify with maps? Well, we need to be more specific: Do we want to classify by the keys or by the values? Either way, an appropriate isCase is available when working on the map’s keySet or values.
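
A minimal sketch of classifying by keys or by values could look like the following. The closure name `classify` and its return strings are invented for illustration.

```groovy
def myMap = [a:1, b:2, c:3]

// grep keeps the candidates that the key set or value collection contains
assert ['a', 'x', 'c'].grep(myMap.keySet()) == ['a', 'c']
assert [1, 5, 3].grep(myMap.values())       == [1, 3]

// switch uses the same isCase logic for its case expressions
def classify = { candidate ->
    switch (candidate) {
        case myMap.keySet(): return 'known key'
        case myMap.values(): return 'known value'
        default:             return 'unknown'
    }
}
assert classify('b') == 'known key'
assert classify(2)   == 'known value'
assert classify('z') == 'unknown'
```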

Maps in action:
Let's revisit our initial example of counting word frequencies in a text corpus. The strategy is to use a map with each distinct word serving as a key. The mapped value of that word is its frequency in the text corpus. We go through all words in the text and increase the frequency value of that respective word in the map. We need to make sure that we can increase the value when a word is hit the first time and there is no entry yet in the map. Luckily, the get(key,default) method does the job.
- Listing 4.16 Counting word frequency with maps

def textCorpus =   
"""  
Look for the bare necessities  
The simple bare necessities  
Forget about your worries and your strife  
I mean the bare necessities  
Old Mother Nature's recipes  
That bring the bare necessities of life with bare interest  
"""  
def words = textCorpus.tokenize()  
def wordFrequency = [:]  
words.each { word ->  
    wordFrequency[word] = wordFrequency.get(word,0) + 1     // 1)  
}  
def wordList = wordFrequency.keySet().toList()  
wordList.sort { wordFrequency[it] }                         // 2)  
def statistic = "\n"  
wordList[-1..-6].each { word ->  
    statistic += word.padLeft(12)    + ': '  
    statistic += wordFrequency[word] + "\n"  
}  
  
assert statistic ==
"""
        bare: 5
 necessities: 4
         the: 3
        your: 2
    interest: 1
        with: 1
"""

At (1), get(word,0) supplies 0 the first time a word is encountered, so the increment works without a separate check. At (2), having the sort method on wordList accept a closure turns out to be very beneficial, because the closure can implement its comparison logic on the wordFrequency map, an object entirely different from wordList.
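
As an alternative sketch, not the approach taken in the listing, the counting step can be delegated to the GDK method countBy. This assumes Groovy 1.8 or later; the sample text and variable names are made up, and lower-casing the words is an added choice so that "The" and "the" count as one word.

```groovy
def text = 'the bare the Bare the necessities'

// countBy groups the words by the closure result and counts each group
def frequency = text.tokenize().countBy { it.toLowerCase() }
assert frequency == [the:3, bare:2, necessities:1]

// Pick the entry with the highest count
def mostFrequent = frequency.entrySet().max { it.value }
assert mostFrequent.key   == 'the'
assert mostFrequent.value == 3
```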

Tuesday, January 21, 2014

[ In Action ] The collective Groovy datatypes - Working with maps