程式扎記: [ In Action ] The collective Groovy datatypes - Working with maps


Tuesday, January 21, 2014


Preface: 
For the time being, assume that you can work on a large string. You have numerous ways of splitting this string into words. But how do you count and store the word frequencies? You cannot have a distinct variable for each possible word you encounter. Finding a way of storing frequencies in a list is possible but inconvenient, and more suitable for a brain teaser than for good code. Maps come to the rescue.

Some pseudocode to solve the problem could look like this: 
```
for each word {
    if (frequency of word is not known)
        frequency[word] = 0
    frequency[word] += 1
}
```
This looks like the list syntax, but with strings as indexes rather than integers. In fact, Groovy maps appear like lists, allowing any arbitrary object to be used for indexing. In order to describe the map datatype, we show how maps can be specified, what operations and methods are available for maps, some surprisingly convenient features of maps, and, of course, a map-based solution for the word-frequency exercise. 
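As a first taste, the pseudocode can be transcribed almost line by line into Groovy. This is a minimal sketch: the sample sentence is made up for illustration, and the "not known" test is expressed with the JDK method containsKey.

```groovy
// Hypothetical sample text; any whitespace-separated words would do
def words = 'the cat saw the dog and the bird'.tokenize()
def frequency = [:]

for (word in words) {
    // "if (frequency of word is not known)" becomes a containsKey check
    if (!frequency.containsKey(word)) {
        frequency[word] = 0
    }
    frequency[word] += 1
}

assert frequency['the'] == 3
assert frequency['cat'] == 1
assert frequency.size() == 6
```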

Specifying maps: 
The specification of maps is analogous to the list specification that you saw in the previous section. Just like lists, maps make use of the subscript operator to retrieve and assign values. The difference is that maps can use any arbitrary type as an argument to the subscript operator, where lists are bound to integer indexes. Whereas lists are aware of the sequence of their entries, maps are generally not. Specialized maps like java.util.TreeMap may have a sequence to their keys, though. 

Simple maps are specified with square brackets around a sequence of items, delimited with commas. The key feature of maps is that the items are key-value pairs that are delimited by colons: 
```groovy
def map = [key1:value1, key2:value2, key3:value3]
```
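In principle, any arbitrary type can be used for keys or values. When using exotic types for keys, you need to obey the rules outlined in the Javadoc for java.util.Map: in particular, a key's equals and hashCode methods must be consistent with each other, and a key must not change in a way that affects equals while it is stored in the map.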
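To illustrate the equals/hashCode rule, here is a small sketch with a hypothetical Point class used as a key. It assumes Groovy 1.8 or later for the @EqualsAndHashCode AST transformation, which derives both methods from the class's properties; RawPoint is an equally hypothetical class that keeps the identity-based defaults.

```groovy
import groovy.transform.EqualsAndHashCode

// Hypothetical key type: equals/hashCode are generated from x and y
@EqualsAndHashCode
class Point {
    int x, y
}

// Hypothetical key type that keeps Object's identity-based equals/hashCode
class RawPoint {
    int x, y
}

def labels = [:]
labels[new Point(x: 1, y: 2)] = 'treasure'
// A different but equal Point instance finds the same entry
assert labels[new Point(x: 1, y: 2)] == 'treasure'

def raw = [(new RawPoint(x: 1, y: 2)): 'treasure']
// Identity-based equality: a fresh instance does not find the entry
assert raw[new RawPoint(x: 1, y: 2)] == null
```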

The character sequence [:] declares an empty map. Map literals are by default of type java.util.HashMap (current Groovy versions actually create a java.util.LinkedHashMap, a HashMap subclass that keeps insertion order), and maps can also be created explicitly by calling the respective constructor. The resulting map can still be used with the subscript operator. In fact, this works with any type of map, as you see in listing 4.11 with type java.util.TreeMap.
- Listing 4.11 Specifying maps 
```groovy
def myMap = [a:1, b:2, c:3]
assert myMap instanceof HashMap
assert myMap.size() == 3
assert myMap['a']   == 1
def emptyMap = [:]
assert emptyMap.size() == 0
def explicitMap = new TreeMap()
explicitMap.putAll(myMap)
assert explicitMap['a'] == 1
```
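The difference between a literal map and a TreeMap shows up as soon as you look at key order. The following sketch assumes a current Groovy version, where a map literal is a LinkedHashMap that remembers insertion order, while a TreeMap keeps its keys sorted by their natural order.

```groovy
def literal = [b: 2, a: 1, c: 3]

// Map literal: LinkedHashMap, iterates in insertion order
assert literal instanceof LinkedHashMap
assert literal.keySet().toList() == ['b', 'a', 'c']

// TreeMap built from the same entries: keys in natural (alphabetical) order
def byKey = new TreeMap(literal)
assert byKey.keySet().toList() == ['a', 'b', 'c']

// The subscript operator works the same way on both
assert literal['b'] == 2
assert byKey['b'] == 2
```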
For the common case of having keys of type String, you can leave out the string markers (single or double quotes) in a map declaration:
```groovy
assert ['a':1] == [a:1]
```
Such a convenience declaration is allowed only if the key contains no special characters (it needs to follow the rules for valid identifiers) and is not a Groovy keyword. 

This notation can also get in the way when, for example, the content of a local variable is to be used as a key. Suppose you have a local variable x with content 'a'. Because [x:1] is equal to ['x':1], how can you make it equal to ['a':1]? The trick is that you can force Groovy to recognize a symbol as an expression by putting it inside parentheses:
```groovy
def x = 'a'
assert ['x':1] == [x:1]
assert ['a':1] == [(x):1]
```
It’s rare to require this functionality, but when you need keys that are derived from local symbols (local variables, fields, properties), forgetting the parentheses is a likely source of errors. 
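A typical place where the missing parentheses bite is a loop that builds map literals from a loop variable. In this sketch (the color names are arbitrary sample data), the parenthesized version uses the variable's value as the key, while the unparenthesized version silently uses the literal string 'c' every time.

```groovy
def colors = ['red', 'green']
def withParens = [:]
def withoutParens = [:]

colors.each { c ->
    withParens.putAll([(c): c.size()])     // key is the value of c
    withoutParens.putAll([c: c.size()])    // key is the literal string 'c'
}

assert withParens == [red: 3, green: 5]
// Both iterations wrote to the same key 'c', so only the last value survives
assert withoutParens == [c: 5]
```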

Using map operators: 
The simplest operations with maps are storing objects in the map under a key and retrieving them using that key. Listing 4.12 demonstrates how to do that. One option for retrieving is the subscript operator. As you have probably guessed, this is implemented with the map's getAt method. A second option is to use the key like a property, with a simple dot syntax. You will learn more about properties in chapter 7. A third option is the get method, which additionally allows you to pass a default value to be returned if the key is not yet in the map. If no default is given, null is returned. If the key is not found on a get(key,default) call, the default is returned and the key:default pair is also added to the map.
- Listing 4.12 Accessing maps (GDK map methods) 
```groovy
def myMap = [a:1, b:2, c:3]
// Retrieve existing elements
assert myMap['a']       == 1
assert myMap.a          == 1
assert myMap.get('a')   == 1
assert myMap.get('a',0) == 1

// Attempt to retrieve missing elements
assert myMap['d']       == null
assert myMap.d          == null
assert myMap.get('d')   == null
// Supply a default value
assert myMap.get('d',0) == 0
assert myMap.d          == 0

// Simple assignments in the map
myMap['d'] = 1
assert myMap.d == 1
myMap.d = 2
assert myMap.d == 2
```
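The side effect of get(key,default) is easy to overlook. The sketch below contrasts it with Map.getOrDefault, a plain JDK method that only reads; it assumes Groovy is running on Java 8 or later, where getOrDefault exists.

```groovy
def counts = [a: 1]

// JDK getOrDefault: returns the default, leaves the map unchanged
assert counts.getOrDefault('z', 0) == 0
assert !counts.containsKey('z')

// GDK get(key, default): returns the default AND stores it under the key
assert counts.get('z', 0) == 0
assert counts.containsKey('z')
assert counts == [a: 1, z: 0]
```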
Assignments to maps can be done using the subscript operator or via the dot-key syntax. If the key in the dot-key syntax contains special characters, it can be put into string markers, like so:
```groovy
myMap = ['a.b':1]
assert myMap.'a.b' == 1
```
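To see why the quotes matter, compare a flat map whose single key contains a dot with a nested map. This is a small sketch with made-up maps; unquoted dots always mean successive lookups.

```groovy
def flat   = ['a.b': 1]       // one key that happens to contain a dot
def nested = [a: [b: 1]]      // key 'a' maps to another map with key 'b'

assert flat.'a.b' == 1        // quoted: a single lookup of key 'a.b'
assert nested.a.b == 1        // unquoted: lookup of 'a', then of 'b'
assert flat.a == null         // the flat map has no key 'a'
assert nested.'a.b' == null   // the nested map has no key 'a.b'
```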
For a map like ['a.b':1], just writing myMap.a.b would not work: Groovy would read it as two successive lookups, first of the key a and then of b on whatever that returns, rather than as the single key 'a.b'. Listing 4.13 shows how information can easily be gleaned from maps, largely using core JDK methods from java.util.Map.
- Listing 4.13 Query methods on maps 
```groovy
def myMap = [a:1, b:2, c:3]
def other = [b:2, c:3, a:1]

// Normal JDK methods
assert myMap == other   // Call to equals
assert myMap.isEmpty()  == false
assert myMap.size()     == 3
assert myMap.containsKey('a')
assert myMap.containsValue(1)
assert myMap.keySet()        == toSet(['a','b','c'])
assert toSet(myMap.values()) == toSet([1,2,3])
assert myMap.entrySet() instanceof Collection

// 1) Methods added by GDK
assert myMap.any   {entry -> entry.value > 2  }
assert myMap.every {entry -> entry.key   < 'd'}

// Utility method used for assertions
def toSet(list){
    new java.util.HashSet(list)
}
```
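The GDK methods any and every also accept closures with two parameters, in which case the key and the value are passed separately instead of a Map.Entry. A minimal sketch using the same sample map:

```groovy
def myMap = [a: 1, b: 2, c: 3]

// Two-parameter closures receive key and value separately
assert myMap.any   { key, value -> value > 2 }
assert myMap.every { key, value -> key < 'd' }

// One-parameter closures receive a Map.Entry, as in listing 4.13
assert !myMap.every { entry -> entry.value > 1 }
```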
With the information about the map, we can iterate over it in a number of ways: over the entries, or over keys and values separately. Because the sets that are returned from keySet and entrySet are collections, we can use them with the for-in-collection type loops. Listing 4.14 goes through some of the possible combinations. 
- Listing 4.14 Iterating over maps (GDK) 
```groovy
def myMap = [a:1, b:2, c:3]

// Iterate over entries
def store = ''
myMap.each {entry ->
    store += entry.key
    store += entry.value
}
assert store.contains('a1')
assert store.contains('b2')
assert store.contains('c3')

// Iterate over keys/values
store = ''
myMap.each {key, value ->
    store += key
    store += value
}
assert store.contains('a1')
assert store.contains('b2')
assert store.contains('c3')

// Iterate over just the keys
store = ''
for (key in myMap.keySet()) {
    store += key
}
assert store.contains('a')
assert store.contains('b')
assert store.contains('c')

// Iterate over just the values
store = ''
for (value in myMap.values()) {
    store += value
}
assert store.contains('1')
assert store.contains('2')
assert store.contains('3')
```
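If you also need a running counter while iterating, the GDK method eachWithIndex works on maps too; with a three-parameter closure it passes the key, the value, and the index. This sketch assumes the literal keeps insertion order, which holds for the LinkedHashMap that current Groovy versions create.

```groovy
def myMap = [a: 1, b: 2, c: 3]
def lines = []

// Three parameters: key, value, and a zero-based running index
myMap.eachWithIndex { key, value, i ->
    lines << "$i: $key=$value".toString()   // store plain Strings, not GStrings
}

assert lines == ['0: a=1', '1: b=2', '2: c=3']
```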
Finally, map content can be changed in various ways, as shown in listing 4.15. Removing elements works with the original JDK methods. New capabilities that the GDK introduces are as follows: 
■ Creating a subMap of all entries with keys from a given collection
■ findAll entries in a map that satisfy a given closure
■ find one entry that satisfies a given closure, where, unlike with lists, there is no general notion of a first entry, because maps in general do not guarantee any ordering
■ collect in a list whatever a closure returns for each entry, optionally adding to a given collection

- Listing 4.15 Changing map content and building new objects from it 
```groovy
def myMap = [a:1, b:2, c:3]
myMap.clear()
assert myMap.isEmpty()
myMap = [a:1, b:2, c:3]
myMap.remove('a')
assert myMap.size() == 2
myMap = [a:1, b:2, c:3]
def abMap = myMap.subMap(['a','b'])  // 1) Create a new map holding only the given keys
assert abMap.size() == 2
abMap = myMap.findAll   { entry -> entry.value < 3}
assert abMap.size() == 2
assert abMap.a      == 1
def found = myMap.find  { entry -> entry.value < 2}
assert found.key   == 'a'
assert found.value == 1

// entry.value * 2 computes a new value without modifying myMap
def doubled = myMap.collect { entry -> entry.value * 2}
assert doubled instanceof List
assert doubled.every    {item -> item %2 == 0}

def addTo = []
myMap.collect(addTo)    { entry -> entry.value * 2}
assert addTo instanceof List
assert addTo.every      {item -> item %2 == 0}
```
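Two follow-up points, shown in a short sketch: the map returned by subMap is a separate copy, so changing it leaves the original untouched, and when you want the transformed result as a map rather than a list, the GDK method collectEntries (available in Groovy 1.7.9 and later) accepts a closure that returns a [key, value] pair.

```groovy
def myMap = [a: 1, b: 2, c: 3]

// subMap copies the selected entries; the original map is not affected
def abMap = myMap.subMap(['a', 'b'])
abMap.a = 100
assert myMap.a == 1

// collect builds a List; collectEntries builds a Map from [key, value] pairs
def doubledMap = myMap.collectEntries { key, value -> [key, value * 2] }
assert doubledMap == [a: 2, b: 4, c: 6]
assert myMap == [a: 1, b: 2, c: 3]   // original left unchanged
```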
From the list of available methods that you have seen for other datatypes, you may miss our dearly beloved isCase for use with grep and switch. Don't we want to classify with maps? Well, we need to be more specific: do we want to classify by the keys or by the values? Either way, an appropriate isCase is available when working on the map's keySet or values.
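Because keySet and values return collections, and a collection's isCase simply checks contains, both can serve directly as switch cases or grep filters. A minimal sketch; the classify closure is a hypothetical helper:

```groovy
def myMap = [a: 1, b: 2, c: 3]

// Hypothetical helper: classify a candidate by the map's keys or values
def classify = { candidate ->
    switch (candidate) {
        case myMap.keySet(): return 'key'
        case myMap.values(): return 'value'
        default:             return 'unknown'
    }
}
assert classify('a') == 'key'
assert classify(2)   == 'value'
assert classify('z') == 'unknown'

// grep applies the same isCase test as a filter
assert ['a', 'x', 'c'].grep(myMap.keySet()) == ['a', 'c']
assert [1, 5, 3].grep(myMap.values()) == [1, 3]
```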

Maps in action: 
Let's revisit our initial example of counting word frequencies in a text corpus. The strategy is to use a map with each distinct word serving as a key. The mapped value of that word is its frequency in the text corpus. We go through all words in the text and increase the frequency value of that respective word in the map. We need to make sure that we can increase the value when a word is hit the first time and there is no entry yet in the map. Luckily, the get(key,default) method does the job. 
- Listing 4.16 Counting word frequency with maps 
```groovy
def textCorpus =
"""
Look for the bare necessities
The simple bare necessities
Forget about your worries and your strife
I mean the bare necessities
Old Mother Nature's recipes
That bring the bare necessities of life with bare interest
"""
def words = textCorpus.tokenize()
def wordFrequency = [:]
words.each { word ->
    wordFrequency[word] = wordFrequency.get(word,0) + 1     // 1)
}
def wordList = wordFrequency.keySet().toList()
wordList.sort { wordFrequency[it] }                         // 2)
def statistic = "\n"
wordList[-1..-6].each { word ->
    statistic += word.padLeft(12)    + ': '
    statistic += wordFrequency[word] + "\n"
}

assert statistic ==
"""
        bare: 5
 necessities: 4
         the: 3
        your: 2
    interest: 1
        with: 1
"""
```
At (2), having the sort method on wordList accept a closure turns out to be very beneficial, because the closure can implement its comparison logic using the wordFrequency map, an object entirely different from wordList.
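As a variation on listing 4.16, the counting step can also be written with the GDK method withDefault (available since Groovy 1.7), which supplies and stores a value for any missing key, and the sort closure can negate the frequency to rank words in descending order directly. This is a sketch on a shortened, made-up text, not a replacement for the listing.

```groovy
def words = 'the bare the necessities the bare'.tokenize()

// Missing keys start at 0, so += 1 works even for the first occurrence
def frequency = [:].withDefault { 0 }
words.each { word -> frequency[word] += 1 }

assert frequency['the'] == 3
assert frequency['bare'] == 2
assert frequency['necessities'] == 1

// Negating the count sorts by descending frequency
def ranked = frequency.keySet().toList().sort { -frequency[it] }
assert ranked == ['the', 'bare', 'necessities']
```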
