For the time being, assume that you can work on a large string. You have numerous ways of splitting this string into words. But how do you count and store the word frequencies? You cannot have a distinct variable for each possible word you encounter. Finding a way of storing frequencies in a list is possible but inconvenient, more suitable for a brain teaser than for good code. Maps come to the rescue.
Some pseudocode to solve the problem could look like this:
- for each word {
-     if (frequency of word is not known)
-         frequency[word] = 0
-     frequency[word] += 1
- }
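The pseudocode translates almost line by line into Groovy. The following is a minimal sketch, not the final solution; the sample text and the variable names text and frequency are assumptions made for illustration.

```groovy
// Hypothetical sample input; any whitespace-separated text works.
def text = 'to be or not to be'
def frequency = [:]

for (word in text.tokenize()) {
    // "frequency of word is not known" becomes a containsKey check
    if (!frequency.containsKey(word)) {
        frequency[word] = 0
    }
    frequency[word] += 1
}

assert frequency == [to: 2, be: 2, or: 1, not: 1]
```

The explicit containsKey check mirrors the pseudocode; it can be replaced by a lookup that supplies a default value.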
Specifying maps:
The specification of maps is analogous to the list specification that you saw in the previous section. Just like lists, maps make use of the subscript operator to retrieve and assign values. The difference is that maps can use any arbitrary type as an argument to the subscript operator, where lists are bound to integer indexes. Whereas lists are aware of the sequence of their entries, maps are generally not. Specialized maps like java.util.TreeMap may have a sequence to their keys, though.
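As a small sketch of these two points, the following snippet uses keys of different types with the subscript operator and shows that a java.util.TreeMap keeps its keys sorted. The variable names byType and sorted are illustrative only, and the maps are created through constructors here.

```groovy
// Keys can be of any type, not only integers.
def byType = new HashMap()
byType[42] = 'integer key'
byType['42'] = 'string key'
byType[3.14] = 'decimal key'
assert byType[42] == 'integer key'
assert byType['42'] == 'string key'
assert byType.size() == 3

// A TreeMap keeps its keys sorted regardless of insertion order.
def sorted = new TreeMap()
sorted['c'] = 3
sorted['a'] = 1
sorted['b'] = 2
assert sorted.keySet().toList() == ['a', 'b', 'c']
```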
Simple maps are specified with square brackets around a sequence of items, delimited with commas. The key feature of maps is that the items are key-value pairs that are delimited by colons:
- def map = [key1:value1, key2:value2, key3:value3]
The character sequence [:] declares an empty map. In current Groovy versions a map literal creates a java.util.LinkedHashMap, which is a subclass of java.util.HashMap and preserves insertion order; other map types can be created explicitly by calling the respective constructor. The resulting map can still be used with the subscript operator. In fact, this works with any type of map, as you see in listing 4.11 with type java.util.TreeMap.
- Listing 4.11 Specifying maps
- def myMap = [a:1, b:2, c:3]
- assert myMap instanceof HashMap
- assert myMap.size() == 3
- assert myMap['a'] == 1
- def emptyMap = [:]
- assert emptyMap.size() == 0
- def explicitMap = new TreeMap()
- explicitMap.putAll(myMap)
- assert explicitMap['a'] == 1
- assert ['a':1] == [a:1]
As the assertion above shows, keys of type String can be written without quotes in a map declaration. This notation can also get in the way when, for example, the content of a local variable is used as a key. Suppose you have a local variable x with content 'a'. Because [x:1] is equal to ['x':1], how can you make it equal to ['a':1]? The trick is that you can force Groovy to recognize a symbol as an expression by putting it inside parentheses:
- def x = 'a'
- assert ['x':1] == [x:1]
- assert ['a':1] == [(x):1]
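The parentheses accept any expression, not only a single variable. The sketch below illustrates this with hypothetical variables field and value.

```groovy
def field = 'name'      // hypothetical variable holding the key
def value = 'Groovy'

// Without parentheses the key is the literal string 'field'.
def literalKey = [field: value]
assert literalKey.keySet().toList() == ['field']

// With parentheses the variable is evaluated first.
def evaluatedKey = [(field): value]
assert evaluatedKey.keySet().toList() == ['name']

// Any expression can be used as a key this way.
def prefixed = [('user.' + field): value]
assert prefixed['user.name'] == 'Groovy'
```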
Using map operators:
The simplest operations with maps are storing objects in the map with a key and retrieving them back using that key. Listing 4.12 demonstrates how to do that. One option for retrieving is the subscript operator. As you have probably guessed, this is implemented with the map's getAt method. A second option is to use the key like a property with a simple dot-syntax. You will learn more about properties in chapter 7. A third option is the get method, which additionally allows you to pass a default value to be returned if the key is not yet in the map. If no default is given, null is returned for a missing key. Note that get(key,default) has a side effect: when the key is not found, the key:default pair is added to the map before the default is returned.
- Listing 4.12 Accessing maps (GDK map methods)
- def myMap = [a:1, b:2, c:3]
- // Retrieve existing elements
- assert myMap['a'] == 1
- assert myMap.a == 1
- assert myMap.get('a') == 1
- assert myMap.get('a',0) == 1
- // Attempt to retrieve missing elements
- assert myMap['d'] == null
- assert myMap.d == null
- assert myMap.get('d') == null
- // Supply a default value
- assert myMap.get('d',0) == 0
- assert myMap.d == 0
- // Simple assignments in the map
- myMap['d'] = 1
- assert myMap.d == 1
- myMap.d = 2
- assert myMap.d == 2
- myMap = ['a.b':1]
- assert myMap.'a.b' == 1
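The side effect of get(key,default) is easy to overlook. The sketch below contrasts it with the JDK method getOrDefault, which returns the default without storing it. It assumes Java 8 or later, where java.util.Map provides getOrDefault; the map name scores is illustrative.

```groovy
def scores = [a: 1]

// GDK get with a default: the missing pair is stored in the map.
assert scores.get('b', 0) == 0
assert scores.containsKey('b')

// JDK getOrDefault (Java 8+): the map is left unchanged.
assert scores.getOrDefault('c', 0) == 0
assert !scores.containsKey('c')

assert scores.size() == 2
```

Which variant fits depends on whether the default should become part of the map, as it does when counting.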
- Listing 4.13 Query methods on maps
- def myMap = [a:1, b:2, c:3]
- def other = [b:2, c:3, a:1]
- // Normal JDK methods
- assert myMap == other // Call to equals
- assert myMap.isEmpty() == false
- assert myMap.size() == 3
- assert myMap.containsKey('a')
- assert myMap.containsValue(1)
- assert myMap.keySet() == toSet(['a','b','c'])
- assert toSet(myMap.values()) == toSet([1,2,3])
- assert myMap.entrySet() instanceof Collection
- // Methods added by GDK
- assert myMap.any {entry -> entry.value > 2 }
- assert myMap.every {entry -> entry.key < 'd'}
- // Utility method used for assertions
- def toSet(list){
-     new java.util.HashSet(list)
- }
- Listing 4.14 Iterating over maps (GDK)
- def myMap = [a:1, b:2, c:3]
- // Iterate over entries
- def store = ''
- myMap.each {entry ->
-     store += entry.key
-     store += entry.value
- }
- assert store.contains('a1')
- assert store.contains('b2')
- assert store.contains('c3')
- // Iterate over keys/values
- store = ''
- myMap.each {key, value ->
-     store += key
-     store += value
- }
- assert store.contains('a1')
- assert store.contains('b2')
- assert store.contains('c3')
- // Iterate over just the keys
- store = ''
- for (key in myMap.keySet()) {
- store += key
- }
- assert store.contains('a')
- assert store.contains('b')
- assert store.contains('c')
- // Iterate over just the values
- store = ''
- for (value in myMap.values()) {
- store += value
- }
- assert store.contains('1')
- assert store.contains('2')
- assert store.contains('3')
- Listing 4.15 Changing map content and building new objects from it
- def myMap = [a:1, b:2, c:3]
- myMap.clear()
- assert myMap.isEmpty()
- myMap = [a:1, b:2, c:3]
- myMap.remove('a')
- assert myMap.size() == 2
- myMap = [a:1, b:2, c:3]
- def abMap = myMap.subMap(['a','b']) // Copy the selected entries into a new map
- assert abMap.size() == 2
- abMap = myMap.findAll { entry -> entry.value < 3}
- assert abMap.size() == 2
- assert abMap.a == 1
- def found = myMap.find { entry -> entry.value < 2}
- assert found.key == 'a'
- assert found.value == 1
- def doubled = myMap.collect { entry -> entry.value * 2} // Does not modify myMap
- assert doubled instanceof List
- assert doubled.every {item -> item %2 == 0}
- def addTo = []
- myMap.collect(addTo) { entry -> entry.value * 2}
- assert addTo instanceof List
- assert addTo.every {item -> item %2 == 0}
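Whether these methods return a copy or a live view matters as soon as one of the maps changes afterward. The following sketch, based on the GDK implementation as far as I know it, checks that subMap and findAll produce independent maps and that collect leaves the source untouched when the closure only reads the entry. The names original, part, small, and doubledValues are illustrative.

```groovy
def original = [a: 1, b: 2, c: 3]

// subMap copies the selected entries into a new map.
def part = original.subMap(['a', 'b'])
part.a = 100
assert original.a == 1      // the original is not affected
original.b = 200
assert part.b == 2          // and the copy does not follow the original

// findAll also builds a new map.
def small = original.findAll { entry -> entry.value < 3 }
assert small == [a: 1]

// collect returns a list and does not change the source map
// as long as the closure only reads from the entry.
def doubledValues = original.collect { entry -> entry.value * 2 }
assert doubledValues == [2, 400, 6]
assert original == [a: 1, b: 200, c: 3]
```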
Maps in action:
Let's revisit our initial example of counting word frequencies in a text corpus. The strategy is to use a map with each distinct word serving as a key. The mapped value of that word is its frequency in the text corpus. We go through all words in the text and increase the frequency value of that respective word in the map. We need to make sure that we can increase the value when a word is hit the first time and there is no entry yet in the map. Luckily, the get(key,default) method does the job.
- Listing 4.16 Counting word frequency with maps
- def textCorpus =
- """
- Look for the bare necessities
- The simple bare necessities
- Forget about your worries and your strife
- I mean the bare necessities
- Old Mother Nature's recipes
- That bring the bare necessities of life with bare interest
- """
- def words = textCorpus.tokenize()
- def wordFrequency = [:]
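- words.each { word ->
-     // Unseen words start at 0 thanks to the default value
-     wordFrequency[word] = wordFrequency.get(word,0) + 1
- }
- def wordList = wordFrequency.keySet().toList()
- // Sort the words in ascending order of frequency
- wordList.sort { wordFrequency[it] }
- def statistic = "\n"
- // Walk backward from the end to list the six most frequent words
- wordList[-1..-6].each { word ->
-     statistic += word.padLeft(12) + ': '
-     statistic += wordFrequency[word] + "\n"
- }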
- assert statistic ==
- """
-         bare: 5
-  necessities: 4
-          the: 3
-         your: 2
-     interest: 1
-         with: 1
- """
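For comparison, the counting step can also be written with the GDK method countBy. This is an alternative sketch, not the approach taken in listing 4.16; it assumes Groovy 1.8 or later, and the shortened sample text and variable names are made up for illustration.

```groovy
// Hypothetical, shortened corpus
def text = 'the bare necessities the simple bare necessities bare'

// countBy groups the words by the closure result and counts each group.
def counts = text.tokenize().countBy { it }
assert counts == [the: 2, bare: 3, necessities: 2, simple: 1]

// Sorting the map with a comparator closure yields a new map,
// here ordered by descending frequency.
def ranked = counts.sort { a, b -> b.value <=> a.value }
def rankedWords = ranked.keySet().toList()
assert rankedWords[0] == 'bare'
assert rankedWords[-1] == 'simple'
```

The explicit loop with get(key,default) remains useful when the counting logic needs more than a simple increment.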