Preface
There are many methods in the String class (you don't have to memorize them all; you can look up the documentation), such as:
- reverse returns a backwards version of a string (reverse does not change the original string).
- length tells us the number of characters (including spaces) in the string.
- upcase changes every lowercase letter to uppercase, and downcase changes every uppercase letter to lowercase.
- swapcase switches the case of every letter in the string.
- capitalize is just like downcase, except that it switches the first character to uppercase (if it is a letter).
- slice gives you a substring of a larger string.
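The same methods can be tried in a small script. This is a minimal sketch; the sample string and the variable name s are arbitrary choices for illustration, and the expected results are shown in the comments.

```ruby
s = 'hello World'

puts s.reverse      # dlroW olleh
puts s.length       # 11 (the space counts as a character)
puts s.upcase       # HELLO WORLD
puts s.downcase     # hello world
puts s.swapcase     # HELLO wORLD
puts s.capitalize   # Hello world
puts s.slice(0, 5)  # hello (5 characters starting at index 0)

# None of the calls above changed s itself
puts s              # hello World
```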
The methods upcase, downcase, swapcase and capitalize have corresponding methods that modify a string in place rather than creating a new one: upcase!, downcase!, swapcase! and capitalize!. If you don't need the original string, these methods can save memory, especially when the string is large.
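The difference between the two forms is easiest to see side by side. A minimal sketch (the variable names are illustrative): the plain method hands back a new string, while the bang method changes the receiver. One detail worth knowing is that a bang method returns nil when it had nothing to change.

```ruby
name = 'ruby'

# upcase returns a new string and leaves name alone
shout = name.upcase
puts name    # ruby
puts shout   # RUBY

# upcase! changes name itself instead of building a new string
name.upcase!
puts name    # RUBY

# A bang method returns nil when it made no change,
# so its return value is not safe to chain on
p name.upcase!   # nil (name is already all uppercase)
```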
Listing all methods of a class or object
String.methods shows you a list of the methods that the Class object String responds to.
>> String.methods.sort
=> [:!, :!=, :!~, :<, :<=, :<=>, :==, :===, :=~, :>, :>=, :__id__, :__send__, :allocate, :ancestors, :autoload, :autoload?, :class, :class_eval, :class_exec, :class_variable_defined?, :class_variable_get, :class_variable_set, :class_variables, :clone, :com, :const_defined?, ...]
String.instance_methods lists all the instance methods that instances of String are endowed with.
>> String.instance_methods.sort
=> [:!, :!=, :!~, :%, :*, :+, :<, :<<, :<=, :<=>, :==, :===, :=~, :>, :>=, :[], :[]=, :__id__, :__send__, :ascii_only?, :between?, :bytes, :bytesize, :byteslice, :capitalize, :capitalize!, :casecmp, :center, :chars, ...]
Passing false to instance_methods lists only the instance methods defined in the class itself, leaving out those inherited from the class's ancestors.
>> String.instance_methods(false).sort
=> [:%, :*, :+, :<, :<<, :<=, :<=>, :==, :===, :=~, :>, :>=, :[], :[]=, :ascii_only?, :bytes, :bytesize, :byteslice, :capitalize, :capitalize!, :casecmp, :center, :chars, :chomp, :chomp!, :chop, ...]
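The difference between the two lists can be checked directly. In the outputs above, :between? shows up in the full list but not in the instance_methods(false) list, because String gets it from the Comparable module it includes. This sketch assumes a standard Ruby; Module#instance_method(...).owner reports which class or module defines a method.

```ruby
all_methods = String.instance_methods
own_methods = String.instance_methods(false)

# between? is inherited from Comparable, so it appears only in the full list
puts all_methods.include?(:between?)   # true
puts own_methods.include?(:between?)   # false

# upcase is defined by String itself, so it appears in both lists
puts own_methods.include?(:upcase)     # true

# Ask Ruby where a method is actually defined
puts String.instance_method(:between?).owner  # Comparable
```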
Comparing two strings for equality
Strings have several methods for testing equality. The most common one is == (double equals sign). Another equality-test instance method, String#eql?, tests two strings for identical content; for two strings it returns the same result as ==. A third instance method, String#equal?, tests whether two strings are the same object. The example p013strcmp.rb illustrates this:
# p013strcmp.rb
# String#eql? tests two strings for identical content.
# It returns the same result as ==
# String#equal? tests whether two strings are the same object
s1 = 'Jonathan'
s2 = 'Jonathan'   # a separate String object with the same content
s3 = s1           # s3 refers to the very same object as s1
if s1 == s2
  puts 'Both strings have identical content'
else
  puts 'The strings do not have identical content'
end
if s1.eql?(s2)
  puts 'Both strings have identical content'
else
  puts 'The strings do not have identical content'
end
if s1.equal?(s2)
  puts 'The two strings are the same object'
else
  puts 'The two strings are different objects'
end
if s1.equal?(s3)
  puts 'The two strings are the same object'
else
  puts 'The two strings are different objects'
end
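To make the difference between == and equal? more concrete, here is a small variation on p013strcmp.rb (the ' Smith' suffix is only for illustration). Because s3 refers to the same object as s1, changing that object in place through s3 is visible through s1, while s2 is unaffected. String.new is used here as an assumption-free way to guarantee that s1 is mutable.

```ruby
s1 = String.new('Jonathan')  # an explicitly mutable string
s2 = 'Jonathan'
s3 = s1

# equal? compares object identity, which object_id also reflects
puts s1.object_id == s2.object_id  # false
puts s1.object_id == s3.object_id  # true

# << appends in place, modifying the one object that s1 and s3 share
s3 << ' Smith'
puts s1        # Jonathan Smith
puts s2        # Jonathan
puts s1 == s2  # false, because the contents now differ
```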
Using %w
Sometimes creating arrays of words can be a pain, what with all the quotes and commas. Fortunately, Ruby has a shortcut: %w splits the text between its delimiters on whitespace and returns an array of strings, which is just what we want.
>> names1 = ['john', 'ken', 'mary']
=> ["john", "ken", "mary"]
>> puts names1[0]
john
=> nil
>> puts names1[2]
mary
=> nil
>> names2 = %w{ john ken mary}
=> ["john", "ken", "mary"]
>> puts names2[1]
ken
=> nil
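The braces in %w{ ... } are not special; other delimiters such as [] or () work the same way. A short sketch, with arbitrary sample words: because %w splits on whitespace, a word that itself contains a space needs that space escaped with a backslash.

```ruby
# Different delimiters, same result
a = %w[john ken mary]
b = %w(john ken mary)
puts a == b    # true

# A backslash-escaped space keeps two words together as one element
cities = %w[New\ York Paris]
p cities       # ["New York", "Paris"]
p cities.size  # 2
```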
Character Set
A character set, or more specifically a coded character set, is a set of character symbols, each of which has a unique numerical ID called the character's code point.
An example of a character set is the 128-character ASCII character set, which is mostly made up of the letters, numbers, and punctuation used in the English language. The most expansive character set in common use is the Universal Character Set (UCS), as defined in the Unicode standard, which has room for over 1.1 million code points.
The letter A, for example, is assigned a magic number by the Unicode Consortium, written like this: U+0041. The string "Hello", in Unicode, corresponds to these five code points:
U+0048 U+0065 U+006C U+006C U+006F
Just a bunch of code points. Numbers, really. We haven't yet said anything about how to store this in memory. That's where encodings come in.
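Ruby can show these numbers directly. A minimal sketch: String#codepoints returns each character's code point as an Integer, and format with '%04X' renders it in the familiar U+ notation.

```ruby
# Code points of each character, as plain Integers
codes = "Hello".codepoints
p codes  # [72, 101, 108, 108, 111]

# The same numbers in U+XXXX hexadecimal notation
labels = codes.map { |cp| format('U+%04X', cp) }
puts labels.join(' ')  # U+0048 U+0065 U+006C U+006C U+006F

# And back again: Integer#chr with an encoding turns a code point into a character
puts 0x41.chr(Encoding::UTF_8)  # A
```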
Character Encoding
UTF-8 is one way to store your string of Unicode code points, those magic U+ numbers, in memory using 8-bit bytes. In UTF-8, every code point from 0 to 127 is stored in a single byte. Only code points 128 and above are stored using 2, 3, or 4 bytes (the original UTF-8 design allowed sequences of up to 6 bytes, but since Unicode stops at U+10FFFF, 4 bytes are always enough). This has the neat side effect that English text looks exactly the same in UTF-8 as it did in ASCII.
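The variable-length scheme can be observed with String#bytesize and String#bytes. In this sketch the sample characters (A, é, € and an emoji) are written as \u escapes so the source file itself stays plain ASCII; each one needs a different number of UTF-8 bytes.

```ruby
samples = ["A", "\u00E9", "\u20AC", "\u{1F600}"]  # A, e-acute, euro sign, grinning face

samples.each do |ch|
  hex_bytes = ch.bytes.map { |b| format('%02X', b) }.join(' ')
  puts "U+#{format('%04X', ch.ord)}: #{ch.bytesize} byte(s) -> #{hex_bytes}"
end
# U+0041: 1 byte(s) -> 41
# U+00E9: 2 byte(s) -> C3 A9
# U+20AC: 3 byte(s) -> E2 82 AC
# U+1F600: 4 byte(s) -> F0 9F 98 80

# For pure ASCII text, the UTF-8 bytes are exactly the code points
puts "Hello".bytes == "Hello".codepoints  # true
```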
A string makes no sense unless you know what encoding it uses: without that information you cannot interpret it or display it to users correctly. Ruby handles this directly: since Ruby 1.9, every string carries its own encoding, which String#encoding returns.
Encoding class
Objects of class Encoding each represent a different character encoding. The Encoding.list method returns a list of the built-in encodings.
>> Encoding.list
=> [#, #, #, #, ... ]
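A few other everyday uses of Encoding objects, as a minimal sketch (the sample word is arbitrary): looking one up by name with Encoding.find, checking a string's encoding, and converting a string with String#encode. ISO-8859-1 is picked only because it stores é in a single byte, so the byte count visibly changes.

```ruby
s = "caf\u00E9"       # "café"
p s.encoding          # #<Encoding:UTF-8>

# Encoding.find looks up an Encoding object by (case-insensitive) name
utf8 = Encoding.find('utf-8')
puts utf8 == Encoding::UTF_8   # true
puts utf8.name                 # UTF-8

# String#encode converts the string into another encoding
latin1 = s.encode('ISO-8859-1')
p s.bytesize       # 5 (e-acute takes 2 bytes in UTF-8)
p latin1.bytesize  # 4 (e-acute takes 1 byte in ISO-8859-1)

puts Encoding.list.include?(Encoding::UTF_8)  # true
```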
Ruby has a way of setting the encoding on a file-by-file basis using a magic comment. If the first line of a file is a comment (or the second line, if the first line is a #! shebang line), Ruby scans it looking for the string coding:. If it finds it, Ruby then skips any spaces and looks for the (case-insensitive) name of an encoding. Thus, to specify that a source file is in UTF-8 encoding, you can write this:
# coding: utf-8
As Ruby is just scanning for coding:, you could also write the following:
# encoding: utf-8
Since Ruby 2.0, source files are treated as UTF-8 by default, so this comment is mainly needed for files saved in other encodings.
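To check which source encoding is actually in effect, the __ENCODING__ keyword returns it as an Encoding object. A minimal sketch, assuming the magic comment really is the first line of the file:

```ruby
# encoding: utf-8
# (Ruby only honors this comment on the first line of the file,
#  or on the second line after a #! shebang line.)

p __ENCODING__       # #<Encoding:UTF-8>

# String literals in this file are created in the source encoding
p 'hello'.encoding   # #<Encoding:UTF-8>
```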
Supplement
[ Ruby Gossip ] Basic: Built-in Types and Operations - String Type
Stackoverflow - how to convert character encoding with ruby 1.9
>> s = "Learn Objective\xE2\x80\x93C on the Mac"
=> "Learn Objective–C on the Mac"
>> s.encoding
=> #<Encoding:UTF-8>
>> s
=> "Learn Objective–C on the Mac"
>> s.force_encoding "ASCII-8BIT" # force_encoding(encoding) relabels the string's bytes as encoding (without converting them) and returns self.
=> "Learn Objective\xE2\x80\x93C on the Mac"
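The irb session above uses force_encoding, which only changes the label attached to the bytes. The sketch below contrasts it with String#encode, which really converts the bytes into another encoding. The variable names are illustrative, and UTF-16LE is chosen only because its bytes visibly differ from UTF-8.

```ruby
s = "Learn Objective\xE2\x80\x93C on the Mac"
original_bytes = s.bytes

# force_encoding only relabels: the bytes stay exactly the same
binary = s.dup.force_encoding('ASCII-8BIT')
puts binary.bytes == original_bytes  # true
puts s.length                        # 28 (the en dash is one 3-byte character)
puts binary.length                   # 30 (every byte now counts as a character)

# encode converts: the bytes change, but a round trip restores the text
utf16 = s.encode('UTF-16LE')
puts utf16.bytes == original_bytes   # false
puts utf16.encode('UTF-8') == s      # true
```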