Source From Here
Here we explore how to define a data set in an R session. Only two commands are explored. The first is for simple assignment of data, and the second is for reading in a data file. There are many ways to read data into an R session, but we focus on just two to keep it simple.
The most straight forward way to store a list of numbers is through an assignment using the c command. (c stands for “combine.”) The idea is that a list of numbers is stored under a given name, and the name is used to refer to the data. A list is specified with the c command, and assignment is specified with the “<-” symbols. Another term used to describe the list of numbers is to call it a “vector.”
The numbers within the c command are separated by commas. As an example, we can create a new variable, called “bubba” which will contain the numbers 3, 5, 7, and 9:
If you wish to work with one of the numbers you can get access to it using the variable and then square brackets indicating which number:
Notice that the first entry is referred to as the number 1 entry, and the zero entry can be used to indicate how the computer will treat the data. You can store strings using both single and double quotes, and you can store real numbers. You now have a list of numbers and are ready to explore. In the chapters that follow we will examine the basic operations in R that will allow you to do some of the analyses required in class.
1.2. Reading a CSV file
Unfortunately, it is rare to have just a few data points that you do not mind typing in at the prompt. It is much more common to have a lot of data points with complicated relationships. Here we will examine how to read a data set from a file using the read.csv function but first discuss the format of a data file.
We assume that the data file is in the format called “comma separated values” (csv). That is, each line contains a row of values which can be numbers or letters, and each value is separated by a comma. We also assume that the very first row contains a list of labels. The idea is that the labels in the top row are used to refer to the different columns of values.
First we read a very short, somewhat silly, data file. The data file is called simple.csv and has three columns of data and six rows. The three columns are labeled “trial,” “mass,” and “velocity.” We can pretend that each row comes from an observation during one of two trials labeled “A” and “B.” A copy of the data file is shown below and is created in defiance of Werner Heisenberg:
The command to read the data file is read.csv. We have to give the command at least one arguments, but we will give three different arguments to indicate how the command can be used in different situations. The first argument is the name of file. The second argument indicates whether or not the first row is a set of labels. The third argument indicates that there is a comma between each number of each line. The following command will read in the data and assign it to a variable called “heisenberg:”
To get more information on the different options available you can use the help command:
If R is not finding the file you are trying to read then it may be looking in the wrong folder/directory. If you are using the graphical interface you can change the working directory from the file menu. If you are not sure what files are in the current working directory you can use the dir() command to list the files and the getwd() command to determine the current working directory:
The variable “heisenberg” contains the three columns of data. Each column is assigned a name based on the header (the first line in the file). You can now access each individual column using a “$” to separate the two names:
If you are not sure what columns are contained in the variable you can use the names command:
We will look at another example which is used throughout this tutorial. we will look at the data found in a spreadsheet located at http://cdiac.ornl.gov/ftp/ndp061a/trees91.wk1 . A description of the data file is located athttp://cdiac.ornl.gov/ftp/ndp061a/ndp061a.txt . The original data is given in an excel spreadsheet. It has been converted into a csv file, trees91.csv , by deleting the top set of rows and saving it as a “csv” file. This is an option to save within excel. (You should save the file on your computer.) It is a good idea to open this file in a spreadsheet and look at it. This will help you make sense of how R stores the data.
The data is used to indicate an estimate of biomass of ponderosa pine in a study performed by Dale W. Johnson, J. Timothy Ball, and Roger F. Walker who are associated with the Biological Sciences Center, Desert Research Institute, P.O. Box 60220, Reno, NV 89506 and the Environmental and Resource Sciences College of Agriculture, University of Nevada, Reno, NV 89512. The data is consists of 54 lines, and each line represents an observation. Each observation includes measurements and markers for 28 different measurements of a given tree. For example, the first number in each row is a number, either 1, 2, 3, or 4, which signifies a different level of exposure to carbon dioxide. The sixth number in every row is an estimate of the biomass of the stems of a tree. Note that the very first line in the file is a list of labels used for the different columns of data.
This will create a new variable called “tree.” If you type in “tree” at the prompt and hit enter, all of the numbers stored in the variable will be printed out. Try this, and you should see that it is difficult to make any sense out of the numbers.
There are many different ways to keep track of data in R. When you use the read.csv command R uses a specific kind of variable called a “data frame.” All of the data are stored within the data frame as separate columns. If you are not sure what kind of variable you have then you can use the attributes command. This will list all of the things that R uses to describe the variable:
The first thing that R stores is a list of names which refer to each column of the data. For example, the first column is called “C”, the second column is called “N.” Tree is of type data.frame. Finally, the rows are numbered consecutively from 1 to 54. Each column has 54 numbers in it. If you know that a variable is a data frame but are not sure what labels are used to refer to the different columns you can use the names command:
If you want to work with the data in one of the columns you give the name of the data frame, a “$” sign, and the label assigned to the column. For example, the first column in tree can be called using “tree$C:”
1.3. Brief Note on Fixed Width Files
There are many ways to read data using R. We only give two examples, direct assignment and reading csv files. However, another way deserves a brief mention. It is common to come across data that is organized in flat files and delimited at preset locations on each line. This is often called a “fixed width file.” The command to deal with these kind of files is read.fwf. Examples of how to use this command are not explored here, but a brief example is given. If you would like more information on how to use this command enter the following command:
The read.fwf command requires at least two options. The first is the name of the file and the second is a list of numbers that gives the length of each column in the data file. A negative number in the list indicates that the column should be skipped. Here we give the command to read the data file fixedWidth.dat . In this data file there are three columns. The first colum is 17 characters wide, the second column is 15 characters wide, and the last column is 7 characters wide. In the example below we use the optional col.names option to specify the names of the columns:
* R Script - Introduction and Practice -1
* R Script - Advanced Data Processing and Visualization
* LAB EXERCISE - ANALYZE FACEBOOK FAN PAGE
- [ 英文學習 ]
- [ 計算機概論 ]
- [ 深入雲計算 ]
- [ 雜七雜八 ]
- [ Algorithm in Java ]
- [ Data Structures with Java ]
- [ IR Class ]
- [ Java 文章收集 ]
- [ Java 代碼範本 ]
- [ Java 套件 ]
- [ JVM 應用 ]
- [ LFD Note ]
- [ MangoDB ]
- [ Math CC ]
- [ MongoDB ]
- [ MySQL 小學堂 ]
- [ Python 考題 ]
- [ Python 常見問題 ]
- [ Python 範例代碼 ]
- [C 常見考題]
- [C 範例代碼]
- [C/C++ 範例代碼]
- [Intro Alg]
- [Java 代碼範本]
- [Java 套件]
- [Linux 小技巧]
- [Linux 小學堂]
- [Linux 命令]
- [ML In Action]
- [Python 學習筆記]
- [Quick Python]
- [Software Engineering]
- [The python tutorial]
- ActiveMQ In Action
- Big Data 研究
- Design Pattern
- Device Driver Programming
- Docker 工具
- Docker Practice
- English Writing
- ExtJS 3.x
- Git Pro
- Hadoop. Hadoop Ecosystem
- Java Framework
- Java UI
- Learn Spark
- ML Udemy
- node js
- Python Std Library
- Python tools
- Ruby Packages
- Windows 技巧
Source From Here Preface Do you know the difference between the following syntax? view plain copy to clipboard print ? [x for ...
Preface: 在這個階層中，我們只需考慮電路模組的功能，而不需考慮其硬體的詳細內容. Verilog 的時序控制為以事件為基礎的時序控制: * 接線或暫存器的值被改變。 * 模組的輸入埠接收到新的值 * 正規...
來源自 這裡 前言 : Thread 是 threading 模塊中最重要的類之一，可以使用它來創建線程。有兩種方式來創建線程：一種是通過繼承Thread 類，重寫它的 run 方法；另一種是創建一個 threading.Thread 對象，在它的初始化...
Source From Here Preface 本文實例講述了Python中 zip() 函數的定義及用法，相信對於Python初學者有一定的借鑒價值。 一、定義： zip([iterable, ...]) zip() 是 Python 的一個內建...