Archive for Data Science

The seq and rep functions in R

R has two simple yet very useful functions, seq() and rep(), for creating vectors that follow a given pattern. In addition, the operator “:” creates a vector of consecutive integers within a specified range. Thus, 1:5 creates a vector with the integers between 1 and 5 (inclusive). To create a sequence of even integers between 0 and 10, for example, you can call the seq() function with from = 0, to = 10 and by = 2.

The by parameter is the increment that seq() adds to each value to get the next number in the sequence. By default this is 1.
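As a minimal sketch of the colon operator and seq() described above (the variable names are only illustrative):

```r
# The colon operator: integers from 1 to 5, inclusive
one_to_five <- 1:5
print(one_to_five)        # 1 2 3 4 5

# seq() with an explicit step: even integers from 0 to 10
evens <- seq(from = 0, to = 10, by = 2)
print(evens)              # 0 2 4 6 8 10

# Leaving out `by` falls back to a step of 1
default_step <- seq(0, 5)
print(default_step)       # 0 1 2 3 4 5
```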

The rep() function is used to create vectors according to a repeated pattern. The following are some examples of the rep function:
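A few calls that illustrate the most common arguments of rep() might look like this (a sketch with made-up variable names, not necessarily the examples from the original post):

```r
# Repeat the whole vector twice
whole <- rep(1:3, times = 2)
print(whole)              # 1 2 3 1 2 3

# Repeat each element twice before moving to the next one
each_twice <- rep(1:3, each = 2)
print(each_twice)         # 1 1 2 2 3 3

# Give every element its own repeat count
varied <- rep(c("x", "y"), times = c(2, 3))
print(varied)             # "x" "x" "y" "y" "y"

# Keep repeating the pattern until the result has 5 elements
padded <- rep(c("a", "b"), length.out = 5)
print(padded)             # "a" "b" "a" "b" "a"
```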

Hope that you find this tutorial useful!

Combining Vectors into a Table in R

Imagine I have two vectors, age and income, and I want to group them into a table in R. To do this, use the cbind() or the rbind() function to bind the vectors by column or by row respectively. The parameter deparse.level determines how labels are constructed. If 0 is used, no labels are constructed. If 1 (the default) or 2 is used, the labels are built from the argument names.

Example

Create two vectors using the sample function:

To merge the vectors into one table, use:
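The two steps could be sketched as follows. The value ranges, the vector length and the seed are assumptions chosen for illustration; the original post may have used different ones.

```r
set.seed(42)  # assumption: fixed seed so the run is repeatable

# Two vectors of equal length, drawn with sample()
age    <- sample(18:65, 5, replace = TRUE)
income <- sample(20000:80000, 5, replace = TRUE)

# Bind by column: a 5 x 2 matrix with columns named "age" and "income"
by_column <- cbind(age, income)
print(by_column)

# Bind by row: a 2 x 5 matrix with rows named "age" and "income"
by_row <- rbind(age, income)
print(by_row)

# deparse.level = 0 suppresses the labels built from the argument names
unlabeled <- cbind(age, income, deparse.level = 0)
print(unlabeled)
```

Note that cbind() and rbind() return a matrix when given plain vectors; if the columns need different types, data.frame(age, income) is the usual alternative.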

The following image shows what the combined vectors look like:

Hope you found this article useful!

Transforming a Vector in R

Suppose I have a vector of 0s and 1s, where 0 means a low credit rating and 1 means a high credit rating. I want to get a vector with the string representations of the data (i.e. low and high).

Here is one way to do it. The original vector was created using the sample function, as follows:

Then I will use the sapply function and the ifelse function to transform my data:
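A sketch of the whole transformation, assuming a vector of ten ratings (the length, the seed and the variable names are illustrative choices):

```r
set.seed(1)  # assumption: fixed seed so the run is repeatable

# Original data: 0 = low credit rating, 1 = high credit rating
rating <- sample(0:1, 10, replace = TRUE)

# sapply() applies the function to every element; ifelse() picks the label
rating_label <- sapply(rating, function(r) ifelse(r == 0, "low", "high"))

print(rating)
print(rating_label)

# ifelse() is already vectorised, so this shorter form gives the same result
vectorised <- ifelse(rating == 0, "low", "high")
```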

The result is as follows.

Hope you’ll find this example useful in your work!

Creating a list of random integers in R

To create a vector of random integers in R, you can use the sample function. For example, imagine you want to create a vector of random ages between 18 and 65. This would be done as follows:
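A minimal sketch, assuming 20 ages are wanted and that the same age may appear more than once (hence replace = TRUE):

```r
set.seed(123)  # assumption: fixed seed so the run is repeatable

# Draw 20 integers from 18 to 65, with replacement
ages <- sample(18:65, 20, replace = TRUE)
print(ages)
```

Without replace = TRUE, sample() draws each value at most once, so it cannot return more than the 48 distinct ages in the range.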

This would create a vector as shown in the following image.

Hope that you will find this sample useful!

Importing Stata data in R

To import Stata data in R, you must first install the foreign package:

Load the package using the library function:

To read the data, use the command:

Use the View command to view the data:
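The four steps above could be sketched as follows. Since no Stata file is named in this post, the sketch first writes a small, made-up data frame to a temporary .dta file with write.dta() so that it runs end to end; with real data you would pass your own file path to read.dta(). Note that read.dta() only reads files in the format of Stata versions 5 to 12.

```r
# Install the foreign package only if it is not already available
if (!requireNamespace("foreign", quietly = TRUE)) {
  install.packages("foreign")
}

# Load the package
library(foreign)

# Hypothetical demo data, written to a temporary Stata file
demo <- data.frame(id = 1:3, income = c(21000, 35500, 48250))
dta_path <- tempfile(fileext = ".dta")
write.dta(demo, dta_path)

# Read the Stata file into a data frame
stata_data <- read.dta(dta_path)
print(head(stata_data))

# View() opens a spreadsheet-style viewer, so only call it interactively
if (interactive()) View(stata_data)
```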

Hope you’ll find this information useful!

Simulating data using the Gamma distribution

To simulate fake data in R based on the gamma distribution, use the rgamma() function. Its signature is rgamma(n, shape, rate = 1, scale = 1/rate), where n is the number of observations (if n is a vector of length greater than 1, its length is used instead) and shape is a parameter that affects the shape of the distribution.
As an example, a call such as rgamma(1000, shape = 1), with the result stored in x2, will produce a vector of 1000 observations with a shape parameter (which must be positive) equal to 1.
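A minimal sketch of this example; the seed is an assumption added so that the output is repeatable, and the rate is left at its default of 1:

```r
set.seed(2024)  # assumption: fixed seed so the run is repeatable

# 1000 draws from a gamma distribution with shape = 1
x2 <- rgamma(1000, shape = 1)

# Inspect the first few entries and a numeric summary
print(head(x2))
print(summary(x2))
```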
The image below shows the first few entries of x2:
Plotting a histogram of x2 will produce the following:
Hope that you’ll find this example useful.

Plotting a histogram in R

To plot a histogram of your data, you can use the hist() function in R. The Redwoods article in the link has a very good basic introduction to histograms in R.

If you have simulated some data, as in the linked article, you can plot the data using the command:
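
As a sketch, assuming the data were simulated with rnorm() and stored in x1 (the seed, the title and the axis label are illustrative choices):

```r
set.seed(7)  # assumption: fixed seed so the run is repeatable

# Hypothetical simulated data
x1 <- rnorm(1000, mean = 5, sd = 7)

# Draw the histogram; hist() also returns the bin counts invisibly
h <- hist(x1, main = "Histogram of x1", xlab = "x1")
```

The breaks argument of hist() can be used to suggest a different number of bins, for example breaks = 30.
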
This command will produce a plot similar to the following:
Hope that you have found this post, and the link provided, useful!

Simulating data in R using a normal distribution

To simulate fake data in R based on the normal distribution, use the rnorm() function. Its signature is rnorm(n, mean = 0, sd = 1), where n is the number of observations (if n is a vector of length greater than 1, its length is used instead), mean is a vector of means and sd is a vector of standard deviations.
As an example, a call such as x1 <- rnorm(1000, mean = 5, sd = 7) will produce a vector of 1000 observations drawn from a normal distribution with a mean of 5 and a standard deviation of 7, and store them in x1.
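A minimal sketch of this example; the seed is an assumption added so that the output is repeatable:

```r
set.seed(99)  # assumption: fixed seed so the run is repeatable

# 1000 draws from a normal distribution with mean 5 and standard deviation 7
x1 <- rnorm(1000, mean = 5, sd = 7)

# Inspect the first few entries
print(head(x1))

# The sample mean and standard deviation should land close to 5 and 7
print(mean(x1))
print(sd(x1))
```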
The image below shows the first few entries of x1: