FERPA Disclaimer Statement
IMPORTANT NOTE: You are solely responsible for the maintenance of the confidentiality of this information as established in the Family Educational Rights and Privacy Act (FERPA) and Michigan State University Guidelines Governing Privacy and Release of Student Records.
Accessing MSUEDW Data using R
While the Oracle SQL Developer is a nice application for exploring
the available data, and dBeaver
Community is even better, why use one application to access and
download and another to analyze the data? The following R
code examples and snippets provide a collection of common MSUEDW data
pulls and merges done entirely in R
. We believe off-loading
filters and merges to a locol machine reduces the load on the SQL
server. NB: We commonly – and encourage – the use dBeaver of
explore the various Views/Tables within the EDW to
explore the contained data and determine the fields of interest.
Additionally, we use dBeaver to determine fields to restrict data
pulls.
The following documentation is segmented into sections based on specific actions.
Download and Install the Oracle Database JDBC driver and Companion Jars Downloads
Visit the Oracle Database JDBC driver and Companion Jars Downloads page and download the Zipped JDBC driver (ojdbc11.jar) and Companion Jars file.
The following instructions are for installation on macOS (version 11 or newer). Similar steps should work on Linux and Windows.
Command Line Install
Within your home
directory (aka folder) create a
software
directory. Move the
ojdbc11-full.tar.gz
file into the software
and
unzip the file. The easiest way to accomplish this is via the macOS
Terminal. When downloading the ojdbc11-full.tar.gz
file you
were likely given the option where to save it. Please save it to the
Downloads
directory.
emilio@iMac ~ % mkdir -p software
emilio@iMac ~ % cd software
emilio@iMac software % tar zxvf ~/Downloads/ojdbc11-full.tar.gz
emilio@iMac software % cd ojdbc11-full
emilio@iMac ojdbc11-full % ls
Javadoc-Readmes ojdbc.policy ojdbc11dms_g.jar osdt_cert.jar ucp11.jar
LICENSE.txt ojdbc11.jar ons.jar osdt_core.jar xdb.jar
README.txt ojdbc11_g.jar oraclepki.jar rsi.jar xmlparserv2.jar
dms.jar ojdbc11dms.jar orai18n.jar simplefan.jar xmlparserv2_sans_jaxp_services.jar
Install Java- and SQL-specific R
Packages
The following packages are needed to access the MSUEDW SQL server.
They only need to be installed once, but if you upgrade your
R
instance, you might need to reinstall these packages. The
DBI
,
rJava
,
and RJDBC
packages allow R
to connect to the Oracle database, while
keyring
safely stores your password. The tidyverse
is a
collection of R
packages designed to streamline and make
data analysis easier.
Open RStudio and enter the following command in the Console.
install.packages("DBI", "rJava", "RJDBC", "keyring", "tidyverse")
Save Your Password in Your Secure Keyring
The keyring::key_set()
function prompts the user to enter their password. The ability to have
the R
session interact with macOS
Keychain keyring, Windows
Credential Store keyring or Linux
Secret Service keyring to obtain your password from a secure
keyring. For additional information about keyrings on specific operating
systems, please see the following:
- Apple article on What is Keychain Access on Mac?
- Microsoft article on Accessing Credential Manager
- Linux manual page entry on keyrings
Open RStudio and enter the following command in the Console. The following function stores your password in secure keyring of your operating system. Securely storing you password allows you to automate data pulls and script actions without including a password in the script.
keyring::key_set(service="MSUEDW", username="USERNAME")
After submitting the command, a password submission box opens. Enter your MSUEDW password and the “OK” button.
The keyring::key_set()
function creates a MSUEDW entry
within the keyring, in this case for macOS.
Exploring the entry, you can see the service name
(MSUEDW
), account name (USERNAME
), and the
password (1235567890
). The Access Control enables the need
to notice require the keychain (or keyring) password to access the
password for any R
session.
You can also retrieve the password using the
keyring::key_get()
command in the RStudio
Console:
The R
Script
The following describes the components of the R
script
used to interact with and download data from the MSUEDW. The script is
segmented into components to facilitate easier explanations.
Set Java Parameters
The first line of the R
file instructs Java to clean up
after itself and use a maximum amount of user specified memory (aka
RAM).
- Concurrent Mark Sweep (CMS) Collector for “shorter garbage collection pauses.”
- Set the maximum memory heap size
Typically, we set the only the maximum memory heap size and set it at
a value of ~ 8 Gb; -Xmx8192m
or -Xmx8g
. If
your computer has more memory, feel free to increase the value. See the
-Xmx
documentation for additional information. NB: From the
-Xmx
documentation, “-Xmx
does not limit the
total amount of memory that the JVM can use.”
Load Required R
Packages
Next, we load the packages needed to query the SQL server and filter and merge the results.
Create Connection Driver
Create the JDBC connection drive using the classPath
determined above.
jdbcDriver <- RJDBC::JDBC(driverClass="oracle.jdbc.OracleDriver",
classPath="/Users/emilio/software/ojdbc11-full/ojdbc11.jar")
Create Connection to MSUEDW
Create connection details for MSUEDW (details are found in the
tnsnames.ora
file). Remember to replace
"USERNAME"
with your username in
both places. Also, the general connection
format is included along with MSUEDW specific function.
# Connection format is:
# DBI::dbConnect(jdbcDriver,
# "jdbc:oracle:thin:@//database.hostname.com:port/service_name_or_sid",
# "username", "password")
MSUEDW.jdbc <- DBI::dbConnect(jdbcDriver,
"jdbc:oracle:thin:@//ebsprd390d.ais.msu.edu:1521/MSUEDW.EBSP.MSU.EDU",
"USERNAME",
keyring::key_get("MSUEDW","USERNAME"))
Construct SQL Query
Construct the SQL query to be passed to the SQL server. The query can
be basic or complex. This example is very basic and selects the first 10
lines from the siscs.c_class_tbl_v
table (view). We gave
the variable a very general name (sql.query
), but using a
more specific name is advisable when making multiple data pulls. For
example, a possible name for this query is sql.ClassTable
where the sql
indicates the variable contains an SQL query
and ClassTable
indicate the query is to pull class table
data.
sql.query <- "select *
from siscs.c_class_tbl_v
where rownum <= 10"
Send Query and Fetch Results
These two command sends the SQL query to the indicated SQL server
(dbSendQuery()
) and then fetches (aka downloads) the data
(dbFetch()
). The query.data
variable contains
the data resulting from your SQL query. Potentially better variable
names include ClassTable.results
and
ClassTable.data
, respectively.
query.results <- dbSendQuery(MSUEDW.jdbc, sql.query)
query.data <- dbFetch(query.results)
Save the Data
Only save the data while excluding the query.results
. It
is prudent to retain the sql.query
because it is possible
that when reopening the data file, you do not have access to the
original R
script used to pull the data. This is common if
you pull the data for a colleague or vice versa. Additionally,
saving the query.results
will cause errors when reading in
the data file if the DBI
package is not loaded.
Close the JDBC Connection
Remember to clean up when done by closing the database connection.
DBI::dbDisconnect(MSUEDW.jdbc)