- February 22, 2020 at 7:40 pm, #86484, Participant @simileoluwa
According to a report from IBM, there were 2.35 million openings for Data Analytics jobs in the US in 2015, and the report estimated that the number would rise to 2.72 million by 2020. Data Science has been described as a career of the future, and many people are switching to it from other fields of study. Demand for professionals in this field keeps growing, and employers are constantly searching for individuals with the expertise to manage, analyze, and safely store ever-larger sets of data.
The rapid growth of Data Science has brought two major open-source tools to the fore, R and Python, and specialists are expected to be highly proficient in at least one of them. There is a lot of heated discussion over which tool is best or most preferred for Data Scientists. Some suggest Python is preferable as a general-purpose programming language, while others suggest Data Science is better served by a dedicated language and toolchain.
Both have their pros and cons, which include:
- Python is great for Machine Learning frameworks and has a large collection of libraries dedicated to Data Science. It is, however, not as strong as R in Statistical Analysis, which is essential to Data Science. R, on the other hand, is less preferred for Machine Learning but also has a large collection of libraries dedicated to Data Science.
- R has become the world’s largest repository of statistical knowledge with reference implementations for thousands, if not tens of thousands, of algorithms that have been vetted by experts.
- Python has a gentler learning curve than R, which can be harder for newcomers to pick up.
- Python is a general-purpose programming language that can do pretty much anything you need it to: data munging, data engineering, data wrangling, website scraping, web app building, and more. R, on the other hand, is less suited to certain tasks such as data engineering. While web apps can be built in R using Shiny, this is more limited than what Python, a multi-paradigm general-purpose language, offers.
- While Python has strong visualization capabilities through Matplotlib, R is generally considered more robust in this area, and its widely used ggplot2 package is often preferred.
We could go on comparing the two, but since both have their strengths and weaknesses, why not harness them together rather than pitting one against the other? The focus on “R or Python?” risks missing the advantages that having both can bring to individual Data Scientists and Data Science teams. As Hadley Wickham stated in an interview:
Generally, there are a lot of people who talk about R versus Python like it’s a war that either R or Python is going to win. I think that is not helpful because it is not a battle. These things exist independently and are both awesome in different ways.
Embedding Python Scripts in R Markdown
Recognizing how much can be gained by combining the two tools, professionals are developing ways to use them side by side so that each covers the other's weaknesses, and this is exactly what the reticulate package in R aims to achieve. In this tutorial, we will see how to embed Python scripts in an R Markdown file.
The reticulate package provides a comprehensive set of tools for interoperability between Python and R. The package includes facilities for calling Python from R in a variety of ways including R Markdown, sourcing Python scripts, importing Python modules, and using Python interactively within an R session.
To begin, open a new R Markdown file in RStudio on your local machine:
Having set up the Markdown file, we install the reticulate package, typically with install.packages("reticulate"), load it with library(reticulate), and then set up the Python engine as shown below:
Once this has run, you can write both Python and R scripts in the R Markdown file. You must, however, indicate at the beginning of each code chunk which language it contains, by writing {python} or {r} in the chunk header:
As the image above shows, the language is specified in the header of the code chunk. To write an R script in another chunk, all you need to do is change that header to specify R:
In the image above, the chunk header has been changed to specify R as the language.
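As a minimal sketch of what the body of a Python chunk might contain, the snippet below could be placed in a chunk whose header specifies python. The variable names and sample values are hypothetical and are not taken from the screenshots in this post.

```python
# Body of a hypothetical Python chunk in an R Markdown file.
# The numbers below are made up purely for illustration.
from statistics import mean, median

response_times_ms = [120, 95, 310, 87, 140]

# Collect a few summary statistics in a plain dictionary.
summary = {
    "count": len(response_times_ms),
    "mean": mean(response_times_ms),
    "median": median(response_times_ms),
}

print(summary)
```

Because the snippet only uses the Python standard library, it should behave the same whether it is knitted from R Markdown or run as an ordinary script.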
The last step to note is how to access objects created in a Python code chunk from an R code chunk. In the approach shown here, a variable assigned in a Python chunk under the r. prefix, written r.variable_name, is stored in the R session, so a later R chunk can use it directly by name. The prefix is not the only route: reticulate also exposes objects created in Python chunks to R through the py object, as py$variable_name. You can see the r. approach in the image below:
Likewise, to read any variable that lives in the R environment from a Python chunk, refer to it as r.variable_name.
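The following sketch shows the Python side of this exchange. Inside a knitted Python chunk, reticulate supplies the r object. The try/except stand-in is an assumption added here only so the snippet also runs as plain Python, and the names heights_cm and heights_m are hypothetical.

```python
from types import SimpleNamespace

try:
    r  # provided by reticulate inside an R Markdown Python chunk
except NameError:
    # Stand-in used only when running outside R Markdown.
    # It is NOT part of reticulate; it just mimics attribute access.
    r = SimpleNamespace(heights_cm=[150.0, 162.5, 171.0])

# Read an object that, by assumption, an earlier R chunk created as heights_cm.
heights_m = [h / 100 for h in r.heights_cm]

# Assign through r so that a later R chunk can refer to it as heights_m.
r.heights_m = heights_m

print(r.heights_m)
```

In a later R chunk, the plain Python variable heights_m would also be reachable as py$heights_m, which is the access path the reticulate documentation describes for reading Python objects from R.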
We have assessed the strengths of the two leading Data Science tools and seen how to harness the power of both in a single R Markdown file. Note that the reticulate package is still under active development, but it already handles almost everything you are likely to need.