Wednesday, June 23, 2021

Jupyter Notebook (Beginner's Guide)

Jupyter Notebook


Scientists have long kept lab notebooks to record their progress, results, and conclusions. Jupyter is a modern-day equivalent: it lets data scientists record the complete analysis process, much in the same way other scientists use a lab notebook.
 




Jupyter was originally developed as part of the IPython project, which provides an interactive environment for Python. IPython is still actively maintained and available for use. The name Jupyter itself is derived from the combination of Julia, Python, and R.

It is strongly recommended to install Python and Jupyter using the Anaconda distribution, which includes Python itself. This is especially true if you are not yet familiar with the other packages commonly used for scientific computing and data science.

You can also install Jupyter yourself with pip, but beginners are advised to use Anaconda Navigator, a desktop graphical user interface included in Anaconda. It lets you launch applications and easily manage conda packages and environments without having to type command-line commands.

Installation

The easiest way for a beginner to get started with Jupyter Notebooks is to install Anaconda.

Anaconda is the most widely used Python distribution for data science and comes pre-loaded with the most popular libraries and tools. Some of the biggest Python libraries included in Anaconda are NumPy, pandas, and Matplotlib, though the full list of 1000+ packages is extensive.

Anaconda thus lets you hit the ground running with a fully stocked machine learning and data science workshop without the hassle of managing countless installations or worrying about dependencies and OS-specific installation issues.

To get Anaconda, simply: 

  • Download the latest version of Anaconda.
  • Install Anaconda by following the instructions on the download page and/or in the executable.
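Once Anaconda is installed, one quick way to confirm that the libraries mentioned above are available is to ask Python whether it can find them. The sketch below uses only the standard library's importlib.util.find_spec. The package list (including "notebook", the package that provides the classic Jupyter Notebook) is just an example, so adjust it to whatever you expect to have, and run it with the Python that came with Anaconda.

```python
import importlib.util

# Packages to look for: the first three are the libraries named above,
# and "notebook" provides the classic Jupyter Notebook application.
PACKAGES = ["numpy", "pandas", "matplotlib", "notebook"]


def check_packages(names):
    """Map each package name to True if Python can find it, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(PACKAGES).items():
        print(f"{name:<12}{'installed' if found else 'missing'}")
```

If a package shows up as missing, you are probably running a different Python interpreter than the one Anaconda installed.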

 

Once you have installed Anaconda and opened Anaconda Navigator, it looks something like this.

[Screenshot: Anaconda Navigator]



Anaconda Navigator can launch Jupyter Notebook, Qt Console (an IPython console), the Spyder IDE (roughly Python's counterpart to RStudio), JupyterLab, Orange3, Glue, and VS Code.

A Jupyter notebook (an .ipynb file) is fundamentally a JSON file with a number of annotations. It has three main parts: the metadata, the notebook format version, and the list of cells.
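To make that structure concrete, here is a minimal sketch that writes and re-reads a tiny notebook using only Python's json module. It assumes version 4 of the notebook format (nbformat 4, minor version 5, which expects an id on every cell). The kernelspec values, the cell ids, and the file name example.ipynb are illustrative choices, not requirements.

```python
import json

# Minimal notebook following version 4 of the notebook format (assumed here).
notebook = {
    # 1. Metadata: information about the notebook as a whole, e.g. its kernel.
    "metadata": {
        "kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"}
    },
    # 2. Notebook format: the major and minor version of the file format.
    "nbformat": 4,
    "nbformat_minor": 5,
    # 3. The list of cells, in order.
    "cells": [
        {"cell_type": "markdown", "id": "intro", "metadata": {}, "source": "# My analysis"},
        {
            "cell_type": "code",
            "id": "calc",
            "metadata": {},
            "execution_count": None,  # not run yet
            "outputs": [],
            "source": "1 + 1",
        },
    ],
}

# Save it as an .ipynb file that Jupyter can open...
with open("example.ipynb", "w", encoding="utf-8") as f:
    json.dump(notebook, f, indent=1)

# ...and read it back as plain JSON.
with open("example.ipynb", encoding="utf-8") as f:
    loaded = json.load(f)

print(sorted(loaded))  # ['cells', 'metadata', 'nbformat', 'nbformat_minor']
print([cell["cell_type"] for cell in loaded["cells"]])  # ['markdown', 'code']
```

Because the file is plain JSON, you can open example.ipynb in a text editor as well as in Jupyter.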

The Jupyter user interface has a number of components. It is important to know which ones you will use on a daily basis and to get familiar with them.

Once Jupyter Notebook is launched, it looks like this.

[Screenshot: Jupyter Notebook dashboard]



Jupyter Notebook runs a Python instance that you use through your web browser. The Jupyter logo in the top left acts as a button that takes you back to your home page whenever it is clicked.

The dashboard displays three tabs: Files, Running, and Clusters.


[Screenshot: Jupyter dashboard tabs]




The Files tab shows the list of files in the current directory.

The Running tab shows the currently running processes and notebooks.

The Clusters tab displays the list of available clusters.

In the top right corner of the screen, there are three buttons: Upload, New, and Refresh.


 

The Upload button adds files to the notebook workspace. You can also simply drag and drop files, as you would in a file manager.

Similarly, you can drag and drop notebooks into specific folders.

The New menu offers Text File, Folder, Terminal, and Python 3 as options.

The Text File option adds a text file to the current directory.

Jupyter opens a new browser window with a text editor for the file. The text you enter is saved automatically, and the file appears in your notebook's file list. The Folder option creates a new folder named "Untitled Folder". All file and folder names are editable. The Terminal option opens a terminal session in the browser, from which you can also start an IPython session. Further notebook options appear in this menu when additional kernels are installed in your environment.

The Python 3 option creates a new notebook in which you can run Python interactively. The interface looks like the following screenshot.

[Screenshot: Python 3 notebook interface]



You have full file-editing capabilities for your scripts, including saving a copy as a new file.

In effect, you have a complete working IDE for your Python code.

The Refresh button updates the display. It is rarely necessary, because the display already reacts to changes in the underlying file structure.

There is also a check box, a drop-down menu, and a home button.



The check box toggles all the checkboxes in the item list, which is useful when you want to move or delete all of the selected files. You can also select all and then deselect individual files as you wish. The drop-down menu lets you select items by type: Folders, All Notebooks, Running, and Files.


Jupyter Notebook Cells


Cells in a Jupyter notebook are of four types: Code, Markdown, Raw NBConvert, and Heading.

 

Code Cells

The contents of a code cell are treated as statements in the programming language of the current kernel. The default kernel is Python, so we can write Python statements in a code cell. When such a cell is run, its result is displayed in an output area below it. The output may be text, images, Matplotlib plots, or HTML tables, so code cells support rich output.
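The rich output described above relies on Jupyter's display system: when the last expression in a code cell is an object with a _repr_html_ method, Jupyter shows the returned HTML instead of the plain-text repr. The class below is a hypothetical example (HtmlTable is not a library class) that renders a small two-column table; the sample rows are made up.

```python
import html


class HtmlTable:
    """A tiny object that Jupyter can render as an HTML table in an output cell."""

    def __init__(self, rows):
        self.rows = rows  # list of (name, value) pairs

    def _repr_html_(self):
        # Jupyter calls this method and displays the returned HTML.
        body = "".join(
            f"<tr><td>{html.escape(str(name))}</td><td>{html.escape(str(value))}</td></tr>"
            for name, value in self.rows
        )
        return f"<table><tr><th>Name</th><th>Value</th></tr>{body}</table>"

    def __repr__(self):
        # Plain-text fallback, used outside Jupyter (e.g. in a terminal).
        return "\n".join(f"{name}: {value}" for name, value in self.rows)


table = HtmlTable([("mean", 4.2), ("count", 10)])
table  # as the last line of a code cell, this displays an HTML table
```

In a Jupyter code cell, the final line displays a formatted table; in a plain Python script, only the text fallback from __repr__ would be used, for example when printing the object.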

 

Markdown Cells

These cells contain text formatted using the Markdown language. Many formatting features are available, such as bold and italic text, ordered and unordered lists, and tables. Markdown cells are especially useful for documenting the computational process of the notebook.
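Inside the .ipynb file, a Markdown cell is simply stored as source text. The sketch below builds one such cell as a Python dictionary so you can see the Markdown syntax for the features listed above: a heading, bold and italic text, an ordered and an unordered list, and a table. The helper name markdown_cell, the default cell id, and the sample text are hypothetical.

```python
def markdown_cell(source: str, cell_id: str = "notes") -> dict:
    """Build a Markdown cell the way it is stored in a version 4 .ipynb file."""
    return {"cell_type": "markdown", "id": cell_id, "metadata": {}, "source": source}


cell = markdown_cell(
    "## Summary\n"
    "This step is **important** but *slow*.\n"
    "\n"
    "1. Load the data\n"
    "2. Clean the data\n"
    "\n"
    "- NumPy\n"
    "- pandas\n"
    "\n"
    "| Library | Used for |\n"
    "|---------|----------|\n"
    "| pandas  | tables   |\n"
)

# Jupyter renders this source as formatted text; here we just print the raw Markdown.
print(cell["source"])
```

Typing the same text (without the Python quoting) into a Markdown cell and running it shows the formatted result.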

 

Raw nbconvert Cells

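The contents of raw cells are not evaluated by the notebook kernel. Instead, when the notebook is passed through nbconvert, their contents are included in the converted output as-is, so they can be rendered in the target format. For example, if you type LaTeX in a raw cell, it is rendered only after nbconvert has converted the notebook.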
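The sketch below is a deliberately simplified stand-in for running every cell in a notebook. It is not how Jupyter's kernel works internally; it only illustrates the rule described above, that only code cells are evaluated, while Markdown and raw cells (here, a line of LaTeX that would be a syntax error in Python) are skipped. The function name run_all is hypothetical.

```python
def run_all(cells):
    """Simplified 'run all cells': execute code cells only, in order.

    Returns the shared namespace and the list of sources that were executed.
    """
    namespace = {}
    executed = []
    for cell in cells:
        if cell["cell_type"] != "code":
            continue  # Markdown and raw cells are never evaluated
        exec(cell["source"], namespace)  # code cells share one namespace, like a kernel
        executed.append(cell["source"])
    return namespace, executed


cells = [
    {"cell_type": "raw", "source": r"\section{Introduction}"},  # LaTeX, not Python
    {"cell_type": "code", "source": "x = 21 * 2"},
    {"cell_type": "markdown", "source": "Some *notes*"},
]
namespace, executed = run_all(cells)
print(namespace["x"])  # 42
```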

  

Heading Cells


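Contents in these cells are used to mark the different levels of headings. Heading cells come from older versions of the notebook format; current versions of Jupyter write headings in Markdown cells instead, using one # character per heading level (for example, ## for a second-level heading).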
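If you open an older notebook that still contains heading cells, converting it is straightforward: a heading cell of level n becomes a Markdown cell whose text starts with n # characters. The function below is a hypothetical sketch of that conversion, assuming the old cell stores its level in a "level" field. In practice, the nbformat library can perform this upgrade for you when you read an older notebook as version 4.

```python
def heading_to_markdown(cell: dict) -> dict:
    """Convert an old-style heading cell into an equivalent Markdown cell."""
    if cell.get("cell_type") != "heading":
        return cell  # leave every other cell type untouched
    level = cell.get("level", 1)
    text = cell["source"]
    if isinstance(text, list):  # .ipynb files may store source as a list of lines
        text = "".join(text)
    return {
        "cell_type": "markdown",
        "metadata": cell.get("metadata", {}),
        "source": "#" * level + " " + text,
    }


old = {"cell_type": "heading", "level": 2, "metadata": {}, "source": "Results"}
print(heading_to_markdown(old)["source"])  # ## Results
```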


[Screenshot: Jupyter cell types]




Jupyter Notebook Security


Jupyter notebooks are created to be shared with other users, in many cases over the Internet. However, a notebook can contain arbitrary code and generate arbitrary output, which becomes a problem if malicious content has been placed in it. As a default security mechanism, untrusted raw HTML in a notebook is always sanitized to guard against malicious code.

In addition, untrusted JavaScript is never executed. Cell contents, especially HTML and JavaScript, are not trusted by default, and output from any cell that you did not run yourself is not trusted until you re-execute it. Clearing the untrusted output (or re-running the cells) and then saving causes the notebook to become trusted. Notebooks also use a security digest, a signature of their contents, to check whether the current user is the one who produced them.
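The security digest mentioned above can be illustrated with Python's standard hmac module. The sketch below is a simplified model, not Jupyter's actual code: it signs the notebook's JSON with a secret key, records the signature when the user saves, and treats the notebook as trusted only if its current signature matches a recorded one. The key handling, function names, and sample notebook are assumptions made for illustration. As I understand it, Jupyter's real implementation (the NotebookNotary class in nbformat) also uses an HMAC with a per-user secret and stores signatures in a local database.

```python
import hashlib
import hmac
import json
import secrets

# Hypothetical per-user secret key (Jupyter keeps its own key on disk).
SECRET_KEY = secrets.token_bytes(32)


def sign_notebook(notebook: dict, key: bytes = SECRET_KEY) -> str:
    """Return an HMAC-SHA256 digest of the notebook's JSON contents."""
    payload = json.dumps(notebook, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def is_trusted(notebook: dict, signatures: set, key: bytes = SECRET_KEY) -> bool:
    """A notebook is trusted only if its current digest was recorded earlier."""
    digest = sign_notebook(notebook, key)
    return any(hmac.compare_digest(digest, s) for s in signatures)


# The user runs the notebook and saves it, so its signature is recorded.
nb = {"cells": [{"cell_type": "code", "source": "1 + 1", "outputs": []}]}
signatures = {sign_notebook(nb)}
print(is_trusted(nb, signatures))  # True

# Someone else injects an output; the digest no longer matches.
nb["cells"][0]["outputs"] = [
    {"output_type": "display_data", "data": {"text/html": "<script>alert(1)</script>"}}
]
print(is_trusted(nb, signatures))  # False
```

Because the key is secret to the user, another person cannot forge a valid signature for a modified notebook.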