Threat Hunting with Jupyter Notebooks— Part 1: Your First Notebook 📓

Published in Posts By SpecterOps Team Members, May 30, 2019

When it comes to threat detection, how many times have you heard someone say “It is all in my head, just ask me if you have any questions!” or “Only he/she/they know(s) how to do it!” Plenty of times, right? Failing to document, standardize or share how to analyze data to detect potential intrusions in a network is more common than you think, especially when team members vary widely in technical background and expertise. It affects not only your detection strategies but also the dynamics of your team.

Now, how many times have you also thought about a more efficient, intuitive or creative way to analyze the security events your organization collects, but felt limited by the capabilities of a single, language-dependent search bar?

This post is part of a five-part series that introduces the concept of using Jupyter Notebooks for a more dynamic, flexible and language-agnostic way to analyze security events while helping your team document, standardize and share detection playbooks. You can integrate this approach with projects like the ThreatHunter-Playbook, and deploy it at home for free, with no time limit, using open source projects like HELK.

In this first post, I will go over the basics of how Jupyter Notebooks work, how to create your first notebook and how to run some initial basic commands in Python.

The other four parts can be found in the following links:

Threat Hunting with Jupyter Notebooks — Part 2: Basic Data Analysis with Pandas 📊
Threat Hunting with Jupyter Notebooks — Part 3: Querying Elasticsearch via Apache Spark ✨
Threat Hunting with Jupyter Notebooks — Part 4: SQL JOIN via Apache SparkSQL 🔗
Threat Hunting with Jupyter Notebooks — Part 5: Documenting, Sharing and Running Threat Hunter Playbooks! 🏹

What is a Notebook?

Think of a notebook as a document that you access via a web interface and that saves the input (i.e. live code) and output (i.e. code execution results) of interactive sessions, together with the notes needed to explain the methodology and steps taken to perform specific tasks (e.g., data analysis).

What is a Jupyter Notebook?

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.

The Jupyter Notebook project is the evolution of the IPython Notebook library, which was developed primarily to enhance the default Python interactive console by enabling scientific operations and advanced data analytics capabilities via sharable web documents.

Nowadays, the Jupyter Notebook project supports not only Python but also over 40 programming languages and environments such as R, Julia, Scala and PySpark. In fact, its name was originally derived from three programming languages: Julia, Python and R. That made it one of the first language-agnostic notebook applications, and it is now considered one of the most preferred environments for data scientists and engineers in the community to explore and analyze data.

How do Jupyter Notebooks work?

Jupyter Notebooks work with what is called a two-process model based on a kernel-client infrastructure. This model applies a similar concept to the Read-Evaluate-Print Loop (REPL) programming environment that takes a single user’s inputs, evaluates them, and returns the result to the user.

Based on the two-process model concept, we can explain the main components of Jupyter in the following way:

Jupyter Client

It allows a user to send code to the kernel, either from a Qt Console or from a browser via notebook documents.
From a REPL perspective, the client does the read and print operations.
Notebooks are hosted by a Jupyter web server which uses Tornado to serve HTTP requests.

Jupyter Kernel

It receives the code sent by the client, executes it, and returns the results to the client for display. A kernel process can have multiple clients communicating with it, which is why this model is also referred to as the decoupled two-process model.
From a REPL perspective, the kernel does the evaluate operation.
The kernel and clients communicate via an interactive computing protocol based on an asynchronous messaging library named ZeroMQ (low-level transport layer) and WebSockets (TCP-based).
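To make the kernel-client split a bit more concrete, here is a minimal sketch in plain Python. It is only an illustration under loud assumptions: `ToyKernel` and `ToyClient` are hypothetical names I made up, both live in the same process, and a simple method call stands in for the ZeroMQ and WebSocket messaging that the real Jupyter components use.

```python
import contextlib
import io


class ToyKernel:
    """Hypothetical stand-in for a kernel: evaluates code and keeps state."""

    def __init__(self):
        self.namespace = {}       # variables survive between executions
        self.execution_count = 0  # similar in spirit to the [1], [2] labels

    def execute(self, code):
        self.execution_count += 1
        captured = io.StringIO()
        # Capture anything the code prints so it can be sent back as a result.
        with contextlib.redirect_stdout(captured):
            exec(code, self.namespace)
        return {
            "execution_count": self.execution_count,
            "stdout": captured.getvalue(),
        }


class ToyClient:
    """Hypothetical stand-in for a client: reads input and prints results."""

    def __init__(self, kernel):
        self.kernel = kernel

    def run_cell(self, code):
        reply = self.kernel.execute(code)   # "send" the code to the kernel
        print(reply["stdout"], end="")      # the print step of the REPL
        return reply


# Two clients talking to the same kernel share its state.
kernel = ToyKernel()
first_client = ToyClient(kernel)
second_client = ToyClient(kernel)

first_reply = first_client.run_cell("x = 21")
second_reply = second_client.run_cell("print(x * 2)")
```

The second client can read `x` even though the first client defined it, because the state lives in the kernel and not in the client.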

Jupyter Notebook Document

Notebooks are automatically saved and stored on disk in the open source JavaScript Object Notation (JSON) format and with a .ipynb extension.
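Since a notebook document is just JSON, you can build or inspect one with the standard `json` module. The sketch below assumes the version 4 notebook layout (`nbformat`, `nbformat_minor`, `metadata` and a list of `cells`); the cell contents are only examples, and real notebooks usually carry more metadata than this.

```python
import json

# A minimal notebook-shaped dictionary (assumed nbformat 4 layout).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Threat Detection"],
        },
        {
            "cell_type": "code",
            "execution_count": None,  # nothing has been run yet
            "metadata": {},
            "outputs": [],
            "source": ["print(\"Hola World!!\")"],
        },
    ],
}

# Serialize to the text that would be stored in a .ipynb file, then read it back.
serialized = json.dumps(notebook, indent=1)
loaded = json.loads(serialized)

cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```

Reading an existing .ipynb file works the same way: load it with `json.load` and walk through the `cells` list.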

JupyterLab

It is the next-generation web-based user interface for Project Jupyter.
JupyterLab will eventually replace the classic Jupyter Notebook.
We will use the JupyterLab extension throughout the whole series.

Installing Jupyter

I am sure you are eager to install Jupyter and start exploring its capabilities, but first you have to decide whether you want to install the Jupyter Notebook server directly on your system or host it on a virtual machine or a Docker container.

I believe it is important to give you the options so that you can run the tool in whichever way you are most comfortable with. If you want to do a classic install directly on your system, follow the official Jupyter installation documentation.

Running Jupyter via Docker

For this series, we are going to use the HELK project. I prefer to share a standardized, working environment via Docker images so that we can focus on the capabilities of the application rather than spend time troubleshooting the server installation.

Requirements

HELK project

Initial Steps

Clone the latest HELK repository, change your current directory to HELK/docker and run the helk_install.sh script as sudo.

git clone https://github.com/Cyb3rWard0g/HELK.git
cd HELK/docker
sudo ./helk_install.sh

For this series, we are going to use option 3 with a basic license.

Once the script finishes (around 5–10 mins), the console will show a summary that includes the HELK JUPYTER SERVER URL and the JUPYTER CURRENT TOKEN.

Browse to the HELK JUPYTER SERVER URL.
You might get a Privacy Error. Just ignore it, click on Advanced and then Proceed.

COPY the JUPYTER CURRENT TOKEN value showing in your console and PASTE the TOKEN in the Password or token box, and click Log in.

You will get access to the main Jupyter Lab menu.

The file browser shows a folder named datasets, which contains a default Mordor dataset that we will use in the next posts. There are also a few notebooks that I made to help you get familiar with the concepts I will be sharing.

Exploring Jupyter’s Main Interface

Files Browser Section

This section shows the objects, such as folders or files, that are available for you to use. You can rename, move, download, or delete any folder or file by right-clicking on it.

You can also upload a file that exists locally in your system by clicking on the Upload icon as shown below.

In addition, you can also create new objects such as Notebooks, text files, and even run a bash terminal.

You can also do it from the launcher section as shown below.

As you can see in the image above, our Jupyter server has four kernels available: Python 3, PySpark, R, and Spylon.
By default, Jupyter comes with the Python 3 (IPython) kernel. The Jupyter team maintains the IPython kernel since the Jupyter notebook server depends on the IPython kernel functionality. Many other languages, in addition to Python, may be used in the notebook. This is what I meant by a language-agnostic approach. If you want to read more about the other kernels, you can do it here.

Running Terminals and Kernels Section

This section provides information about the Jupyter processes that are currently running. When you create a notebook or start a terminal session, you will be able to track those processes here.

Commands Section

This section provides information about several commands that are available to interact with files, new consoles, kernels, etc.

Your First Notebook!

Let’s create our first notebook and get familiarized with additional options.

Go back to the Launcher section, and click on the specific kernel you want to use to initialize a new notebook. Let’s pick Python 3.

You will get to the basic notebook web interface with an initial notebook named Untitled and one input (In) cell available where you can add code for the Python 3 kernel to execute.
You will also see a new file named Untitled.ipynb. Notebooks are saved automatically.

You can minimize the File Browser section by clicking on its icon.

You can rename your notebook by right-clicking on the title of the notebook and clicking on Rename Notebook.

If you check the Jupyter kernel sessions, you will see your notebook running, along with the option to shut down its kernel.

Exploring the Notebook interface

There are several options available in the notebook interface. Most of them are very straightforward and work in a similar way to what you get in a regular document toolbar, where you can save, open or close files. However, there are a few options and concepts that are very important to understand:

The Cell Environment

1️⃣ One of the main parts of the cell environment is the input cell container, where you can type code (e.g., Python) for the kernel to execute or text (e.g., Markdown) for the notebook to render.
2️⃣ There is an input label to the left of the input cell that keeps track of the order of code execution via a sequence of numbers starting from [1]. If nothing has been run yet, it shows empty brackets [ ]. While the kernel is running code, it displays an asterisk inside the brackets [*].
3️⃣ You can save the contents of a notebook, add new cells (➕), cut/delete a cell (✂️), copy a cell, and paste cells.
4️⃣ You can run code by selecting the input cell container and clicking on the run cell button. You can also run a cell with SHIFT + ENTER.
5️⃣ You can select the type of input you will be working with via the code drop-down button. There are currently three cell input types. The code input type allows the user to run code in the specific programming language defined in the kernel, in this case Python 3.

You can also access more options via the Run tab as shown below:

The Kernel Environment

You can access Kernel options via the Kernel tab.

I want to run some code!

Now that we understand how to interact with the Jupyter Notebook interface, let’s run some basic Python code in the input cell container.

Type a basic Python PRINT statement, and run the cell (SHIFT + ENTER).

print("Hola World!!")

Try to run a FOR loop over a sequence of 5 numbers generated by RANGE.

for x in range(5):
    print(x)

Switch the type of the new cell to Markdown so that you can enter some Markdown text.

COPY, PASTE, and RUN the following in the Markdown cell:

# Threat Detection 
## Data Analysis 
### Data Sources
#### Process Monitoring
PowerShell Execution

Select all the current cells with SHIFT + UP and delete them.

Make a variable available across multiple input cells by defining it in one cell and running that cell first, before you use the variable in another cell.

dog_name = 'Pedro'

print(dog_name + " is my best friend!")
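Why does the order matter? All the cells of a notebook run against the same kernel namespace, so a variable only exists once the cell that defines it has been executed. Here is a rough sketch of that idea in plain Python. The `namespace` dictionary and the `cell_a` / `cell_b` strings are just stand-ins I made up for the kernel state and the two cells above; this is not how Jupyter is implemented internally.

```python
# One shared dictionary plays the role of the kernel's namespace.
namespace = {}

cell_a = "dog_name = 'Pedro'"
cell_b = "message = dog_name + ' is my best friend!'"

# Running the second cell first fails: dog_name does not exist yet.
try:
    exec(cell_b, namespace)
    error_text = None
except NameError as error:
    error_text = str(error)

# Running the cells in order works, because cell_a stores dog_name
# in the shared namespace before cell_b reads it.
exec(cell_a, namespace)
exec(cell_b, namespace)
print(namespace["message"])
```

The same thing happens in a notebook if you restart the kernel and run only the second cell.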

One helpful feature that the standard Python shell does not offer out of the box is tab completion.

You can type the first letters of the variable dog_name and press the tab key on your keyboard. This will search the namespace for any variables matching the letters that you have typed so far.

dog_<tab>

You can also tab complete methods or attributes available in an object.
Let’s define a list with elements about my dog and save it in a variable

dog_list = ['pedro', 4, 2015]

You can then type dog_list with a period after it and press the tab key to see the methods you can use against that list. For example, you can append a new element to the list.
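If you want to see roughly the same list of methods without pressing the tab key, the built-in `dir` function is a reasonable approximation. This is only a sketch: the filter on the leading underscore is my own choice to hide the special methods, and the appended value is a made-up example.

```python
dog_list = ["pedro", 4, 2015]

# Public attributes of the list object, similar to what tab completion offers.
public_methods = [name for name in dir(dog_list) if not name.startswith("_")]
print(public_methods)

# Use one of them: append a new (made-up) element to the list.
dog_list.append("chihuahua")
print(dog_list)
```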

You can do the same for modules. Let’s import the module random and tab complete the methods or functions available with it.
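As with the list, you can approximate what tab completion shows for a module with `dir`. In this small sketch I also call `random.choice` to pick one element from a list; the two strings are just sample values taken from the Markdown cell we created earlier.

```python
import random

# Public names in the random module, similar to what random.<tab> offers.
public_names = [name for name in dir(random) if not name.startswith("_")]
print(public_names)

# Try one of the functions: pick a random element from a list.
data_sources = ["Process Monitoring", "PowerShell Execution"]
pick = random.choice(data_sources)
print(pick)
```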

The tab completion feature also works for file paths. We can test it against our folder datasets.

Another cool feature is introspection, which is used to get information about an object (e.g., lists, functions, etc.). You can simply type a question mark (?) before or after an object.

Let’s use it against our variable dog_list (list)

What about a function? Let’s test it against the print function. Providing one question mark (?) prints the docstring of the function.
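Keep in mind that the question mark syntax belongs to IPython, so it works in a notebook cell but not in a plain Python script. If you need something similar outside of a notebook, the standard `inspect` module gets you close. The following is a sketch of that alternative, not what IPython runs under the hood.

```python
import inspect

dog_list = ["pedro", 4, 2015]

# In a notebook cell you would type:  dog_list?   or   print?
# Outside of IPython, you can collect similar details by hand.
list_summary = (type(dog_list).__name__, len(dog_list))
print(list_summary)

# The docstring of the print function, which is what print? displays.
print_doc = inspect.getdoc(print)
print(print_doc)
```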

That was very easy, right? If this was your first time using Jupyter Notebooks, I hope this helped you get familiar with the basic concepts and sped up the deployment of your first Jupyter environment!

If you want to get the Jupyter token again, you can do it with the following:

sudo docker exec -ti helk-jupyter jupyter notebook list | grep "token" | sed 's/.*token=\([^ ]*\).*/\1/'
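If you would rather pull the token out with Python, a regular expression can do the same job as the grep and sed part of the pipeline. This is a sketch under assumptions: `sample_output` is a made-up example of what the listing might look like (the URL, token and path are not real), and `extract_tokens` is a hypothetical helper name.

```python
import re

# Made-up example of the text printed by the listing command.
sample_output = (
    "Currently running servers:\n"
    "http://0.0.0.0:8888/?token=0123456789abcdef :: /opt/helk/jupyter\n"
)


def extract_tokens(listing):
    """Return every value that follows 'token=' up to the next whitespace."""
    return re.findall(r"token=(\S+)", listing)


tokens = extract_tokens(sample_output)
print(tokens)
```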

In the next post, we will use a few of the available notebooks in the HELK Jupyter container to learn a little bit more about data analysis of security event logs with a Python library named Pandas.