Threat Hunting with Jupyter Notebooks — Part 1: Your First Notebook 📓
When it comes to threat detection, how many times have you heard someone say “It is all in my head, just ask me if you have any questions!” or “Only he/she/they know(s) how to do it!” Plenty of times, right? Not documenting, standardizing or sharing how to analyze data to detect potential intrusions in a network is more common than you think, especially when the team is very diverse from a technical and expertise perspective. It does not only affect your detection strategies but also the dynamics of your team.
Now, how many times have you also thought about a more efficient, intuitive or creative way to analyze the security events your organization collects, but felt limited by the capabilities of a single, language-dependent search bar?
This post is part of a five-part series which will introduce the concept of utilizing Jupyter Notebooks for a more dynamic, flexible and language-agnostic way to analyze security events, and at the same time help your team document, standardize and share detection playbooks. Something you can integrate with projects like the ThreatHunter-Playbook, and deploy it at home for free and for unlimited time with open source projects like HELK.
In this first post, I will go over the basics of how Jupyter Notebooks work, how to create your first notebook and how to run some initial basic commands in Python.
The other four parts can be found in the following links:
- Threat Hunting with Jupyter Notebooks — Part 2: Basic Data Analysis with Pandas 📊
- Threat Hunting with Jupyter Notebooks — Part 3: Querying Elasticsearch via Apache Spark ✨
- Threat Hunting with Jupyter Notebooks — Part 4: SQL JOIN via Apache SparkSQL 🔗
- Threat Hunting with Jupyter Notebooks — Part 5: Documenting, Sharing and Running Threat Hunter Playbooks! 🏹
What is a Notebook?
Think of a notebook as a document that you can access via a web interface that allows you to save input (i.e. live code) and output (i.e. code execution results / evaluated code output) of interactive sessions, as well as important notes needed to explain the methodology and steps taken to perform specific tasks (e.g. data analysis).
What is a Jupyter Notebook?
The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.
The Jupyter Notebook project is the evolution of the IPython Notebook library, which was developed primarily to enhance the default Python interactive console by enabling scientific operations and advanced data analytics capabilities via sharable web documents.
Nowadays, the Jupyter Notebook project supports not only Python but also over 40 programming languages, such as R, Julia and Scala, as well as Spark interfaces like PySpark. In fact, its name was originally derived from three programming languages: Julia, Python and R. That made it one of the first language-agnostic notebook applications, and it is now considered one of the most preferred environments for data scientists and engineers in the community to explore and analyze data.
How do Jupyter Notebooks work?
Jupyter Notebooks work with what is called a two-process model based on a kernel-client infrastructure. This model applies a similar concept to the Read-Evaluate-Print Loop (REPL) programming environment that takes a single user’s inputs, evaluates them, and returns the result to the user.
Based on the two-process model concept, we can explain the main components of Jupyter in the following way:
Jupyter Client
- It allows a user to send code to the kernel in the form of a Qt Console or a browser via notebook documents.
- From a REPL perspective, the client does the `read` and `print` operations.
- Notebooks are hosted by a Jupyter web server which uses Tornado to serve HTTP requests.
Jupyter Kernel
- It receives the code sent by the client, executes it, and returns the results back to the client for display. A kernel process can have multiple clients communicating with it, which is why this model is also referred to as the decoupled two-process model.
- From a REPL perspective, the kernel does the `evaluate` operation.
- Kernel and clients communicate via an interactive computing protocol based on an asynchronous messaging library named ZeroMQ (low-level transport layer) and WebSockets (TCP-based).
Jupyter Notebook Document
- Notebooks are automatically saved and stored on disk in the open source JavaScript Object Notation (JSON) format and with a `.ipynb` extension.
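Since a notebook is just JSON, you can build and inspect one with the standard library alone. The sketch below assumes the version 4 notebook layout (a top-level `cells` list plus `metadata`, `nbformat` and `nbformat_minor`); the file name `example.ipynb` and the cell contents are made up for illustration.

```python
import json
import os
import tempfile

# A minimal notebook in the nbformat 4 layout: one Markdown cell and one code cell.
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Threat Detection"]},
        {
            "cell_type": "code",
            "execution_count": None,  # nothing has been run yet
            "metadata": {},
            "outputs": [],
            "source": ["print('Hola World!!')"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# Write the notebook to a throwaway folder (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "example.ipynb")
with open(path, "w", encoding="utf-8") as handle:
    json.dump(notebook, handle, indent=1)

# Read it back as ordinary JSON and list the cell types it contains.
with open(path, encoding="utf-8") as handle:
    loaded = json.load(handle)

cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```

Because the file is plain JSON, notebooks can be diffed, version-controlled and parsed like any other text document.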
Jupyter Lab
- It is the next-generation web-based user interface for Project Jupyter.
- JupyterLab will eventually replace the classic Jupyter Notebook.
- We will use the Jupyter Lab extension throughout the whole series.
Installing Jupyter
I am sure you are eager to install Jupyter and start exploring its capabilities, but first you have to decide if you want to install the Jupyter Notebook server directly on your system or host it on a virtual machine or a Docker container.
I believe it is important to give you the options so that you feel comfortable running the tool however you feel like it. If you want to do a classic install directly on your system, follow the official Jupyter Install documents.
Running Jupyter via Docker
For this series, we are going to use the HELK project. I prefer to share a standardized and working environment via Docker images to focus more on the capabilities of the application rather than spend time troubleshooting the server installation.
Requirements
Initial Steps
- Clone the latest `HELK` repository, change your current directory to `HELK/docker` and run the `helk_install.sh` script as sudo.
git clone https://github.com/Cyb3rWard0g/HELK.git
cd HELK/docker
sudo ./helk_install.sh
- For this series we are going to use option `3` with a `basic` license.
- Once the script finishes (around 5–10 mins), you will be presented with a similar output:
- Browse to the `HELK JUPYTER SERVER URL`.
- You might get a `Privacy Error`. Just ignore it, click on `Advanced` and then `Proceed`.
- Copy the `JUPYTER CURRENT TOKEN` value shown in your console, paste the token in the `Password or token` box, and click `Log in`.
- You will get access to the main Jupyter Lab menu.
- As you can see in the image above, there is a folder available named `datasets`. It contains a default Mordor dataset that we will be using in the next posts. There are also a few notebooks available, which I made so that you can get familiar with the concepts I will be sharing.
Exploring Jupyter’s Main Interface
Files Browser Section
- This section shows the objects, such as folders or files, that are available for you to use. You can `rename`, `move`, `download`, or `delete` any folders or files by right-clicking on the objects.
- You can also upload a file that exists locally in your system by clicking on the `Upload` icon as shown below.
- In addition, you can create new objects such as `Notebooks` and text files, and even run a bash terminal.
- You can do it also from the launcher section as shown below
- As you can see in the image above, our Jupyter server has four kernels available: Python 3, PySpark, R, and Spylon.
- By default, Jupyter comes with the `Python 3 (IPython)` kernel. The Jupyter team maintains the IPython kernel since the Jupyter notebook server depends on the IPython kernel functionality. Many other languages, in addition to Python, may be used in the notebook. This is what I meant by a language-agnostic approach. If you want to read more about the other kernels, you can do it here.
Running Terminals and Kernels Section
- This section provides information about current running Jupyter processes. When you create a notebook or start a terminal session, you will be able to track those processes in here.
Commands Section
- This section provides information about several commands that are available to interact with files, new consoles, kernels, etc.
Your First Notebook!
Let’s create our first notebook and get familiarized with additional options.
- Go back to the Launcher section, and click on the specific kernel you want to use to initialize a new notebook. Let’s pick `Python 3`.
- You will get to the basic notebook web interface with an initial notebook named `Untitled` and one `input (In)` cell available where you can add code for the `Python 3` kernel to execute.
- You will also see a new file named `untitled` with the extension `ipynb`. Notebooks are saved automatically.
- You can minimize the File Browser section by clicking on its icon.
- You can rename your notebook by right-clicking on the title of the notebook and clicking on `Rename Notebook`.
- If you check the Jupyter kernel sessions, you will see your notebook running with the option for you to shut down the kernel.
Exploring the Notebook interface
There are several options available in the notebook interface, and most of them are very straightforward and work in a similar way to what you get in a regular document toolbar where you can save, open or close files. However, there are a few options and concepts that are very important to understand:
The Cell Environment
- 1️⃣ One of the main parts of the cell environment is the `input cell container`, where you can type code (e.g. Python) for the kernel to evaluate and execute, or text (e.g. Markdown) for the notebook to render.
- 2️⃣ There is an `input label` to the left of the `input cell` that keeps track of the execution of code via a sequence of numbers starting from `[1]`. If nothing has been run yet, it will show empty brackets `[ ]`. If the kernel is running code, it displays an `*` inside the brackets `[ ]`.
- 3️⃣ You can save the contents of a notebook, add new cells (➕), cut/delete a cell (✂️), copy a cell, and paste cells.
- 4️⃣ You can run code by selecting the `input cell container` and clicking on the `run cell button`. You can also run a cell with `SHIFT + ENTER`.
- 5️⃣ You can select the type of input you will be working with via the `code` drop-down button. There are currently three cell input types. The `code` input type allows the user to run code in the specific programming language defined in the kernel, in this case Python 3.
You can also access more options via the `Run` tab as shown below:
The Kernel Environment
- You can access kernel options via the `Kernel` tab.
I want to run some code!
Now that we understand how to interact with the Jupyter Notebooks interface, let’s run some basic Python code in the `input cell container`.
- Type a basic Python `print` statement, and run the cell (`SHIFT + ENTER`).
print("Hola World!!")
for x in range(5):
    print(x)
- Switch the new `cell type` to `Markdown` to enter some markdown.
- Copy, paste, and run the following in the `Markdown` cell:
# Threat Detection
## Data Analysis
### Data Sources
#### Process Monitoring
PowerShell Execution
- Select all the current cells with `SHIFT + UP` and delete them.
- Make a variable available across multiple input cells by creating it in one cell and running that cell before calling the variable from another cell.
dog_name = 'Pedro'
print(dog_name + " is my best friend!")
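Why does this work across cells? A rough way to picture it is that the kernel keeps a single namespace alive and runs every cell against it. The sketch below imitates that with `exec` and a shared dictionary; the names `kernel_namespace` and `cells` are made up for illustration, and the real IPython kernel does considerably more than this.

```python
# A dictionary standing in for the namespace the kernel keeps alive between cells.
kernel_namespace = {}

# Two "cells": the first defines a variable, the second uses it.
cells = [
    "dog_name = 'Pedro'",
    "message = dog_name + ' is my best friend!'",
]

# Running the cells in order against the same namespace lets the second see dog_name.
for source in cells:
    exec(source, kernel_namespace)

print(kernel_namespace["message"])
```

If you ran the second cell first, `dog_name` would not exist yet and Python would raise a `NameError`, which is exactly what happens in a notebook when cells are run out of order.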
One helpful feature over the standard python shell is tab completion.
- You can type the first letters of the variable `dog_name` and press the tab key on your keyboard. This will search the namespace for any variables matching the letters that you have typed so far.
dog_<tab>
- You can also tab complete methods or attributes available in an object.
- Let’s define a list with elements about my dog and save it in a variable.
dog_list = ['pedro', 4, 2015]
- You can then type `dog_list` with a period after it and press the tab key to see the methods you can use against that list. For example, you can append a new element to the list.
- You can do the same for modules. Let’s import the module random and tab complete the methods or functions available with it.
- The tab completion feature also works for file paths. We can test it against our folder `datasets`.
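Tab completion is an interactive feature, but you can approximate what it shows with plain Python. The sketch below is only an approximation of the three cases above (object methods, module names, file paths); the helpers `public_names` and `complete_path` are made up for illustration, and the `datasets` folder is recreated in a temporary directory so the code runs anywhere.

```python
import glob
import os
import random
import tempfile


def public_names(obj):
    """List the attributes of obj that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


dog_list = ["pedro", 4, 2015]

# Roughly what "dog_list.<tab>" offers: the public list methods.
list_methods = public_names(dog_list)
print(list_methods)

# Roughly what "random.<tab>" offers: the public names in the module.
random_names = public_names(random)
print(random_names)

# Path completion: build a throwaway folder that contains a "datasets" directory.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "datasets"))


def complete_path(prefix):
    """Return the paths that start with prefix, like pressing tab on a path."""
    return sorted(glob.glob(prefix + "*"))


path_matches = complete_path(os.path.join(root, "data"))
print(path_matches)
```

In the notebook itself you get the same information faster by pressing the tab key, but `dir()` is handy when you want the list inside your code.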
Another cool feature is introspection, which is used to get information about an object (e.g. lists, functions, etc.). You can simply type a question mark (?) before or after an object.
- Let’s use it against our variable `dog_list` (list).
- What about a function? Let’s test it against the `print` function. Providing one question mark (?) prints the `docstring` of the function.
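The question mark syntax only works inside IPython and Jupyter. If you need similar information in a regular Python script, the standard library gets you most of the way. This is a rough equivalent rather than a reproduction of what `?` prints; the `summary` dictionary is made up for illustration.

```python
import inspect

dog_list = ["pedro", 4, 2015]

# Roughly what "dog_list?" reports: the type, the length and the value.
summary = {
    "type": type(dog_list).__name__,
    "length": len(dog_list),
    "value": repr(dog_list),
}
print(summary)

# Roughly what "print?" reports: the docstring of the function.
doc = inspect.getdoc(print)
print(doc)
```

The built-in `help()` function is another option when you want a longer, paged description of an object.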
That was very easy, right? If this was your first time using Jupyter Notebooks, I hope this helped you get familiar with the basic concepts and speed up the deployment of your first Jupyter environment!
If you want to get the Jupyter token again, you can do it with the following:
sudo docker exec -ti helk-jupyter jupyter notebook list | grep "token" | sed 's/.*token=\([^ ]*\).*/\1/'
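If you would rather pull the token out in Python, the same pattern the `sed` expression uses can be applied with the `re` module. The sample line below is a made-up example of what one line of the server list might look like (the address, token and path are placeholders), so adjust it to whatever your console actually prints.

```python
import re

# Hypothetical sample of one line from the server list; the token and path are placeholders.
sample_line = "http://0.0.0.0:8888/?token=0123456789abcdef :: /opt/helk/jupyter"

# Same idea as the sed expression: capture everything after "token=" up to the next space.
match = re.search(r"token=([^ ]*)", sample_line)
token = match.group(1) if match else None
print(token)
```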
In the next post, we will use a few of the available notebooks in the HELK Jupyter container to learn a little bit more about data analysis of security event logs with a Python library named Pandas.
References
https://jupyter4edu.github.io/jupyter-edu-book/
https://jupyter.readthedocs.io/en/latest/architecture/how_jupyter_ipython_work.html
https://ipython-books.github.io/chapter-3-mastering-the-jupyter-notebook/
https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop
https://jupyterlab.readthedocs.io/en/stable/getting_started/overview.html#overview