Automating Custom Log Ingestion into Microsoft Sentinel with Azure DevOps (Part 1)
Recently, one of my clients had an incident and I was asked to help the operations team analyse a large data set: gigabytes of raw log files downloaded off the file system of the assets under investigation. Since this log source was not onboarded to Sentinel, we were unable to readily parse it. If you’ve worked with various SIEM platforms, you might be familiar with simple upload buttons for ad-hoc data ingestion. Microsoft Sentinel takes a different approach: it has been designed and built around APIs and automation, which actually opens up more flexibility once you understand the capabilities. I realized this was an opportunity to leverage Sentinel’s Logs Ingestion API, so I built a solution that’s now reusable across my team. In this write-up, I’ll walk through how to ingest custom logs into Sentinel using the Logs Ingestion API, and how to automate the process with an Azure DevOps pipeline.
Use Case
During an incident, I was handed several large Apache log files. These logs weren’t coming from a live source or connector; they were historical, and needed quick parsing and analysis. Sentinel does not offer a direct upload method using clickops, so the best available option was to use the Logs Ingestion API and post the data to the Sentinel Log Analytics Workspace (LAW). Since this process involves some degree of engineering and isn’t feasible for every analyst who might have to upload similar files in future, I decided to build a pipeline that could handle such scenarios going forward, allowing any member of the team to upload data without having to build the tooling or applications themselves.
In this post, I will go over the high-level steps.
Step 1: Create Data Collection Endpoint (DCE)
Before we create a table to store our data, we will create a Data Collection Endpoint (DCE).
To do this:
- Navigate to Azure portal
- In the search bar, type ‘dce’ and click ‘Data collection endpoints’
![Create DCE Azure Portal]()
- Fill in the details (name, resource group, region) and click Create
Once created, go to Overview, click JSON View, and locate the value of the logsIngestion endpoint; take a note of it, as it will be required to make the API call.
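If you prefer to retrieve this value programmatically rather than from the JSON view, here is a minimal sketch using the Azure SDK for Python (this assumes the azure-identity and azure-mgmt-monitor packages are installed; the subscription, resource group and DCE names below are placeholders):

```python
# Sketch: read the DCE's logs ingestion endpoint with the Azure SDK for Python.
# Assumes azure-identity and azure-mgmt-monitor are installed and the signed-in
# identity can read the DCE. Names below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient

subscription_id = "<subscription-id>"   # placeholder
resource_group = "<resource-group>"     # placeholder
dce_name = "<dce-name>"                 # placeholder

client = MonitorManagementClient(DefaultAzureCredential(), subscription_id)
dce = client.data_collection_endpoints.get(resource_group, dce_name)

# The same value shown under "logsIngestion" > "endpoint" in the portal's JSON view
print(dce.logs_ingestion.endpoint)
```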

Before we proceed further, there are two important things to consider, namely the table schema and the log format for upload.
Table Schema
A table’s schema is the set of columns that make up the table, into which Azure Monitor Logs collects log data from one or more data sources.
Any logs that you put in the LAW must have a field called ‘TimeGenerated’; this is an absolute must. Apache access logs contain no field-value pairs, hence there is no field called ‘TimeGenerated’. To get around this, I used a simple regex in Python to extract the timestamp and assign it to a field named ‘TimeGenerated’. I also used the same script to create the following fields:
- TimeGenerated
- ip
- method
- status
- referrer
- user-agent
- protocol
- other
Log format before upload
Before logs can be uploaded to Sentinel, they must be converted into JSON format. Using the same script (referenced below) and the ‘json’ module in Python, we can convert the access logs to JSON; a sample converted record is shown below.
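For illustration only, a single record in the converted output array might look like this (the values are made up; note that alongside the fields listed above, the script also keeps the raw timestamp and captures url and size):

```json
[
    {
        "ip": "203.0.113.10",
        "timestamp": "10/Oct/2024:13:55:36 +0000",
        "method": "GET",
        "url": "/index.html",
        "protocol": "HTTP/1.1",
        "status": "200",
        "size": "2326",
        "referrer": "https://example.com/",
        "user_agent": "Mozilla/5.0",
        "other": "-",
        "TimeGenerated": "2024-10-10T13:55:36+00:00"
    }
]
```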
Step 2: Create Custom Table and DCR
A table provides a location to store the data, while the DCR (Data Collection Rule) defines how the data is collected and transformed.
To create a new table and DCR:
- Navigate to Azure portal
- Bring up the Sentinel Log Analytics Workspace and click Tables in the left pane.
- Click Create and choose “New custom log (Direct ingest)”
- Fill in the name field, leave ‘Analytics’ selected.
- Click create new Data Collection Rule next to the ‘Data Collection Rule’ dropdown, fill out the details in the flyout pane, and click Done.
- Choose the previously created DCE in the next dropdown, ‘Data Collection Endpoint’.
- Next, you will be presented with a screen to upload sample data in JSON format (use the provided script to convert raw Apache logs into JSON). Browse to a JSON file or drag and drop the file into the browser.
- Click Next and click Create.
Below is the Python script that can be used to convert Apache logs to JSON format. (This version references a location in the Azure DevOps build directory; change the paths as needed to run it directly on your device’s file system.)
```python
import os
import re
import json
from datetime import datetime, timezone

# Input log file
input_file = "apache.log"

# Output path using Azure Pipelines env var
artifact_dir = os.environ.get("BUILD_ARTIFACTSTAGINGDIRECTORY", ".")
output_file = os.path.join(artifact_dir, "parsed_apache_logs.json")

# Ensure output directory exists
os.makedirs(artifact_dir, exist_ok=True)

# Apache log regex
log_pattern = re.compile(
    r'(?P<ip>\S+) - - \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^\"]+)" '
    r'(?P<status>\d+) (?P<size>\d+) '
    r'"(?P<referrer>[^\"]*)" "(?P<user_agent>[^\"]*)" "(?P<other>[^\"]*)"'
)

# Parse logs
parsed_logs = []
with open(input_file, "r") as f:
    for line in f:
        match = log_pattern.match(line.strip())
        if match:
            log_entry = match.groupdict()
            # Convert timestamp to ISO UTC for ingestion
            try:
                dt = datetime.strptime(log_entry["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
                log_entry["TimeGenerated"] = dt.astimezone(timezone.utc).isoformat()
            except ValueError:
                log_entry["TimeGenerated"] = None  # Fallback if parsing fails
            parsed_logs.append(log_entry)

# Save JSON
with open(output_file, "w") as f:
    json.dump(parsed_logs, f, indent=4)

print(f"Parsed {len(parsed_logs)} log entries into {output_file}")
```
In upcoming posts, we will go over the steps required to make the API calls that send these JSON-formatted logs to the LAW using the Logs Ingestion API, and then how to automate the entire process using Azure DevOps Pipelines.
(To be continued)


