This guide provides a comprehensive reference for developers creating filters to extract, enrich, and transform raw log data in UTMStack v11. Filters are YAML files used by the parsing plugin to convert raw events into a standardized format.
Developer Reference: This page is designed as a practical guide for implementing data transformation pipelines through filters.

What are Filters?

Filters define how to extract and transform data from raw events into a standardized format that can be:
  • Analyzed by correlation rules
  • Searched in Log Explorer
  • Visualized in dashboards
  • Stored efficiently

Purpose

  • Parse raw log formats (JSON, CSV, key-value, free text)
  • Extract relevant fields from unstructured data
  • Normalize field names across data sources
  • Enrich data with additional context
  • Transform data types for proper analysis

Filter Structure

pipeline:
  - dataTypes:              # Event types this filter applies to
      - apache
    steps:                  # Processing steps
      - json:               # Step 1: Parse JSON
          source: raw
      - rename:             # Step 2: Rename fields
          from:
            - log.host.ip
          to: origin.ip
      # Additional steps...
See the complete documentation for all available filter steps and detailed examples: View Full Filter Implementation Guide →

Filter Steps Reference

Parsing Steps

Step | Purpose                | Use Case
---- | ---------------------- | ----------------------------------------
json | Parse JSON data        | Structured logs from applications
grok | Pattern-based parsing  | Unstructured text logs (Apache, Syslog)
kv   | Key-value pair parsing | Simple formatted logs
csv  | CSV data parsing       | Comma-separated log formats
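
The json and grok steps appear throughout the examples below; kv and csv follow the same shape. A minimal, hypothetical kv sketch (only the source parameter is inferred from the other parsing steps on this page; real kv options such as separators are covered in the full implementation guide):

steps:
  # e.g. parses a line like: user=admin action=login result=ok
  - kv:
      source: log.message   # assumed: kv reads from a source field, as json/grok do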

Transformation Steps

Step     | Purpose            | Use Case
-------- | ------------------ | ----------------------------------------
rename   | Rename fields      | Standardize field names
cast     | Convert data types | Ensure proper types for analysis
reformat | Reformat values    | Timestamp conversion, string formatting
trim     | Remove characters  | Clean up parsed data

Enrichment Steps

Step    | Purpose               | Use Case
------- | --------------------- | ---------------------------------
add     | Add new fields        | Add metadata, computed values
dynamic | Call external plugins | Geolocation, threat intelligence
expand  | Expand nested data    | Flatten complex structures

Cleanup Steps

Step   | Purpose       | Use Case
------ | ------------- | ------------------------
delete | Remove fields | Remove unnecessary data

Quick Start Example

Here’s a complete filter for Apache access logs:
pipeline:
  - dataTypes:
      - apache
    steps:
      # 1. Parse JSON container
      - json:
          source: raw

      # 2. Extract IP using grok
      - grok:
          patterns:
            - fieldName: origin.ip
              pattern: '{{.ipv4}}|{{.ipv6}}'
            - fieldName: deviceTime
              pattern: '\[{{.data}}\]'
            - fieldName: log.statusCode
              pattern: '{{.integer}}'
          source: log.message

      # 3. Convert to proper types
      - cast:
          fields:
            - log.statusCode
          to: int

      # 4. Add geolocation
      - dynamic:
          plugin: com.utmstack.geolocation
          params:
            source: origin.ip
            destination: origin.geolocation
          where: exists(origin.ip)

      # 5. Normalize action
      - add:
          function: 'string'
          params:
            key: action
            value: 'get'
          where: safe(log.method, "") == "GET"

      # 6. Clean up
      - delete:
          fields:
            - raw
            - log.message
          where: exists(action)
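
To trace the pipeline, assume the JSON step exposes a hypothetical log.message such as:

192.168.1.10 - - [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326

The grok step then extracts origin.ip, deviceTime, and log.statusCode; cast turns the status code into an integer; the geolocation plugin runs only when origin.ip exists; action is set to 'get' when the parsed log.method equals "GET"; and the final delete drops raw and log.message once action confirms the event was processed.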

Development Workflow

1. Identify Data Source: determine which log source you need to process.
2. Analyze Raw Format: examine sample raw events to understand their structure.
3. Create Filter File: start with basic parsing steps.
4. Add Transformations: normalize field names and data types.
5. Enrich Data: add geolocation, classifications, and other context.
6. Test Filter: deploy the filter and test it with sample data.
7. Optimize: remove unnecessary fields and improve performance.

Best Practices

Standardize Field Names
  • Use consistent naming across all filters
  • Follow UTMStack field mapping conventions
  • Common fields: origin.ip, target.ip, deviceTime, action, actionResult (see the rename sketch below)
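
For example, a vendor-specific field (here the hypothetical log.src_ip) can be mapped onto the common schema with a rename step:

steps:
  - rename:
      from:
        - log.src_ip        # hypothetical vendor field
      to: origin.ip         # UTMStack common field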
Remove Unnecessary Data
  • Delete fields not needed for analysis
  • Reduces storage requirements
  • Improves query performance
Handle Missing Data
  • Use conditional steps with where clauses
  • Test with incomplete/malformed data
  • Provide sensible defaults (see the sketch below)
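
A minimal sketch of a sensible default, using only constructs shown on this page: populate actionResult with an assumed 'unknown' label when no earlier step set it.

steps:
  - add:
      function: 'string'
      params:
        key: actionResult
        value: 'unknown'                    # assumed default label
      where: safe(actionResult, "") == ""   # only when the field is missing or empty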
Optimize Performance
  • Apply heavy operations conditionally
  • Use efficient parsing methods
  • Delete unnecessary fields early in the pipeline (see the sketch below)
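
For instance, the raw payload can be dropped as soon as it has been parsed, rather than at the end of the pipeline (assuming, as in the quick start, that the JSON step exposes log.message):

steps:
  - json:
      source: raw
  - delete:
      fields:
        - raw                     # drop the largest field as early as possible
      where: exists(log.message)  # keep raw if parsing produced nothing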
Document Filters
  • Comment complex patterns
  • Explain transformation logic
  • Note data source requirements

Common Patterns

Pattern 1: Web Server Logs

steps:
  - grok:
      patterns:
        - fieldName: origin.ip
          pattern: '{{.ipv4}}'
        - fieldName: log.method
          pattern: '{{.word}}'
        - fieldName: origin.path
          pattern: '{{.data}}'
        - fieldName: log.statusCode
          pattern: '{{.integer}}'
      source: log.message
  - cast:
      fields: [log.statusCode]
      to: int
  - add:
      function: 'string'
      params:
        key: actionResult
        value: 'success'
      where: safe(log.statusCode, 0) >= 200 && safe(log.statusCode, 0) < 300
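
The same technique covers the failure branch; 'denied' is an assumed label here:

  - add:
      function: 'string'
      params:
        key: actionResult
        value: 'denied'
      where: safe(log.statusCode, 0) >= 400 && safe(log.statusCode, 0) < 500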

Pattern 2: Syslog Parsing

steps:
  - grok:
      patterns:
        - fieldName: deviceTime
          pattern: '{{.monthName}}\s+{{.monthDay}}\s+{{.time}}'
        - fieldName: origin.host
          pattern: '{{.word}}'
        - fieldName: log.program
          pattern: '{{.word}}'
        - fieldName: log.message
          pattern: '{{.greedy}}'
      source: raw
  - reformat:
      fields: [deviceTime]
      function: time
      fromFormat: 'Jan 02 15:04:05'
      toFormat: '2006-01-02T15:04:05Z'
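
The fromFormat and toFormat values appear to follow Go's reference-time convention, in which the fixed timestamp Mon Jan 2 15:04:05 MST 2006 stands in for each date component; '2006-01-02T15:04:05Z' therefore describes an ISO 8601 output layout rather than a literal date.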

Pattern 3: JSON with Nested Data

steps:
  - json:
      source: raw
  - expand:
      source: log.metadata
      to: log.expandedMetadata
      where: exists(log.metadata)
  - rename:
      from: [log.expandedMetadata.userId]
      to: origin.user
  - delete:
      fields: [log.metadata]
      where: exists(log.expandedMetadata)

Troubleshooting

Filter Not Processing

Check: the event's dataType matches an entry in the filter's dataTypes list
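
The dataTypes list at the top of the pipeline must contain the incoming event's type verbatim:

pipeline:
  - dataTypes:
      - apache   # must equal the event's dataType exactly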

Fields Not Extracted

Check: field names in the grok patterns are spelled exactly as intended and the patterns actually match the source text

Type Conversion Errors

Check: the field exists before casting and the target type fits its values

Performance Issues

Check: unnecessary fields are removed early, heavy steps are made conditional, and grok patterns are efficient