
LogRhythm MPE Rule Builder Parsing Guide
April 26, 2017 — Revision A

LogRhythm-MPE-RuleBuilderGuide-revA

© LogRhythm, Inc. All rights reserved


© LogRhythm, Inc. All rights reserved
This document contains proprietary and confidential information of LogRhythm, Inc., which is protected by
copyright and possible non-disclosure agreements. The Software described in this Guide is furnished under the
End User License Agreement or the applicable Terms and Conditions (“Agreement”) which governs the use of the
Software. This Software may be used or copied only in accordance with the Agreement. No part of this Guide may
be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and
recording for any purpose other than what is permitted in the Agreement.
Disclaimer
The information contained in this document is subject to change without notice. LogRhythm, Inc. makes no
warranty of any kind with respect to this information. LogRhythm, Inc. specifically disclaims the implied warranty
of merchantability and fitness for a particular purpose. LogRhythm, Inc. shall not be liable for any direct, indirect,
incidental, consequential, or other damages alleged in connection with the furnishing or use of this information.
Trademark
LogRhythm is a registered trademark of LogRhythm, Inc. All other company or product names mentioned may be
trademarks, registered trademarks, or service marks of their respective holders.

LogRhythm Inc.
4780 Pearl East Circle
Boulder, CO 80301
(303) 413-8745
www.logrhythm.com
LogRhythm Customer Support
support@logrhythm.com



Contents
Parsing Fields and Tags
  The Application Tab
  Kbytes/Packets Tab
  Classification Tab
  Host Tab
  Identity Tab
  Location Tab
  Log Tab
  Network Tab
  Special Sub-Rule Tags (Tag1-Tag5)
Best Practices and Working with Rules
  Override Default Regex
  Rule Names
  Common Event Names
Regular Expression Characters and Practices
  Match Characters
  Repetition Characters
  Positional Characters
  Grouping
  The Non-Greedy Qualifier (?)
  Reserved Characters
  Other Special Characters
  Regex Recommended Practices



Parsing Fields and Tags
Using the Rule Builder, you can create custom parsing rules for your own log sources. The following tables provide
lists of all the metadata fields LogRhythm can parse, as well as their associated parsing tag(s) and default regex.
The fields are grouped by how they appear in the Web Console. If you do not see a field in the Web Console in
the same tab as this document, you may have tagged the field as a favorite, in which case the field will appear in
the Favorites tab instead of the main group tab as shown in this document.

NOTE: All mapping and parsing tags are lower case.

Fields denoted with † are available for parsing and investigations, and they are viewable in the Web
Console. These fields will be available in all product features in the LogRhythm 7.3 release.

The Application Tab


Display Field | Description | Tag(s) | Default Regex
Application | Application derived by IANA protocol and port number or directly assigned in MPE processing settings. | N/A | N/A
Object | The resource (i.e., file) referenced or impacted by activity reported in the log. | <object> | \w+
Object Name | The descriptive name of the object. Do not use unless Object is also used. | <objectname> | \w+
Object Type † | A category type for the object (e.g., file, image, pdf, etc.). | <objecttype> | \w+
Hash † | The hash value reported in the log. Choose MD5 > Sha1 > Sha256. | <hash> | \w+
Policy † | The specific policy referenced (i.e., Firewall, Proxy) in a log message. | <policy> | \w+
Result † | The outcome of a command operation or action. For example, the result of quarantine might be success. | <result> | \w+
URL | The URL referenced or impacted by activity reported in the log. You may need to override the default regex for URLs that are not HTTP/HTTPS. | <url> | https?://.+
User Agent † | The User Agent string from web server logs. | <useragent> | \w+


Response Code † | The explicit and well-defined response code for an action or command captured in a log. Response Code differs from Result in that response code should be well-structured and easily identifiable as a code. | <responsecode> | \w+
Subject | The subject of an email or the general category of the log. | <subject> | \w+
Version | The software or hardware device version described in either the process or object. | <version> | \w+
Command | The specific command executed that has been recorded in the log message. | <command> | \w+
Reason † | The justification for an action or result when not an explicit policy. | <reason> | \w+
Action † | Field for "what was done" as described in the log. Action is usually a secondary function of a command or process. | <action> | \w+
Status † | The vendor's perspective on the state of a system, process, or entity. Status should NOT be used as the result of an action. | <status> | \w+
Session Type † | The type of session described in the log (e.g., console, CLI, web). Unique from IANA Protocol. | <sessiontype> | \w+
Process Name | System or application process described by the log message. | <process> | \w+
Process ID | Numeric ID value for a process. | <processid> | \d+
Parent Process ID † | The parent process ID of a system or application process that is of interest. | <parentprocessid> | \w+
Parent Process Name † | The parent process name of a system or application process. | <parentprocessname> | \w+
Parent Process Path † | The full path of a parent process of a system or application process. | <parentprocesspath> | \w+
Quantity | A numeric count of something. For example, there are 4 lights (quantity is 4). | <quantity> | [0123456789\.]+

Amount | The qualitative description of quantity (percentage or relative numbers). For example, half the lights are on (amount is .5 or 50). Amount is also used for currency. | <amount> | [0123456789\.]+
Size | Numeric description of capacity (e.g., disk size) without a specific unit of measurement. Size is generally used as a limit rather than a current measurement. Use Amount for non-specific measurements. | <size> | [0123456789\.]+
Rate | Defines a number of something per unit of time without a specific unit of measurement. Always expressed as a fraction. | <rate> | [0123456789\.]+
Duration | The elapsed time reported in a log message, derived from multiple fields. Timestart and Timeend need custom parsing patterns. | If log has start/end use: (?<timestart>pattern), (?<timeend>pattern). If log has elapsed time use: <days>, <hours>, <minutes>, <seconds>, <milliseconds>, <microseconds>, <nanoseconds> | [0123456789\.]+ (Note: Time Start and Time End tags must be overloaded to function properly.)
Session | Unique user or system session identifier. | <session> | \w+
Known Application | Application derived from IANA protocol and port number. If a known application cannot be derived, it is displayed as unknown. | N/A | N/A



Kbytes/Packets Tab
Display Field | Description | Tag(s) | Default Regex
Host (Impacted) KBytes Rcvd, Host (Impacted) KBytes Sent, Host (Impacted) Kbytes Total | The number of bytes sent or received in the context of the Impacted Host. Rcvd: bytes received by impacted host. Sent: bytes sent by impacted host. Total: total bytes in session as seen by impacted host. | Use the appropriate tags based upon the units and direction represented by the log data: <bitsin>, <bitsout>, <bytesin>, <bytesout>, <kilobitsin>, <kilobitsout>, <kilobytesin>, <kilobytesout>, <megabitsin>, <megabitsout>, <megabytein>, <megabyteout>, <gigabitsin>, <gigabitsout>, <gigabytein>, <gigabyteout>, <terabitsin>, <terabitsout>, <terabytesin>, <terabytesout>, <petabitsin>, <petabitsout>, <petabytesin>, <petabytesout>, <bits>, <bytes>, <kilobits>, <kilobytes>, <megabits>, <megabytes>, <gigabits>, <gigabytes>, <terabits>, <terabytes>, <petabits>, <petabytes> | [0123456789\.]+
Host (Impacted) Packets Rcvd, Host (Impacted) Packets Sent, Host (Impacted) Packets Total | The number of packets sent or received in the context of the Impacted Host. Rcvd: packets received by impacted host. Sent: packets sent by impacted host. Total: total packets in session as seen by impacted host. | <packetsin>, <packetsout>, <packets> | [0123456789\.]+



Classification Tab
Display Field | Description | Tag(s) | Default Regex
Classification | Value is determined based on the MPE Rule's assigned Common Event. | N/A | N/A
Common Event | Value is determined based on the MPE Rule's assigned Common Event. | N/A | N/A
Priority | Value is determined based on the Risk-Based-Priority (RBP) calculation. | N/A | N/A
Direction | Indicates the directional flow of data between the Origin Host and the Impacted Host: Inbound, Outbound, Internal, External, or Unknown. | N/A | N/A
Severity | The vendor's view of the severity of the log. | <severity> | \w+
Vendor Message ID | Vendor-specific ID for the log, used to describe a type of event. | <vmid> | \w+
Vendor Info † | Description of a specific vendor log or event identifier for the log. Human readable elaboration that directly correlates to the VMID. | <vendorinfo> | \w+
MPE Rule Name | Name of rule that matched, assigned on rule creation. | N/A | N/A
Threat Name † | The name of a threat described in the log message (e.g., malware, exploit name, signature name). Do not overload with Policy. | <threatname> | \w+
Threat ID † | ID number or unique identifier of a threat. Note that CVE is stored separately. | <threatid> | \w+
CVE † | CVE ID (e.g., CVE-1999-0003) from vulnerability scan data. | <cve> | \w+



Host Tab
Display Field | Description | Tag(s) | Default Regex
Host (Origin) | Origin host derived from Origin IP Address and/or Origin Hostname. | N/A | N/A
Host (Impacted) | Impacted host derived from Impacted IP Address and/or Impacted Hostname. | N/A | N/A
MAC Address (Origin) | The MAC address from which activity originated (i.e., attacker, client). | <smac> | (\w{2}(:|-)?){6}
MAC Address (Impacted) | The MAC address that was affected by the activity (i.e., target, server). | <dmac> | (\w{2}(:|-)?){6}
Interface (Origin) | The network port/interface from which the activity originated (i.e., attacker, client). | <sinterface> | \w+
Interface (Impacted) | The network port/interface that was affected by the activity (i.e., target, server). | <dinterface> | \w+
IP Address (Origin) | The IP address from which activity originated (i.e., attacker, client). | <sip> (parses IPv4 and IPv6) | ((?<sipv4>(?<sipv4>1??(1??\d{1,2}|2[0-4]\d|25[0-5])\.(1??\d{1,2}|2[0-4]\d|25[0-5])\.(1??\d{1,2}|2[0-4]\d|25[0-5])\.(1??\d{1,2}|2[0-4]\d|25[0-5])))|(?<sipv6>(?<sipv6>1??((?:(?:[0-9A-Fa-f]{1,4}:){7}[0-9A-Fa-f]{1,4}|(?=(?:[0-9A-Fa-f]{1,4}:){0,7}[0-9A-Fa-f]{1,4}\z)|(([0-9A-Fa-f]{1,4}:){1,7}|:)((:[0-9A-Fa-f]{1,4}){1,7}|:))))))

IP Address (Impacted) | The IP address that was affected by the activity (i.e., target, server). | <dip> (parses IPv4 and IPv6) | ((?<dipv4>(?<dipv4>1??(1??\d{1,2}|2[0-4]\d|25[0-5])\.(1??\d{1,2}|2[0-4]\d|25[0-5])\.(1??\d{1,2}|2[0-4]\d|25[0-5])\.(1??\d{1,2}|2[0-4]\d|25[0-5])))|(?<dipv6>(?<dipv6>1??((?:(?:[0-9A-Fa-f]{1,4}:){7}[0-9A-Fa-f]{1,4}|(?=(?:[0-9A-Fa-f]{1,4}:){0,7}[0-9A-Fa-f]{1,4}\z)|(([0-9A-Fa-f]{1,4}:){1,7}|:)((:[0-9A-Fa-f]{1,4}){1,7}|:))))))
NAT IP Address (Origin) | The Network Address Translated (NAT) IP address from which activity originated (i.e., attacker, client). | <snatip> | Same as IP Origin (<sip>)
NAT IP Address (Impacted) | The Network Address Translated (NAT) IP address that was affected by the activity (i.e., target, server). | <dnatip> | Same as IP Impacted (<dip>)
Hostname (Origin) | The hostname from which activity originated (i.e., attacker, client). | <sname> (or DNS resolved from IP) | ([^\s\.]+\.?)+
Hostname (Impacted) | The hostname that was affected by the activity (i.e., target, server). | <dname> (or DNS resolved from IP) | ([^\s\.]+\.?)+
Known Host (Origin) | A value determined by mapping parsed origin host identifiers, such as IP address or hostname, to a LogRhythm host record. | N/A | N/A
Known Host (Impacted) | A value determined by mapping parsed impacted host identifiers, such as IP address or hostname, to a LogRhythm host record. | N/A | N/A
Serial Number † | The hardware or software serial number in a log message. This value should be a permanent unique identifier. | <serialnumber> | \w+
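
The default patterns for MAC addresses and hostnames are permissive, which can be checked outside the product. The following is a minimal sketch in Python, whose `re` module is not the engine the MPE uses, so it only illustrates how the two patterns behave as plain regular expressions. The sample values are made up for illustration.

```python
import re

# Default regexes copied from the Host Tab table.
MAC_DEFAULT = r"(\w{2}(:|-)?){6}"
HOST_DEFAULT = r"([^\s\.]+\.?)+"

# Hypothetical sample values, not taken from any real log source.
mac_samples = [
    "00:1A:2B:3C:4D:5E",   # colon separated
    "00-1A-2B-3C-4D-5E",   # hyphen separated
    "001A2B3C4D5E",        # no separators (the separator is optional)
    "zz:zz:zz:zz:zz:zz",   # not hexadecimal, but \w still accepts it
]
mac_results = {s: bool(re.fullmatch(MAC_DEFAULT, s)) for s in mac_samples}

# The hostname pattern consumes dot-separated labels and stops at whitespace.
host = re.match(HOST_DEFAULT, "web01.example.com accepted connection").group(0)

print(mac_results)
print(host)
```

Because `\w` accepts any word character, the MAC pattern also matches strings that are not valid hexadecimal. If a log source places other six-pair tokens near the MAC address, an override with `[0-9A-Fa-f]` may be worth considering.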



Identity Tab
Display Field | Description | Tag(s) | Default Regex
User (Origin) | The originating user or system account of the activity reported in the log. | <login> | \w+
User (Impacted) | The user or system account impacted by activity reported in the log. | <account> | \w+
Sender | The sender of an email or the "caller number" for a VOIP log. This value must relate to a specific user or unique address in the case of a phone call or email. | <sender> | [^\s]+@[^\s]+
Recipient | The recipient of an email or the dialed number for a VOIP log. | <recipient> | [^\s]+@[^\s]+
Group | The user group or role impacted by activity reported in the log. Do not use for entity group (zone or domain). | <group> | \w+
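
The Sender and Recipient default, `[^\s]+@[^\s]+`, takes every non-whitespace character around the `@`. The sketch below uses Python's `re` module (an assumption for illustration, not the MPE engine) to show the effect on two made-up log fragments. The `TIGHTER` pattern is a hypothetical override, not one supplied by LogRhythm.

```python
import re

EMAIL_DEFAULT = r"[^\s]+@[^\s]+"

# Hypothetical override that also excludes angle brackets and commas.
TIGHTER = r"[^\s<>,]+@[^\s<>,]+"

clean_log = "from alice@example.com to bob@example.org"
noisy_log = "from=<alice@example.com>, size=2048"

clean = re.search(EMAIL_DEFAULT, clean_log).group(0)
noisy = re.search(EMAIL_DEFAULT, noisy_log).group(0)
tight = re.search(TIGHTER, noisy_log).group(0)

print(clean)  # the address alone
print(noisy)  # the address plus the surrounding punctuation
print(tight)  # the address alone
```

When addresses are wrapped in punctuation, the default captures that punctuation as part of the value, which is one situation where overriding the default regex is appropriate.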

Location Tab
Display Field | Description | Tag(s) | Default Regex
Entity (Origin) | A value determined based on the origin host's assigned entity. | N/A | N/A
Entity (Impacted) | A value determined based on the impacted host's assigned entity. | N/A | N/A
Zone (Origin) | A value determined based on the zone of the origin host: Internal, External, DMZ, or Unknown. | N/A | N/A
Zone (Impacted) | A value determined based on the zone of the impacted host: Internal, External, DMZ, or Unknown. | N/A | N/A
Location (Origin) | A value determined by resolving the parsed origin IP address against a Geo-IP database. | N/A | N/A
Location (Impacted) | A value determined by resolving the parsed impacted IP address against a Geo-IP database. | N/A | N/A
Country (Origin) | The country in which the determined origin location exists. | N/A | N/A
Country (Impacted) | The country in which the determined impacted location exists. | N/A | N/A



Log Tab
Display Field | Description | Tag(s) | Default Regex
Log Date | Timestamp when the log was generated or received, corrected to UTC. | N/A | N/A
Log Count | The number of identical log messages received. | N/A | N/A
Log Source Entity | The entity to which the log source belongs. | N/A | N/A
Log Source Type | The device or application type from which a log was received. | N/A | N/A
Log Source Host | The origin host from which the log was received. | N/A | N/A
Log Source | The assigned name of a log source. | N/A | N/A
Log Sequence Number | The sequence in which a log was collected, generated by the Agent. | N/A | N/A
Log Message | The raw log message. | N/A | N/A
First Log Date | Timestamp when the first identical log message was received. | N/A | N/A
Last Log Date | Timestamp when the last identical log message was received. | N/A | N/A



Network Tab
Display Field | Description | Tag(s) | Default Regex
Network (Origin) | A value determined by mapping the origin IP address to a LogRhythm network record. | N/A | N/A
Network (Impacted) | A value determined by mapping the impacted IP address to a LogRhythm network record. | N/A | N/A
Domain (Impacted) † | The Windows or DNS domain name referenced or impacted by activity reported in the log. | <domain> | \w+
Domain (Origin) † | The Windows or DNS domain where the logged activity originated. | <domainorigin> | \w+
Protocol | The IANA protocol name or number. | <protnum>, <protname> | <protnum>: 1??\d{1,2}|2[0-4]\d|25[0-5]   <protname>: \w+
TCP/UDP Port (Origin) | The port from which activity originated (i.e., client, attacker port). | <sport> | \d+
TCP/UDP Port (Impacted) | The port to which activity was targeted (i.e., server, target port). | <dport> | \d+
NAT TCP/UDP Port (Origin) | The Network Address Translated (NAT) port from which activity originated (i.e., client, attacker port). | <snatport> | \d+
NAT TCP/UDP Port (Impacted) | The Network Address Translated (NAT) port to which activity was targeted (i.e., server, target port). | <dnatport> | \d+

Special Sub-Rule Tags (Tag1-Tag5)


Five additional tags are available for identifying data in the log specifically for sub-rules. These tags do not parse
text into metadata fields; they are only used to identify portions of the log message that should be used in the
development of sub-rules.
Tag | Field Type | Default Regex
<tag1> | Text | .*
<tag2> | Text | .*
<tag3> | Text | .*
<tag4> | Text | .*
<tag5> | Text | .*



Best Practices and Working with Rules
This section contains several best practices for modifying MPE rules.

Override Default Regex


You can override the default regex if the source data does not conform to the default pattern. You only need to
override the default regex when the default:
• will not properly parse the correct data out of the log message.
• is not the optimal regex from a performance perspective.
To override the default regex, the following syntax should be used.
(?<[tagname]>[regex])
For example, suppose your regex needs to match file names with a specific extension such as the sample log
message below:
User john.doe opened AnnualReport.pdf
If the base-rule was written as:
User <login> opened <object>
The value parsed for login would be john and the value for object would be AnnualReport. This is because
a period is not a word character, so the default regex of “\w+” only matches up to the period. Instead, the
default expressions should be overridden and the base-rule should be:
User (?<login>\w+\.?\w*) opened (?<object>\w+\.pdf)
Now, the base-rule will parse anything for login starting with a word character that optionally contains a period
followed by additional word characters, and it will parse a file name ending in .pdf for object.

Do not override/overload <sip>, <dip>, <snatip>, or <dnatip>
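
The effect of the override can be reproduced with any regex engine. The sketch below uses Python's `re` module as a stand-in, which is an assumption: it is not the engine the MPE uses, and Python spells named groups `(?P<name>...)` rather than `(?<name>...)`. The tags `<login>` and `<object>` are expanded by hand to their default `\w+`.

```python
import re

LOG = "User john.doe opened AnnualReport.pdf"

# Default \w+ applied to each field on its own: it stops at the first period.
login_default = re.search(r"User (?P<login>\w+)", LOG).group("login")
object_default = re.search(r"opened (?P<object>\w+)", LOG).group("object")

# Literal translation of the whole default base-rule. In Python this finds no
# match at all, because ".doe" sits between the \w+ capture and " opened".
combined_default = re.search(r"User (?P<login>\w+) opened (?P<object>\w+)", LOG)

# Overridden base-rule from the guide, with (?<name>...) written as (?P<name>...).
override = re.search(
    r"User (?P<login>\w+\.?\w*) opened (?P<object>\w+\.pdf)", LOG
)

print(login_default, object_default)
print(combined_default)
print(override.groupdict())
```

In this stand-in engine the unmodified base-rule does not merely truncate the values, it fails to match the sample log. How the MPE itself reports that case is not verified here, but either way the override is what makes the rule usable for this log.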

Rule Names
When naming a rule, follow these accepted best practices:
• When the matching log message contains a vendor message ID such as an event ID in Windows Event
Logs, it is good to include the ID in the name of the rule. This makes searching for the rule easier and also
makes the rule more descriptive of the log that it matches.
• If the rule matches a log from a logging system that generates logs for a wide variety of services, such as
the Windows Application Event Log, the service that generated the log message should be included in the
rule name.
• All rule names should contain a brief description of the action described by the log.
For example: EVID 528 : Failed Authentication : Bad Username or Password



Common Event Names
Using the Rule Builder Common Event Browser, you can view the complete list of more than 40,000 common
events. Use the predefined common events wherever possible. If you need to create a new common event, use the
following guidelines:
• Common events should be generically named so that they can be re-used for a wide variety of devices.
For example, if a common event is being created for a log message that describes a successful connection
to an FTP server, the common event should be named so that the FTP server type is irrelevant.
o Good Name: FTP Connection Succeeded
o Bad Name: Gene6 FTP Connection Succeeded
• Common event names should always have the first letter of each word capitalized to make viewing
common events in analysis tools more consistent.

Regular Expression Characters and Practices


This section provides an overview of regular expression characters and recommended practices.

Match Characters
Notation | Characters Matched | Example
\d | Any digit from 0 to 9 | \d\d\d matches 101 but not 10a
\D | Any character that is not a numeric digit (0 to 9) | \D\D\D matches abc but not 101
\w | Any word character, for example, a-z, A-Z, 0-9, and the underscore character _ (will also match Unicode based word characters from non-Latin alphabets and scripts) | \w\w\w matches abc but not &@#
\W | Any non-word character | \W\W\W matches $#! but not abc
\s | Matches any whitespace character | \s\s\s matches (three spaces) but not abc
\S | Matches any non-whitespace character | \S\S\S matches a1_ but not (three spaces)
. | Matches any character | . matches any character except line breaks
[] | Any character between the square brackets | [abc] matches a or b or c but no other character
[^ ] | Matches any character except the characters appearing after the ^ and before the ] | [^abc] matches def but not abc
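
The examples in the table can be checked quickly in any regex engine. The sketch below uses Python's `re` module, which is an assumption for illustration and not the engine the MPE uses. Like the engine described above, Python's `\w` also matches word characters from non-Latin scripts. The Cyrillic sample word is made up.

```python
import re


def full(pattern, text):
    """Return True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


digits_ok = full(r"\d\d\d", "101")
digits_bad = full(r"\d\d\d", "10a")
word_ok = full(r"\w\w\w", "abc")
word_bad = full(r"\w\w\w", "&@#")

# \w accepts non-Latin word characters, while an explicit class does not.
word_unicode = full(r"\w+", "журнал")
narrow_unicode = full(r"[a-z0-9_]+", "журнал")

# A negated class matches each character that is not listed in it.
negated = re.findall(r"[^abc]", "abcdef")

print(digits_ok, digits_bad, word_ok, word_bad)
print(word_unicode, narrow_unicode)
print(negated)
```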



Repetition Characters
Notation | Characters Matched | Example
{n} | Matches n of the previous item | \w{4} matches AAAA but not A
{n,} | Matches n or more of the previous item | \w{4,} matches AAAAAA but not A
{n,m} | Matches at least n and at most m of the previous item. If n is 0, the item is optional, for example {0,9}. | A{2,3} matches AA and AAA but not A or AAAA
? | Matches the previous item 0 or 1 times | A? matches A or nothing but not AA
+ | Matches the previous item 1 or more times | A+ matches A, AA, AAA but not nothing
* | Matches the previous item 0 or more times | A* matches nothing, A or any number of A characters
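
The repetition examples can be verified with a small table-driven check. This is a sketch in Python's `re` module (an assumption for illustration, not the MPE engine). Note that the quantifiers are written without spaces inside the braces, as in `{4,}`.

```python
import re

# (pattern, text, expected) triples taken from the repetition examples.
cases = [
    (r"\w{4}", "AAAA", True),
    (r"\w{4}", "A", False),
    (r"\w{4,}", "AAAAAA", True),
    (r"\w{4,}", "A", False),
    (r"A{2,3}", "AA", True),
    (r"A{2,3}", "AAA", True),
    (r"A{2,3}", "A", False),
    (r"A{2,3}", "AAAA", False),
    (r"A?", "", True),
    (r"A?", "AA", False),
    (r"A+", "AAA", True),
    (r"A+", "", False),
    (r"A*", "", True),
    (r"A*", "AAAA", True),
]

# fullmatch requires the whole text to match, which mirrors "matches X but not Y".
mismatches = [
    (pattern, text)
    for pattern, text, expected in cases
    if (re.fullmatch(pattern, text) is not None) != expected
]

print(mismatches)
```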

Positional Characters
Notation | Description
^ | The following pattern must be at the start of the string, or for a multi-line string, at the beginning of a line. For multi-line text (string containing a carriage return), the multi-line flag option needs to be set.
$ | The preceding pattern must be at the end of the string, or for a multi-line string, at the end of a line.
\A | The following pattern must be at the start of the string; the multi-line flag is ignored.
\Z | The preceding pattern must be at the end of the string; the multi-line flag is ignored.
\b | Matches a word boundary, essentially the point between a word character (a-z, A-Z, 0-9, _) and a non-word character (the start or end of a word).
\B | Matches a position that is not a word boundary (not the start or end of a word).
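
The interaction between `^`, `$`, `\A`, and the multi-line flag is easiest to see on a two-line string. The sketch below uses Python's `re` module, which is an assumption for illustration: it is not the MPE engine, and its handling of `\Z` can differ between engines, so `\Z` is left out. The sample strings are made up.

```python
import re

text = "error one\nerror two"

# Without the multi-line flag, ^ and $ only anchor to the whole string.
single = re.findall(r"^error \w+$", text)

# With the multi-line flag, ^ and $ anchor to each line.
multi = re.findall(r"^error \w+$", text, re.MULTILINE)

# \A ignores the multi-line flag and matches only at the very start.
start_only = re.findall(r"\Aerror \w+", text, re.MULTILINE)

# \b requires a boundary on both sides, so "concat", "cat5", and "bobcat" are skipped.
words = re.findall(r"\bcat\b", "cat concat cat5 bobcat cat")

print(single)
print(multi)
print(start_only)
print(words)
```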

Grouping
Notation | Characters Matched | Example
()? | Matches the pattern inside the brackets 0 or 1 times. | (Error)? matches Error or nothing
()+ | Matches the pattern inside the brackets 1 or more times. | (\w+\s)+ matches AA AA
()* | Matches the pattern inside the brackets 0 or more times. | (\w+\s)* matches nothing or AA AA



The Non-Greedy Qualifier (?)
The non-greedy qualifier is a question mark (?) following a repetition character (*+?). The non-greedy qualifier is
used to tell the regex engine that it should stop matching the current match as soon as the next match criterion is
met. The non-greedy qualifier is used in combination with a repetition qualifier in order to create a non-greedy
match. The non-greedy qualifier improves performance when you want to match any text value up to a specific
text value where the specific text value can be uniquely specified within the regex.
For example, suppose your regex needs to match the following log:
02/28/2007 16:55:22 MsgID=1590 : Failed authentication for user john.doe user account locked out
If you use the following regex, incorrect values will be parsed for the login field due to the fact that user occurs
twice in the log message. Using this regex will cause “account” to be parsed into the login field.
MsgID=1590.*user (?<login>\w+\.?\w*)
This is because “.*” is greedy and first matches everything to the end of the log message. When the regex engine
reaches the end of the log message, it backtracks through the log message looking for a position where the rest of
the pattern can match. The first such position it finds is the last occurrence of “user”. Since the specified regex
for “login” will match account there, it will use that match and continue.
To make the regex take the first occurrence of the next match you use the non-greedy qualifier. The following
regex will parse the correct value into the login field because it will stop the previous match (.*) as soon as “user”
is encountered.
MsgID=1590.*?user (?<login>\w+\.?\w*)
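
The difference between the two rules can be reproduced outside the product. The sketch below uses Python's `re` module as a stand-in, which is an assumption: it is not the engine the MPE uses, and named groups are written `(?P<name>...)` instead of `(?<name>...)`. The greedy and non-greedy behavior shown is common to backtracking regex engines.

```python
import re

LOG = (
    "02/28/2007 16:55:22 MsgID=1590 : Failed authentication for user "
    "john.doe user account locked out"
)

# Greedy: .* runs to the end of the log, then backtracks to the LAST "user ".
GREEDY = r"MsgID=1590.*user (?P<login>\w+\.?\w*)"

# Non-greedy: .*? expands one character at a time and stops at the FIRST "user ".
LAZY = r"MsgID=1590.*?user (?P<login>\w+\.?\w*)"

greedy_login = re.search(GREEDY, LOG).group("login")
lazy_login = re.search(LAZY, LOG).group("login")

print(greedy_login)  # account
print(lazy_login)    # john.doe
```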

Reserved Characters
The regex engine used by LogRhythm has 13 reserved characters that have special meaning. If any of these
characters needs to be used as a literal character, it must be escaped using the backslash (\) character,
otherwise known as the escape character. The reserved characters are:
• The opening square bracket [
• The opening round bracket (
• The closing round bracket )
• The backslash \
• The caret ^
• The dollar sign $
• The period .
• The vertical bar or pipe symbol |
• The question mark ?
• The asterisk or Kleene star *
• The plus sign +
• The opening curly bracket {
• The closing curly bracket }
The following regex, which is meant to match any IPv4 address (a.b.c.d), is a simple example of how to escape
reserved characters:
\d+\.\d+\.\d+\.\d+
As you can see, each of the periods of the IP address is escaped, meaning the regex engine will look for the actual
period (.) character in the string instead of looking for any character. Without the escape slash, the period refers to
any character, which would radically change the meaning of the expression.
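
To see how much the escape changes the meaning, compare the escaped and unescaped forms on a string that is not an IP address. This is a minimal sketch using Python's `re` module (an assumption for illustration, not the MPE engine). The sample strings are made up. `re.escape` is Python's helper for escaping literal text and has no stated equivalent in the Rule Builder.

```python
import re

ESCAPED = r"\d+\.\d+\.\d+\.\d+"
UNESCAPED = r"\d+.\d+.\d+.\d+"

samples = ["192.168.10.5", "10a20b30c40"]

# For each sample: (matches escaped pattern, matches unescaped pattern).
results = {
    s: (
        re.fullmatch(ESCAPED, s) is not None,
        re.fullmatch(UNESCAPED, s) is not None,
    )
    for s in samples
}

# Escaping literal text programmatically instead of by hand.
literal = re.escape("a.b")

print(results)
print(literal)
```

The unescaped pattern accepts `10a20b30c40` because each period matches the letters between the digit groups.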



Other Special Characters
Other special characters that match special cases and cannot normally be typed into a regular expression:
Special Character | Description
\n | Matches newline
\r | Matches carriage return
\t | Matches tab
\nnn | Matches ASCII character specified by octal number nnn. Ex: \103 matches C
\xnn | Matches ASCII character specified by hexadecimal number nn. Ex: \x43 matches C
\unnnn | Matches the Unicode character specified by the four hexadecimal digits replaced by nnnn
\cD | Matches a control character. Ex: \cD matches end of transmission
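
The octal, hexadecimal, and Unicode escapes all refer to the same character when the numbers agree. The sketch below checks this with Python's `re` module, which is an assumption for illustration and not the MPE engine. Python's `re` does not support the `\cD` notation, so the end-of-transmission character (code 4) is written as `\x04` here instead.

```python
import re

# Octal 103, hexadecimal 43, and Unicode 0043 all denote the letter C.
octal_ok = re.fullmatch(r"\103", "C") is not None
hex_ok = re.fullmatch(r"\x43", "C") is not None
unicode_ok = re.fullmatch(r"\u0043", "C") is not None

# End of transmission (Ctrl-D) is character code 4.
eot_ok = re.fullmatch(r"\x04", chr(4)) is not None

# \t as a field delimiter in a tab-separated message.
fields = re.split(r"\t", "alice\tlogin\tsuccess")

print(octal_ok, hex_ok, unicode_ok, eot_ok)
print(fields)
```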



Regex Recommended Practices
The following are some recommended practices for regex development. All regex examples use the following log.
02/28/2007 16:55:22 MsgID=1590 : Failed authentication for user "any.user" user account locked out

Name: Negative Character Class
Recommended Pattern: "[^"]*" for double quote delimiters, ,[^,]*, for comma delimiters, or \|[^\|]*\| for pipe delimiters. Can be used for any type of predictable delimiter.
Description: Use negative character classes in log messages with clear delimiters, such as quotation marks, commas, or pipes. This will match any character that is not the delimiter. This can greatly improve parsing performance vs a more generic match, such as .*?.
Example: MsgID=1590.*?user\s"(?<account>[^"]*)"

Name: Non-greedy Match
Recommended Pattern: .*?
Description: If you need to match any characters until a specific set of characters appears, use this pattern.
Example: MsgID=1590.*?user\s"(?<account>[^"]*)"

Name: Overloading Map Tags
Recommended Pattern: (?<[map tag]>[regex])
Description: Map tags should almost always be overloaded. The default regex for map tags is .* which will match everything to the end of the log.
Example: MsgID=1590.*?user\s"(?<account>[^"]*)"\s(?<tag1>.*?)$

Name: Preceding and trailing values
Recommended Pattern: N/A
Description: Always match as much constant text as possible. The more information the regex has to evaluate, the faster it will be at identifying non-matching logs. For any parsed field, it is best to search for a constant value before and after the value being parsed.
Example: MsgID=1590.*?user\s"(?<account>[^"]*)"\s(?<tag1>.*?)$

Name: Look Aheads
Recommended Pattern: (?=[regex])[regex] for a positive look ahead, (?![regex])[regex] for a negative look ahead
Description: Positive and negative look ahead allows for an initial check in the regex to see if a case is satisfied in the log messages. These are useful for finding a value later in a log to reduce extraneous processing for non-matching logs. Do not use for values that appear very early in a log message, such as just past a Syslog header. Look ahead is more costly than regular expressions if the match is always found early.
Example: (?=.*?match contains this phrase)\s<sip>\s

Name: Multiline character match pattern
Recommended Pattern: [\r\n]
Description: Using a character class containing both \r (return) and \n (newline) allows for either multiline character to appear as well as both in either order. Some log messages vary in the order of these new lines and others only contain a newline.

Name: Narrow Character Classes
Recommended Pattern: [a-z0-9_]+
Description: The shorthand character class \w matches Latin alphabet characters, Hindu-Arabic numerals (0-9), underscores, as well as other scripts supported in Unicode. Narrowing the match to only the relevant character set will yield better performance.
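
Several of these practices can be combined in one rule. The sketch below applies the negative character class, the non-greedy match, an overloaded `tag1`, and a look ahead to the sample log, written with straight double quotes. It uses Python's `re` module as a stand-in, which is an assumption: it is not the engine the MPE uses, and named groups are written `(?P<name>...)` instead of `(?<name>...)`. The `GUARDED` rule and the second log are made up for illustration.

```python
import re

LOG = (
    '02/28/2007 16:55:22 MsgID=1590 : Failed authentication for user '
    '"any.user" user account locked out'
)

# Negative character class for the quoted value, non-greedy match up to "user",
# and an overloaded tag1 anchored to the end of the log.
RULE = r'MsgID=1590.*?user\s"(?P<account>[^"]*)"\s(?P<tag1>.*?)$'
fields = re.search(RULE, LOG).groupdict()

# Look ahead: only attempt the rule when a later phrase is present in the log.
GUARDED = r"(?=.*?locked out)" + RULE
guarded_hit = re.search(GUARDED, LOG) is not None

# Hypothetical variant of the log without the phrase the look ahead requires.
other_log = LOG.replace("locked out", "disabled")
guarded_miss = re.search(GUARDED, other_log) is not None

print(fields)
print(guarded_hit, guarded_miss)
```

This sketch only demonstrates matching behavior. The performance claims in the practices above depend on the engine and the log volume, and are not measured here.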
