How to extract data from a payload using regex

Events usually arrive in different formats depending on the source device: syslog, CEF, ID-based, and so on. Each source device creates a payload containing the information associated with a specific event, which makes it possible to analyze that information later with other tools.

Payload Example:

<182>Oct 3 12:12:23 10.34.34.12 Jun 30 17:54:00 34.139.234.113 : %ASA-6-802016: Teardown UDP connection 1546540759 for outside:134.20.48.76/53 to inside:71.10.38.6/25826 duration 0:00:00 bytes 112

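The most common regex syntax elements are the following:

\ = escapes the next character so that it is matched literally (for example, “\.” matches a dot)
\s = matches a whitespace character (space, tab, newline)
\w = matches a word character (letter, digit or underscore)
\d = matches a digit
. = matches any character (except a newline, by default)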
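To see these classes in action, here is a minimal sketch that assumes Python's built-in re module (the post itself is not tied to any particular tool, and variable names such as ip_addresses are only illustrative). It runs a few small patterns against the payload from the example above.

```python
import re

payload = ("<182>Oct 3 12:12:23 10.34.34.12 Jun 30 17:54:00 34.139.234.113 : "
           "%ASA-6-802016: Teardown UDP connection 1546540759 for "
           "outside:134.20.48.76/53 to inside:71.10.38.6/25826 "
           "duration 0:00:00 bytes 112")

# \d matches a single digit, so \d\d\d finds the first run of three digits.
three_digits = re.search(r"\d\d\d", payload).group()

# \s and \w: whitespace, three word characters, whitespace.
month = re.search(r"\s\w\w\w\s", payload).group()

# An escaped dot (\.) matches only a literal dot, which suits IP addresses.
ip_addresses = re.findall(r"\d+\.\d+\.\d+\.\d+", payload)

# An unescaped dot matches any character, here the space after "bytes".
any_char = re.search(r"bytes.112", payload).group()

print(three_digits)
print(repr(month))
print(ip_addresses)
print(any_char)
```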

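There are also quantifiers and grouping operators, for example:

* = means “zero or more”
+ = means “one or more”
() = means “capture”
? = means “zero or one”; placed after * or +, it makes the match lazy (as short as possible). It is not required for capturing.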
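The difference between these operators is easiest to see on small strings. The following sketch assumes Python's re module, and the sample strings are made up for illustration.

```python
import re

# * allows zero occurrences, + requires at least one.
star_matches = re.fullmatch(r"a*b", "b") is not None
plus_matches = re.fullmatch(r"a+b", "b") is not None

# ? makes the preceding token optional (zero or one occurrence).
optional_u = [w for w in ("color", "colour", "colouur")
              if re.fullmatch(r"colou?r", w)]

# After * or +, ? switches the quantifier from greedy to lazy.
tags = "<a><b>"
greedy = re.search(r"<.+>", tags).group()   # takes as much as possible
lazy = re.search(r"<.+?>", tags).group()    # takes as little as possible

print(star_matches, plus_matches)
print(optional_u)
print(greedy, lazy)
```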

If you want to use the OR operation, use the pipe symbol “|”.
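As a small illustration of the pipe, assuming Python's re module: the sketch below matches either interface name in the payload and captures the protocol from the event name. The “TCP” alternative is only an assumption for the example, since the sample payload contains UDP.

```python
import re

payload = ("<182>Oct 3 12:12:23 10.34.34.12 Jun 30 17:54:00 34.139.234.113 : "
           "%ASA-6-802016: Teardown UDP connection 1546540759 for "
           "outside:134.20.48.76/53 to inside:71.10.38.6/25826 "
           "duration 0:00:00 bytes 112")

# Match either word wherever it appears in the payload.
interfaces = re.findall(r"outside|inside", payload)

# Inside parentheses, the pipe limits the alternatives to the group.
protocol = re.search(r"Teardown\s(UDP|TCP)\sconnection", payload).group(1)

print(interfaces)
print(protocol)
```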

There are many ways to extract this information with regex, and there is more syntax that you can look up on the web, but to understand how to extract data, let’s check the following examples:

Data to extract

1. Extract the event name, “Teardown UDP connection” from the payload.
2. Extract the second date, “Jun 30”.
3. Extract the bytes number, “112”.

With regex you can use specific words that appear in the payload to mark the beginning of the expression you want to extract. The following are the regexes for each of the previous examples:

1. 2016\:\s(\w+\s\w+\s\w+)
2. \.12\s(\w+\s\d+)
3. bytes\s(\d+)
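A quick way to try the three extractions together is sketched below, assuming Python's re module. The extract helper and the field names are hypothetical, introduced only for this example. Each pattern ends its capture with + so that the group keeps the whole value.

```python
import re

payload = ("<182>Oct 3 12:12:23 10.34.34.12 Jun 30 17:54:00 34.139.234.113 : "
           "%ASA-6-802016: Teardown UDP connection 1546540759 for "
           "outside:134.20.48.76/53 to inside:71.10.38.6/25826 "
           "duration 0:00:00 bytes 112")

# Hypothetical field names mapped to one regex each.
patterns = {
    "event_name": r"2016:\s(\w+\s\w+\s\w+)",
    "second_date": r"\.12\s(\w+\s\d+)",
    "bytes": r"bytes\s(\d+)",
}

def extract(patterns, text):
    """Return the first captured group per pattern, or None if it does not match."""
    results = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        results[name] = match.group(1) if match else None
    return results

fields = extract(patterns, payload)
for name, value in fields.items():
    print(f"{name}: {value!r}")
```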

For example, to extract “Teardown UDP connection” you can do the following.

Payload

<182>Oct 3 12:12:23 10.34.34.12 Jun 30 17:54:00 34.139.234.113 : %ASA-6-802016: Teardown UDP connection 1546540759 for outside:134.20.48.76/53 to inside:71.10.38.6/25826 duration 0:00:00 bytes 112

The information we need to extract comes right after “%ASA-6-802016:” (without quotes), so the expression can begin with the final digits of that ID, “2016” (without quotes):

2016\:\s(\w+\s\w+\s\w+)

Next, build the part of the expression that captures “Teardown UDP connection”. To do that, look at the data you want to extract and translate every space, word, number and symbol into the corresponding regex syntax.

After 2016 comes the symbol “:”, which is matched with “\:” (without quotes; the backslash is optional here, because a colon has no special meaning in most regex engines). Next comes the event name, which is the data we want to extract. It contains spaces, so the expression needs a \s for each space between the words, with parentheses around everything you want to capture.

Let’s check the composition of this regex expression:

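2016 \: \s ( \w+ \s \w+ \s \w+ )

2016 = the literal digits at the end of the event ID
\: = the colon
\s = the space after the colon
( = start of the capture
\w+ \s \w+ \s \w+ = three words separated by single spaces
) = end of the capture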

Result: Teardown UDP connection
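It is worth running the expression and checking what the group actually returns, because the last quantifier matters. The sketch below, assuming Python's re module, compares a version whose capture ends in the lazy \w*? with one that ends in \w+. A lazy quantifier at the very end of a pattern matches as little as possible, so it stops before the third word.

```python
import re

payload = ("<182>Oct 3 12:12:23 10.34.34.12 Jun 30 17:54:00 34.139.234.113 : "
           "%ASA-6-802016: Teardown UDP connection 1546540759 for "
           "outside:134.20.48.76/53 to inside:71.10.38.6/25826 "
           "duration 0:00:00 bytes 112")

# Capture ending in a lazy quantifier: the third word is matched as empty.
lazy_result = re.search(r"2016\:\s(\w+\s\w+\s\w*?)", payload).group(1)

# Capture ending in +: the third word must contain at least one character.
full_result = re.search(r"2016\:\s(\w+\s\w+\s\w+)", payload).group(1)

print(repr(lazy_result))
print(repr(full_result))
```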

Check the other two examples, which also start from a literal word or number in the payload: compare the regex expressions against the payload and verify whether they capture the expected data.

Enjoy.
