Dynamic Lookups
Agenda

Lookups in General

Static Lookups

Dynamic Lookups
 -   Retrieve fields from a web site
 -   Retrieve fields from a database
 -   Retrieve fields from a persistent cache

Enrich Your Events with Fields from External Sources




Splunk: The Engine for Machine Data

[Diagram: sample sources of machine data]
Customer Facing Data: Click-stream data, Shopping cart data, Online transaction data
Outside the Datacenter: Manufacturing, logistics, …, CDRs & IPDRs, Power consumption, RFID data, GPS data
Logfiles, Configs, Messages, Traps, Alerts, Metrics, Scripts, Changes, Tickets
Windows: Registry, Event logs, File system, sysinternals
Linux/Unix: Configurations, syslog, File system, ps, iostat, top
Virtualization & Cloud: Hypervisor, Guest OS, Apps, Cloud
Applications: Web logs, Log4J, JMS, JMX, .NET events, Code and scripts
Databases: Configurations, Audit/query logs, Tables, Schemas
Networking: Configurations, syslog, SNMP, netflow



Interesting Things to Look Up

•   User’s Mailing Address
•   Error Code Descriptions
•   Product Names
•   Stock Symbol (from CUSIP)
•   External Host Address
•   Database Query
•   Web Service Call for Status
•   Geo Location




Other Reasons For Lookup
• Bypass a developer or vendor that does not enrich its logs
• Imaginative correlations
   • Example: A website URL with “Like” or “Dislike” count
     stored in external source
• Make your data more interesting
   • Better to see textual descriptions than arcane codes



Agenda

Lookups in General

Static Lookups

Dynamic Lookups
 -   Retrieve fields from a web site
 -   Retrieve fields from a database
 -   Retrieve fields from a persistent cache

Static vs. Dynamic Lookup


 Static: External data comes from a CSV file

 Dynamic: External data comes from the output of an external script, which
          resembles a CSV file




Static Lookup Review
• Pick the input fields that will be used to get output fields
• Create or locate a CSV file that has all the fields you need in the
  proper order
• Tell Splunk via the Manager about your CSV file and your lookup
   • You can also define lookups manually via props.conf and
      transforms.conf
   • Automatic lookups run every time a search uses the source,
      sourcetype, or host stanza they are associated with
   • Non-automatic lookups run only when the lookup command is
      invoked in the search
Example Static Lookup Conf Files
props.conf
         [access_combined]

         lookup_http = http_status status
                OUTPUT status_description, status_type
transforms.conf
         [http_status]


         filename = http_status.csv
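Conceptually, a static lookup is a keyed join between each event and the rows of the CSV file. The Python below is a minimal sketch that mimics what `http_status status OUTPUT status_description, status_type` asks for; it is not Splunk's implementation, and the CSV rows and the helpers `load_table` and `apply_static_lookup` are illustrative assumptions.

```python
import csv
import io

# Illustrative contents for http_status.csv (assumed columns; example rows only)
HTTP_STATUS_CSV = """status,status_description,status_type
200,OK,Successful
404,Not Found,Client Error
500,Internal Server Error,Server Error
"""

def load_table(csv_text, key_field):
    """Index the CSV rows by the input (match) field, like a lookup table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_field]: row for row in reader}

def apply_static_lookup(event, table, key_field, output_fields):
    """Return a copy of the event enriched with the output fields when the key matches."""
    enriched = dict(event)
    row = table.get(event.get(key_field))
    if row is not None:
        for field in output_fields:
            enriched[field] = row[field]
    return enriched

table = load_table(HTTP_STATUS_CSV, "status")
event = {"status": "404", "uri": "/missing.html"}
print(apply_static_lookup(event, table, "status", ["status_description", "status_type"]))
# {'status': '404', 'uri': '/missing.html', 'status_description': 'Not Found', 'status_type': 'Client Error'}
```

Events whose key is not in the table pass through unchanged, which matches how an unmatched event simply gains no new fields.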


Permissions
Define Lookups via Splunk Manager & set permissions there
                        local.meta

    [lookups/http_status.csv]
    export = system

    [transforms/http_status]
    export = system



Example Automatic Static Lookup




Agenda

Lookups in General

Static Lookups

Dynamic Lookups
 -   Retrieve fields from a web site
 -   Retrieve fields from a database
 -   Retrieve fields from a persistent cache

Dynamic Lookups

• Write a script to simulate access to the external source
• Test the script with one set of inputs
• Create the Splunk version of the lookup script
• Register the script with Splunk via the Manager or conf files
• Test the script explicitly before using automatic lookups
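Explicit testing can begin outside Splunk: pipe a small CSV (header plus rows) to the script on standard input, which is how the lookup flow in this deck receives its input, and inspect the CSV that comes back. A minimal sketch, assuming Python 3.7+; `run_lookup_script` is a hypothetical helper and `whois_lookup.py` a hypothetical path. Inside Splunk you would still run the lookup explicitly with the `lookup` command before making it automatic.

```python
import csv
import io
import subprocess
import sys

def run_lookup_script(cmd, header, rows):
    """Feed a CSV (header + rows) to a lookup script on stdin and parse its CSV stdout.

    cmd is the command as a list, for example
    [sys.executable, "whois_lookup.py", "ip", "whois"]  (hypothetical path).
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    # check=True raises CalledProcessError if the script exits with a non-zero status
    proc = subprocess.run(cmd, input=buf.getvalue(), capture_output=True,
                          text=True, timeout=60, check=True)
    return list(csv.DictReader(io.StringIO(proc.stdout)))

# Example (hypothetical script; leave the output field empty for the script to fill):
#   for row in run_lookup_script([sys.executable, "whois_lookup.py", "ip", "whois"],
#                                ["ip", "whois"], [["1.2.3.4", ""]]):
#       print(row)
```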



Lookups vs Custom Command
• Use dynamic lookups when returning fields given input fields
   • Standard use case for users who are already familiar with lookups
• Use a custom command when doing MORE than a lookup
   • Not all use cases involve just returning fields
       • Decrypt event data
       • Translate event data from one format to another with new fields
          (e.g. FIX)
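Translating FIX (Financial Information eXchange) messages shows why this is custom-command territory: one event is re-expressed as many new named fields rather than matched against a key. As a hedged sketch of only that translation step (not of Splunk's custom-command API), the Python below splits a FIX message into tag=value pairs separated by the SOH character and names a few standard tags; the tag table is a small illustrative subset.

```python
# FIX messages are tag=value pairs separated by the SOH (\x01) character.
SOH = "\x01"

# A few standard FIX tag numbers and their names (illustrative subset)
FIX_TAG_NAMES = {
    "8": "BeginString",
    "35": "MsgType",
    "49": "SenderCompID",
    "56": "TargetCompID",
    "55": "Symbol",
    "54": "Side",
}

def translate_fix(raw):
    """Turn a raw FIX message into a dict of named fields (unknown tags keep their number)."""
    fields = {}
    for pair in raw.strip(SOH).split(SOH):
        tag, _, value = pair.partition("=")
        fields[FIX_TAG_NAMES.get(tag, tag)] = value
    return fields

msg = SOH.join(["8=FIX.4.2", "35=D", "55=IBM", "54=1"]) + SOH
print(translate_fix(msg))
# {'BeginString': 'FIX.4.2', 'MsgType': 'D', 'Symbol': 'IBM', 'Side': '1'}
```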


Write/Test External Field Gathering Script


[Diagram: Your Python Script sends input fields to the External Data in the
Cloud, which returns output fields]




Example Script to Test External Lookup

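import socket

# Given a host, find the corresponding IP addresses
def mylookup(host):
    try:
        # gethostbyname_ex returns (hostname, aliaslist, ipaddrlist)
        hostname, aliases, ipaddrlist = socket.gethostbyname_ex(host)
        return ipaddrlist
    except socket.error:
        return []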
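Before any Splunk wiring, the "test the script with one set of inputs" step can be a tiny command-line harness around the function. A minimal sketch, assuming Python 3; `main` and the `localhost` default input are illustrative choices, and the function is repeated here so the block runs on its own.

```python
import socket
import sys

def mylookup(host):
    """Return the list of IPv4 addresses for host, or [] if it cannot be resolved."""
    try:
        # gethostbyname_ex returns (hostname, aliaslist, ipaddrlist)
        _, _, ipaddrlist = socket.gethostbyname_ex(host)
        return ipaddrlist
    except (socket.error, UnicodeError):
        return []

def main(hosts):
    # Print one "host -> addresses" line per input so results can be checked by eye
    for host in hosts:
        print(host, "->", ", ".join(mylookup(host)) or "(not found)")

if __name__ == "__main__":
    main(sys.argv[1:] or ["localhost"])
```

Run it with one or more host names as arguments; an unresolvable name prints "(not found)" instead of raising.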

External Field Gathering Script with Splunk



[Diagram: External Data in Cloud · Your Python Script · Return: Output Fields]




Script for Splunk Simulates Reading Input CSV

          hostname, ip
          a.b.c.com
          zorrosty.com
          seemanny.com



Output of Script Returns Logically Complete CSV

           hostname, ip
           a.b.c.com, 1.2.3.4
           zorrosty.com, 192.168.1.10
           seemanny.com, 10.10.2.10
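The step from the input CSV to the "logically complete" output CSV can be reproduced with the `csv` module. In this sketch `FAKE_DNS` is a hypothetical stand-in for the external source, seeded with the sample values from these slides, and `complete_csv` is a hypothetical helper; `skipinitialspace=True` copes with the space after the comma in `hostname, ip`. Rows the lookup cannot resolve are dropped, as the whois example later in the deck does.

```python
import csv
import io

# Input as the slides show it: a header plus rows whose "ip" column is empty
INPUT_CSV = """hostname, ip
a.b.c.com
zorrosty.com
seemanny.com
"""

# Hypothetical stand-in for the external source, using the slides' sample values
FAKE_DNS = {
    "a.b.c.com": "1.2.3.4",
    "zorrosty.com": "192.168.1.10",
    "seemanny.com": "10.10.2.10",
}

def complete_csv(text, key_field, out_field, lookup):
    """Fill out_field for every row using lookup(key); return the completed CSV text."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        value = lookup(row[key_field])
        if value:  # only rows the lookup could resolve are returned
            row[out_field] = value
            writer.writerow(row)
    return out.getvalue()

print(complete_csv(INPUT_CSV, "hostname", "ip", FAKE_DNS.get), end="")
```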



transforms.conf for Dynamic Lookup

[NameofLookup]
external_cmd = <name>.py field1 … fieldN
external_type = python
fields_list = field1, …, fieldN




Example Dynamic Lookup conf files

             transforms.conf
   # Note – this is an explicit lookup

   [whoisLookup]
   external_cmd = whois_lookup.py ip whois
   external_type = python
   fields_list = ip, whois



Dynamic Lookup Python Flow
def lookup(input):
    Perform external lookup based on input. Return result.

main():
    Read the CSV header from standard input.
    Write the header to standard output.
    For each line in standard input (input fields):
        Gather input fields into a dictionary (key-value structure)
        ret = lookup(input fields)
        If ret:
            Send the input values and the values returned by lookup
            to standard output
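The flow above translates fairly directly into a reusable driver. A minimal sketch under these assumptions: Python 3, one input field and one output field, and a caller-supplied `lookup` function; `run` is a hypothetical name, and the streams default to standard input and output as in the flow. Unlike a plain `print`, the error message goes to standard error so it cannot be mistaken for CSV output.

```python
import csv
import sys

def run(in_field, out_field, lookup, infile=None, outfile=None):
    """Read CSV from infile, fill out_field via lookup(), write completed rows to outfile."""
    infile = infile if infile is not None else sys.stdin
    outfile = outfile if outfile is not None else sys.stdout
    reader = csv.DictReader(infile)
    header = reader.fieldnames or []
    if in_field not in header or out_field not in header:
        sys.exit("%s and %s fields must exist in CSV data" % (in_field, out_field))
    writer = csv.DictWriter(outfile, fieldnames=header, lineterminator="\n")
    writer.writeheader()  # echo the header back
    for row in reader:
        # Keep only header fields; missing trailing values become ''
        row = {k: (v or "") for k, v in row.items() if k in header}
        if row[in_field] and row[out_field]:
            writer.writerow(row)  # already complete: pass through
        elif row[in_field]:
            ret = lookup(row[in_field])
            if ret:
                row[out_field] = ret
                writer.writerow(row)
```

A script would call it as, for example, `run(sys.argv[1], sys.argv[2], my_lookup)`, where `my_lookup` is whatever function queries your external source.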

Whois Lookup
import csv
import sys

def main():
    if len(sys.argv) != 3:
        print("Usage: python whois_lookup.py [ip field] [whois field]")
        sys.exit(0)
    ipf = sys.argv[1]
    whoisf = sys.argv[2]
    r = csv.reader(sys.stdin)
    w = None
    header = []
    first = True
    # …


Whois Lookup (cont.) to Read CSV Header
# First read the "CSV header" and output the field names
for line in r:
    if first:
        header = line
        if whoisf not in header or ipf not in header:
            print("IP and whois fields must exist in CSV data")
            sys.exit(0)
        csv.writer(sys.stdout).writerow(header)
        w = csv.DictWriter(sys.stdout, header)
        first = False
        continue
    # …

Whois Lookup (cont.) to Populate Input Fields
    # Read the row and populate the values for the input fields
    # (the ip address in our case); missing trailing values become ''
    result = {}
    for i, field in enumerate(header):
        result[field] = line[i] if i < len(line) else ''

Whois Lookup (cont.) to Perform the Lookup
    # Perform the whois lookup if necessary:
    # if the whois field is already populated, pass the row through unchanged
    if result[ipf] and result[whoisf]:
        w.writerow(result)
    # Else call the external website to get the whois field, using the ip
    # address as the key
    elif result[ipf]:
        result[whoisf] = lookup(result[ipf])
        if result[whoisf]:
            w.writerow(result)


Whois Lookup Function
import urllib.request

LOCATION_URL = "http://some.url.com?query="

# Given an ip, return the whois response as a single string
def lookup(ip):
    try:
        with urllib.request.urlopen(LOCATION_URL + ip) as whois_ret:
            return whois_ret.read().decode("utf-8", "replace").strip()
    except Exception:
        return ''
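One refinement worth considering: `urlopen` without a timeout can block for a long time if the web service is slow, and a search that uses the lookup waits for the script to finish. A sketch assuming Python 3; the `base_url` and `timeout` parameters (and the 5-second default) are illustrative choices, not anything Splunk requires.

```python
import urllib.request

LOCATION_URL = "http://some.url.com?query="  # placeholder URL from the slide

def lookup(ip, base_url=LOCATION_URL, timeout=5.0):
    """Return the whois text for ip, or '' on any error or timeout."""
    try:
        # The timeout bounds how long one slow web call can hold up the search
        with urllib.request.urlopen(base_url + ip, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace").strip()
    except Exception:
        return ''
```

Returning `''` on failure keeps the calling code's "only write rows with a result" check working unchanged.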


Database Lookup

• Install the Python modules needed to connect to the database
• Connect and authenticate to the database
   • Use a connection pool if possible
• Have the lookup function query the database
   • Return a list ([]) of results



Database Lookup vs. Database Sent To Index
• Well, it depends…
• Use a lookup when:
   • Running needle-in-a-haystack searches with only a few users
   • Using form searches that return few results
• Index the database table or view when:
   • There are LOTS of users and ad hoc reporting is needed
   • It is OK for data from a frequently changing database to be
     "stale" (N minutes old)

Example Database Lookup using MySQL

import MySQLdb

# First connect to the DB outside of the for loop
conn = MySQLdb.connect(host="localhost",
                       user="name of user",
                       passwd="password",
                       db="Name of DB")

cursor = conn.cursor()



Example Database Lookup (cont.) using MySQL
import MySQLdb  # …

# Given a city, find its country
def lookup(city, cur):
    try:
        # Parameterized query: MySQLdb escapes the value (no SQL injection)
        cur.execute("SELECT country FROM city_country WHERE city = %s", (city,))
        row = cur.fetchone()
        return row[0] if row else []
    except MySQLdb.Error:
        return []
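To try the city-to-country logic without a MySQL server, Python's built-in `sqlite3` module can stand in for `MySQLdb`. This is a swapped-in database for illustration only: both follow the Python DB-API, but `sqlite3` uses `?` placeholders where MySQLdb uses `%s`, and the table rows below are made-up sample data. `open_db` is a hypothetical helper.

```python
import sqlite3

def open_db(path=":memory:"):
    """Open the DB once (outside the per-row loop) and return (connection, cursor)."""
    conn = sqlite3.connect(path)
    return conn, conn.cursor()

def lookup(city, cur):
    """Given a city, return its country, or [] if not found (mirrors the slide)."""
    try:
        # sqlite3 uses "?" placeholders; MySQLdb uses "%s"
        cur.execute("SELECT country FROM city_country WHERE city = ?", (city,))
        row = cur.fetchone()
        return row[0] if row else []
    except sqlite3.Error:
        return []

if __name__ == "__main__":
    conn, cur = open_db()
    cur.execute("CREATE TABLE city_country (city TEXT PRIMARY KEY, country TEXT)")
    cur.executemany("INSERT INTO city_country VALUES (?, ?)",
                    [("Paris", "France"), ("Tokyo", "Japan")])
    conn.commit()
    print(lookup("Paris", cur))     # France
    print(lookup("Atlantis", cur))  # []
```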


Lookup Using Key Value Persistent Cache

• Download and install Redis
• Download and install the Redis Python module
• Import the Redis module in Python and populate the key-value DB
• Import the Redis module in the lookup script given to Splunk,
  to look up a value given a key

Redis is an open source, advanced key-value store.
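Populating the key-value DB is usually a one-off load. A minimal sketch, assuming the source data sits in a CSV file with a key column and a value column; `populate` is a hypothetical helper that accepts any object with a `set(key, value)` method, such as a redis-py client, which also lets it be exercised without a Redis server.

```python
import csv

def populate(store, csv_file, key_field, value_field):
    """Load key/value pairs from a CSV file into store via store.set(key, value).

    Rows with an empty key or value are skipped. Returns the number of keys written.
    """
    count = 0
    for row in csv.DictReader(csv_file):
        key, value = row.get(key_field), row.get(value_field)
        if key and value:
            store.set(key, value)
            count += 1
    return count

# Example (assumes a local Redis server and a hypothetical hosts.csv file):
#   import redis
#   r = redis.Redis(host="localhost", port=6379, db=0)
#   with open("hosts.csv", newline="") as f:
#       populate(r, f, "hostname", "ip")
```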


Redis Lookup
import sys

### CHANGE PATH according to your Redis install ###
sys.path.append("/Library/Python/2.6/…/redis-2.4.5-py.egg")
import redis
# …
def main():
    # …
    # Connect to Redis (change for your distribution)
    pool = redis.ConnectionPool(host='localhost', port=6379, db=0)
    redp = redis.Redis(connection_pool=pool)




Redis Lookup (cont.)

def lookup(redp, mykey):
    try:
        # get() returns None when the key is absent; return "" instead
        return redp.get(mykey) or ""
    except redis.RedisError:
        return ""




Combine Persistent Cache with External Lookup
• For data that is “relatively static”
   • First see if the data is in the persistent cache
   • If not, look it up in the external source such as a database or
     web service
   • If results come back, add results to the persistent cache and
     return results
• For data that changes often, you will need to create your own cache
  retention policies
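The read-through pattern in these bullets, plus one possible retention policy, can be sketched generically. This assumes a store with redis-py style `get(key)` and `set(key, value, ex=seconds)` methods (redis-py's `Redis.set` accepts `ex` as an expiry in seconds); `cached_lookup` and `fetch` are hypothetical names, and a fixed time-to-live is only one of many retention policies you might choose.

```python
def cached_lookup(store, key, fetch, ttl_seconds=None):
    """Read-through cache: try the store first, else call fetch() and cache the result.

    store: object with get(key) and set(key, value, ex=...), e.g. a redis-py client.
    fetch: function that queries the external source (database, web service, ...).
    ttl_seconds: None keeps entries indefinitely; a number makes them expire,
    a simple retention policy for data that changes often.
    """
    cached = store.get(key)
    if cached:
        # redis-py returns bytes unless the client was created with decode_responses=True
        return cached.decode("utf-8") if isinstance(cached, bytes) else cached
    value = fetch(key)
    if value:  # only cache non-empty results
        store.set(key, value, ex=ttl_seconds)
    return value
```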

Combining Redis with Whois Lookup
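def lookup(redp, ip):
    try:
        # First check the persistent cache
        ret = redp.get(ip)
        if ret:
            return ret.decode("utf-8")
        # Cache miss: call the whois web service and cache a non-empty answer
        with urllib.request.urlopen(LOCATION_URL + ip) as whois_ret:
            lines = whois_ret.read().decode("utf-8", "replace").strip()
        if lines:
            redp.set(ip, lines)
        return lines
    except Exception:
        return ''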


Where do I get the add-ons from today?
Splunkbase!

• Whois (Release 4.x):
  http://splunk-base.splunk.com/apps/22381/whois-add-on
• DBLookup (Release 4.x):
  http://splunk-base.splunk.com/apps/22394/example-lookup-using-a-database
• Redis Lookup (Release 4.x):
  http://splunk-base.splunk.com/apps/27106/redis-lookup
• Geo IP Lookup, not in these slides (Release 4.x):
  http://splunk-base.splunk.com/apps/22282/geo-location-lookup-script-powered-by-maxmind
Conclusion


Lookups are a powerful way to enhance your search experience beyond indexing the data.


Thank You


Editor's Notes

  • #5 Splunk is a data engine for your machine data. It gives you real-time visibility and intelligence into what’s happening across your IT infrastructure – whether it’s physical, virtual or in the cloud. Everybody now recognizes the value of this data; the problem up to now has been getting to it. At Splunk we applied the search engine paradigm to rapidly harnessing any and all machine data wherever it originates. The “no predefined schema” design means you can point Splunk at any of your data, regardless of format, source or location. There is no need to build custom parsers or connectors, there’s no traditional RDBMS, and there’s no need to filter and forward. Here we see just a sample of the kinds of data Splunk can ‘eat’. Reminder – what’s the ‘big deal’ about machine data? It holds a categorical record of the following: user transactions, customer behavior, machine behavior, security threats, and fraudulent activity. You can imagine that a single user transaction can span many systems and sources of this data, or that a single service relies on many underlying systems. Splunk gives you one place to search, report on, analyze and visualize all this data.