
Splunk Storm

Splunk Storm User Manual


Generated: 7/10/2013 11:45 pm

Copyright 2013 Splunk, Inc. All Rights Reserved

Table of Contents

Introduction
    About Splunk Storm
    Storm FAQ
    Learn more and get help
    Known issues and changelog
Splunk concepts
    Inputs and projects
    About source types
Add value to your application: conscious logging
    Why care about your logs?
    About logging
    Best practices
    Examples
Storm Tutorial
    Welcome to the Storm tutorial
    Create a Storm account and log in
    Create the tutorial project
    Add data to the tutorial project
    Introduction to the Storm UI
    Start searching
    Use the timeline
    Change the time range
    Use fields to search
    Save a search
    Use Splunk's search language
    Use a subsearch
    More search examples
    Create reports
    Build and share a dashboard
Get started
    Create and activate an account
    Create a project
    Choose a data storage plan
    How much data am I sending to Storm?
    About billing
Add data
    About adding data
    Send data (including syslog) over a TCP/UDP port
    Set up syslog
    Set up syslog-ng
    Set up rsyslog
    Set up Snare (for Windows)
    Send data via netcat
    Send data from Heroku
    Upload a file
Send data with forwarders
    About forwarding data to Storm
    Set up a universal forwarder on Windows
    Set up a universal forwarder on *nix
    Edit inputs.conf
    CLI commands for input
Send data with Storm's REST API
    Use Storm's REST API
    Storm data input endpoint
    Examples: input data with Python or Ruby
Explore your data
    About Splunk Web
    Search your data
    Search language quick reference
    Print a PDF
Manage and share a project
    About projects
    Share your project
    Transfer ownership of a project
    Troubleshoot a project
    Delete data from a project
    About inactive projects
Alerts - Coming soon!
    Alerting overview - coming soon

Introduction
About Splunk Storm
What is Splunk Storm?
Splunk Storm is a cloud-based service that turns machine data into valuable insights. Machine data is generated by web sites, applications, servers, networks, mobile devices, and the like. Splunk Storm consumes machine data and allows users to search and visualize it to monitor and analyze everything from customer clickstreams and transactions to network activity to call records.

Is Splunk Storm different from Splunk?


Yes! Splunk Storm is a cloud-service version of Splunk. If you're familiar with Splunk, you'll see as soon as you log in that Splunk Storm uses the same paradigms and search tools. One thing to keep in mind: if you need Splunk to run on your private network, or if you need advanced features not yet available in Splunk Storm, then use Splunk! For an overview of Splunk Storm vs. Splunk Enterprise features, see splunk.com.

Who is Splunk Storm for?


Splunk Storm is for users who want a turnkey, fully managed and hosted service for their machine data. The service offers both free and paid versions, allowing you to elastically scale your Storm projects as your data needs grow.

Splunk Storm has many uses and many different types of users. Developers, application support staff, system administrators, and security analysts -- even managers, VPs, and CIOs -- use Splunk Storm to do their jobs better and faster.

Developers can build statistical analysis into applications, find and isolate bugs or performance problems in code, and record and analyze events using semantic logging. Application support staff can investigate and remediate issues across an application environment and proactively monitor performance, availability, and business metrics across an entire service. System administrators and IT staff can investigate server problems, understand their configurations, and monitor user activity, performance thresholds, critical system errors, and load. Security analysts and incident response teams can investigate the activity of flagged users and access to sensitive data, and use sophisticated correlation via search to find known risk patterns such as brute force attacks, data leakage, and even application-level fraud.

How do I sign up for Splunk Storm?


To sign up for Splunk Storm, visit http://splunkstorm.com and follow the instructions in "Create and activate an account".

Storm FAQ
The following is a list of frequently asked questions about Splunk Storm:

What browsers are supported for use with Splunk Storm?


- Firefox 4 or higher
- Internet Explorer 8 or higher
- Safari 5
- Google Chrome

Supported browsers are subject to change.

Do you support sending data from Splunk forwarders?


Yes! Read "About forwarding data to Storm" for more information.

Do you support sending data via an API?


The Storm API endpoint for data input is in Beta! Read "Use Storm's REST API" to learn how to use this feature.

How much data can I send?


You can store as much as 1 TB of data in one project, depending on what you have chosen as your project's storage volume. Once your storage is filled up, Storm will stop indexing new data that comes in. Read "Delete data from a project" to learn how to free up space and resume indexing.

Is my indexed data private and secure?


Any indexed data stored in Storm will be viewable only by you and any members you invite to your project.

Splunk Storm runs in the Amazon cloud, and makes use of both Elastic Block Storage (EBS) and the Simple Storage Service (S3). All instances of EBS are configured as RAID10 and have a hot standby spare. When you add data to a project in Storm, it makes a copy of your original raw data and saves it to an EBS instance for subsequent backup to an S3 instance in the Amazon cloud. At the same time, it sends your data to be indexed into Splunk Storm and saves the indexed data in another EBS instance.

Additionally, there are multiple methods for sending data to Storm. Some of these methods involve sending unencrypted data over the public internet to Amazon EC2. As with normal web traffic, this is inherently insecure and is provided for convenience and for data that is not sensitive. We recommend that you use a secure method for sending potentially sensitive data.

What data should I send to the service to try it out?


Anything! Here are some examples we've been using during testing:

- Send your personal web app logs over a network port (from anywhere).
- Send a stream of syslog data from your infrastructure.
- Configure your home router with DD-WRT and point its syslog to Storm. Find out who's going where on your network!
- Upload or netcat a file you have on your laptop or server.
- Send Storm dtrace output.
- Configure your Windows PC with Snare to send Windows events to Storm over syslog.
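If you want to try a quick one-off test event, the sketch below (Python 3) opens a TCP connection to a Storm network input and writes a single line. It is only an illustration: the hostname and port are placeholders for the values Storm shows for your project's authorized network input, and the event text is arbitrary.

import socket
import time

# Placeholders: use the host and port that Storm assigns to your project's
# authorized network (TCP) input.
STORM_HOST = "logs.example.splunkstorm.com"
STORM_PORT = 12345

# One test event; a leading timestamp helps Storm assign the right time.
event = "%s action=storm_test message=\"hello from my laptop\"\n" % time.strftime("%Y-%m-%dT%H:%M:%S %z")

with socket.create_connection((STORM_HOST, STORM_PORT), timeout=10) as conn:
    conn.sendall(event.encode("utf-8"))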

Can I blog/tweet about the service?


Sure! We'd appreciate it if you use the hashtag #splunkstorm so we can find your comments easily :).

Learn more and get help


This topic tells you how to get help with issues you notice in Storm and how to provide your feedback.

Get help, give feedback, and file bugs for Splunk Storm
There is a dedicated support and feedback forum. To access the forum, log into your www.splunkstorm.com account. Click Help at the top right corner of the user interface, and then click Discussion forums. Your project ID, which you can find on your project's Settings page, is included automatically with your posting.

Contact Storm Support


If you encounter a problem with Storm, you can log a Support case. Note: Storm members with paid projects receive priority service.

1. From any screen in Storm when you are logged in, click Help at the top right.
2. Click Report an issue at the bottom of the page.
3. Describe the problem briefly in the subject field and clearly in the description field.
4. Be sure to click the name of the project that's giving you trouble. If the issue you're reporting applies to more than one project, you can include additional project IDs in your description. Find a project's ID in its Settings page.
5. Select the appropriate Priority for your issue.
6. Select the check box if you grant us permission to look at your search history and data for troubleshooting purposes.
7. Click Report to submit your case.

Find more information about Splunk


You've got a variety of options for finding more information about Splunk:

- The core Splunk documentation
- Splunk Answers
- The #splunk IRC channel on EFNET

Known issues and changelog


This topic lists known issues and updates to the Splunk Storm service.

Known issues
This section lists known issues in Splunk Storm. As bugs get fixed, they are moved from known issues to the changelog.

Data inputs

- On a TCP or UDP input only (not on forwarder or API inputs), if you decide to change the source type of an input (in the project's Network data > Edit by IP address), the source type will not be updated automatically. With TCP, you need to wait about 1 minute (after selecting the new source type and clicking Update) and then reconnect. With UDP, you need to wait about 3 minutes and reconnect. (STORM-1626)
- Events may be indexed with incorrect source and host and with source type "tcp-raw". (STORM-4115)
- Timezone is not recognized with two source types: generic single-line data and Storm multi-line data. (STORM-6312)
- With TCP/UDP inputs, the Inputs > Network data > Data last received column shows NA. (STORM-6343/STORM-6375)
- In rare cases, some events may be missing after a file upload. (STORM-6555)
- Column headers for CSV files are not automatically detected. To work around this, define them manually in Manager > Fields > Field extractions using a regex. (STORM-5195)

Exploring data

- The count of events on the summary page (after you click "Explore Data") may not update immediately after an upload. Solution: Refresh the page. (STORM-4394/SPL-51502)
- A new project might not be searchable, and might display banners showing an error like "Reached end-of-stream" or "Search process did not exit cleanly, exit_code=254". Please open a support ticket in Help > Report an issue. (STORM-6344)
- Error message on the summary page: "Streamed search execute failed because: Error in 'metadata': The metadata search command cannot be run because global metadata generation has been disabled on index XXX". This known issue does not impact search, only the counters on the summary page. (STORM-5887)
- Multiple projects on the same search head (that is, search1, search2, or search3 in the URL while exploring data) cannot coexist in the same browser. Workaround: Explore one project at a time, use different browsers, or use anonymous browsing windows. (STORM-5712)
- The "iplocation" search command fails with the error "exited with code 255". (STORM-6715)

Other

- Total data received over the past week and month does not show in Project > Storage. (STORM-5065)

Changelog
This section lists the issues resolved in each production update of the Splunk Storm service, by date.

July 10, 2013
- Implement new UI tools. (STORM-6850, STORM-6851, STORM-6852)
- Orchestration bugfixes.
- Tenderapp help forum has been deprecated in favor of Splunk Answers (answers.splunk.com). Answers is more fun. (STORM-6576)
- Backend changes to support yearly billing through sales in addition to self-service monthly billing (which is also still available). (STORM-6740, STORM-6784)
- Count of yearly invoices, if any exist, is viewable in the user Account page. (STORM-6934)

June 5, 2013
- When a customer downgrades a project to a cheaper plan, the change now goes into effect immediately (instead of with the new billing cycle). (STORM-6567)
- Upgraded Splunk package to improve Splunk stability.
- Fixed a bug with the Windows eventlog source type. (STORM-6600)
- Cosmetic issue: the spinning wheel during searches appears to never finalize. (STORM-6529)

May 8, 2013
- Spring cleaning: After today's maintenance, you might get email about inactive free projects. Read more in "About inactive projects".
- Added sample code for API inputs to UI. (STORM-6447)
- UI bugfixes: removed errant "unit tests" button. (STORM-6550)
- Assorted UI bugfixes. (STORM-6493, STORM-6492, STORM-6482)
- Upgraded Splunk package (same version, just a few Splunk-side bugfixes). E.g., got rid of annoying app deletion messages in yellow banners. (SPL-65784)
- Fixed bug with timezone recognition in source types. (STORM-6312)

April 24, 2013
- Alerting is now in Private Beta. (STORM-6070, STORM-6346, STORM-5396)
- Fixed a bug in which some invoices were not sorting well. (STORM-6401)
- Fixed a bug that was preventing some data from uploaded files (not any other inputs) from being received. (STORM-6430)
- File upload now checks for source type earlier in the workflow. (STORM-5789)

April 18, 2013
- Upgraded Storm to Splunk 5.0.3. Restart your browser and/or do a shift + reload in your browser for the UI changes to take effect. Features of interest in Splunk 5.x include integrated PDF generation, which allows you to create PDF files from your simple XML dashboards, views, searches, or reports; dynamic drilldown, which lets you create custom drilldown behavior for any simple XML table or chart; and JSCharts enhancements.

April 11, 2013
- API input endpoint now recognizes the timezone parameter tz. (STORM-6300)
- Storm now recognizes the log4net_xml layout source type. (STORM-6051)
- Storm now recognizes the search commands gauge, gentimes, iplocation, xmlv, and xpath. (STORM-4699)
- Alerting infrastructure work. (STORM-6071)
- Alerting workflow planning. (STORM-6070, STORM-6331, STORM-6342)
- IFX field extraction test was failing to open a new window. (STORM-3827)

March 19, 2013
- Alerting backend work. (STORM-6071, STORM-6191)
- UI styling for signup pages. (STORM-6202)
- Improvements to project deletion. (STORM-6143)
- Resolved transfer ownership bug with third-party billing vendor. (STORM-5994)

February 28, 2013
- Bugfixing. (STORM-6055, STORM-6021)
- Added monitoring for the situation that produces the end-of-stream error in the UI for some customers. (STORM-6134)

February 14, 2013
- More backend work for alerting (no, not yet). (STORM-5861)
- Wording changes on new inputs overview page. (STORM-5952)
- Help page update. (STORM-4628)
- Backend bugfixes and infrastructure work.

January 28, 2013
- Backend work to support alerting (Beta coming soon!). (STORM-5921, STORM-5916, STORM-5652, STORM-5648, STORM-5637, STORM-5615, STORM-5623, STORM-5606)
- Release infrastructure work. (STORM-5955)

January 17, 2013
- Release process improvements. (STORM-5562, STORM-5795, STORM-4824)
- Infrastructure improvements. (STORM-5936, STORM-5891, STORM-5849)
- Backend work for transferring project ownership. (STORM-5880, STORM-5411, STORM-5258) Note: For now, project ownership can be transferred only by filing a Support ticket.
- Beta API input endpoint now supports gzip encoding. (STORM-5848)
- Design changes to "change plan" and "create project" pages. (STORM-5658, STORM-5657)
- Added an overview page to the inputs workflow. (STORM-5289)
- UI tweaks. (STORM-5580, STORM-4921, STORM-5858)

December 12, 2012
- Fixed UI bugs. (STORM-5758, STORM-3055)
- Added a more descriptive error page. (STORM-3916)
- Reworked API input endpoint. (STORM-5074, STORM-5075, STORM-5076, STORM-5077, STORM-5491) SYNTAX CHANGE: If you were previously using the Storm API input endpoint, note that the auth token has moved into the password field. The documentation has been updated to reflect the new syntax. (STORM-5472)

November 27, 2012
- Owners of paid projects can now invite 10 other members. Free projects still get 5 invites. (STORM-5538)
- Fixed a problem in which some projects were missing buckets and thus not able to store as much data as they should. (STORM-5595)
- Improved sparklines on Projects page. (STORM-5300)
- Improved login and saving project settings screen flows. (STORM-5494, STORM-5354, STORM-5288)
- Assorted UI tweaks on marketing and signup. (STORM-5503, STORM-5482, STORM-5376)

October 24, 2012
- Fixed issue with failure to change deletion policy on project. (STORM-5223)
- Added informational message to the "Create Ticket" page. (STORM-5150)
- Users can triple-click to select the port number on the network input page. (STORM-5353)
- Implemented various orchestration and infrastructural improvements.

September 25, 2012
- Fixed issue with intermittent HTTP 500 Internal Server Error when uploading a file. (STORM-5037)
- Fixed issue causing traceback with "DatabaseError: deadlock detected" when creating a new Free project. (STORM-5006)
- Fixed issue preventing change of deletion policy. (STORM-4867)
- Restructured packaging of Storm so search heads don't restart during a release unless changes apply specifically to them. (STORM-4809)

September 6, 2012
- Fixed issue with changing deletion policy not having access to timezone information. (STORM-4947)
- Made improvements to internal monitoring. (STORM-4622)
- Made improvements to signup page form handling. (STORM-5010, STORM-5114, STORM-5058, STORM-5085)
- Made improvements to info on data deletion policy page. (STORM-5024)

August 23, 2012
- Input source type menu issue in IE8 resolved. (STORM-4983)
- Fixed issue with 503 errors when viewing dashboards. (STORM-4866)
- Issue with renaming projects is resolved. (STORM-5016)
- Migrated some production instances to new hardware. (STORM-4963)
- Improved internal logging for user plan change actions. (STORM-5034)
- Minor fix to login dialog box. (STORM-4978)

August 15, 2012
- Various display issues in IE9 resolved. (STORM-4980, STORM-4979)
- Fixed data storage plan slider. (STORM-4976)
- New Terms of Service agreement displayed. (STORM-4704, STORM-4851)
- Number of free projects changed to 1. (STORM-4696)
- REST API access removed pending further development. (STORM-4936)
- Removed "Beta" from the help link URLs. (STORM-4737)
- Changed minimum retention policy to 30 days. (STORM-4967)
- Deleted project member correctly removed from list of members. (STORM-4924)

August 3, 2012
- Links to relevant documentation added to API input page. (STORM-4795, STORM-4852)
- Issue with project creation UI breaking in Chrome has been resolved. (STORM-4701)
- "DatabaseError: could not obtain lock on row" error when clicking Explore has been resolved. (STORM-4802, STORM-4500)

July 12, 2012
- New supported source types: mail_nodate, syslog_nohost, mysql_slow. (STORM-4393, STORM-4395, STORM-4404, STORM-4416)
- Visual UI fixes. (STORM-4402, STORM-4322, STORM-2714, STORM-3676, STORM-4185)
- A bug producing 404 errors with long periods of user inactivity has been fixed. (STORM-4398)

June 27, 2012
- Bug with changing project plans is fixed. (STORM-4164)
- Visual UI bugs fixed. (STORM-3203, STORM-4092, STORM-4312, STORM-2714, STORM-3101)
- Explore data button bugs fixed. (STORM-4002, STORM-4100, STORM-4144)

6/21/2012
- Storm now correctly indexes Windows event logs sent with a Storm universal forwarder. (STORM-4139)

6/19/2012
- Fixed some bugs with "Explore data" button. (STORM-4300, STORM-4147, STORM-4002)
- If an account develops a payment issue, a warning message will appear on all the input pages. (STORM-4210)
- Improved no-JS error messaging and site options. (STORM-4043)
- Visual UI bugfixes. (STORM-4134, STORM-4154, STORM-3513, STORM-4163, 4075, 4154)

6/11/2012
- Save search, then share, now generates a correct link. (SPL-52088)

6/5/2012
- Visual UI updates. (STORM-4208, STORM-3935, STORM-4061, STORM-3279)
- With the Storm predefined syslog source type, the number of characters Storm looks into the event for a time stamp has been increased to 100 characters. (STORM-3552)
- Help links from UI into docs improved. (STORM-4119)
- Improved DNS error wording in UI. (STORM-3871)
- Fixed Python error with sending data through the API input endpoint. (STORM-4172)
- Changes to the project timezone now propagate more quickly to the default timezone for any events with no explicit timezone themselves. (STORM-4014)

5/23/2012
- Fixed bug with project creation. (STORM-4109)
- Many updates to public web site. (STORM-3724, STORM-2723, STORM-3698, STORM-3697)
- Fixed bug with help links to documentation. (STORM-3878)

5/17/2012
- Visual improvements to UI. (STORM-3868, STORM-4013)
- Fixed bug about incorrect Twitter source type. (STORM-4016)
- Fixed bug with editing profile email address. (STORM-3397)
- Improvements to data storage graphs. (STORM-3167, STORM-3791)

5/10/2012
- Performance improvements to data inputs. (STORM-3702, STORM-3564)
- Bug showing incorrect time zone for scheduled plan downgrades has been fixed. (STORM-3255)

5/9/2012
- Introduced new predefined source types: Facebook, Foursquare, Google+, Twitter, and three for JSON data. (STORM-3454)
- Trying to file a Support ticket when you're not logged in now takes you to a login screen (then to the Support page), instead of a 404. (STORM-3845)
- A bug causing some scheduled search jobs to run twice has been fixed. (STORM-3826)
- Improved display and accuracy of storage management pages. (STORM-3795, STORM-3791, STORM-3171, STORM-3745)
- Updated wording around search result sharing in Splunk Web. (STORM-3538)
- Fixed display issues in Splunk Web Jobs menu. (STORM-3469)
- Fixed bugs in billing integration. (STORM-3797, STORM-3802)

4/23/2012
- A bug causing data from a file upload to intermittently not appear has been fixed. (STORM-3751)
- Some intermittent bugs in project upgrading have been fixed. (STORM-3752, STORM-3565, STORM-3284)
- UI now shows a progress message when a user is deleting data. (STORM-3645)
- Several visual UI bugs fixed. (STORM-3680, STORM-2609, STORM-3754, STORM-3582, STORM-3400, 3578)
- Two bugs with the "save and share a search" popup window have been fixed. (STORM-3679, STORM-3678)
- A bug that removed the auto authorization link when auto authorization time ran out has been fixed. (STORM-3671)
- A bug preventing edits to _tzhint in a universal forwarder's inputs.conf file from taking effect has been fixed. (STORM-3674)

4/12/12
- Storm now supports forwarding, with the Splunk universal forwarder. (STORM-1677, STORM-2069, STORM-2784, STORM-3404, STORM-3584)
- A bug preventing a user from regenerating an API user token has been fixed. (STORM-3037)
- Data storage graph has been updated. (STORM-3190, STORM-3401)
- A bug preventing a non-admin user from leaving a project has been fixed. (STORM-3492)
- Wording changed in adding network data workflow. (STORM-3496, STORM-3497)
- Login page visual alignment fixed. (STORM-3562)
- A bug showing the "store data indefinitely" button on the deletion policy page incorrectly has been fixed. (STORM-3541)
- File upload is no longer delayed when multiple users are uploading files simultaneously. (STORM-3396)
- Timing of project downgrades has been fixed. (STORM-3405, STORM-3463)

3/29/12
- A bug causing uploaded files to intermittently take a very long time to appear in Storm has been fixed. (STORM-3477)
- A usability bug about a button on the new-user signup page has been fixed. (STORM-3249)
- Cosmetic changes to the UI. (STORM-3476, 3473, 3495, 3467, 3461, 3438, 3416, 3414, 3410, 3407, 3403, 3395, 3357, 3298, 3297, 3277, 3409)
- Cosmetic changes to billing invoices. (STORM-3441, 3398, 3297)
- A bug hiding Storm banner messages has been fixed. (STORM-3490)
- Data storage graph y-axis and caption corrected. (STORM-3422, 3294)
- A bug breaking the "create project" workflow when an error occurs has been fixed. (STORM-3372)
- "Input methods" page in the "add data" workflow has been removed. (STORM-3412)

3/20/2012
- All invited projects now listed under "Others" on the Projects page (as opposed to being listed individually by project owner). (STORM-3375)
- Projects page no longer shows an error message when a project hasn't received data in the past 24 hours. (STORM-3376)
- Project creation/upgrade page has more information about local taxes and the timing of a charge. (STORM-3380)
- Projects can now be as large as 1 TB. (STORM-3322)
- A user's Account page now shows dollar amounts to two decimal places, and does not show invoices for plans costing $0. (STORM-3252, STORM-3253)
- Assorted typos fixed and cosmetic improvements in UI. (STORM-3141, STORM-3328, STORM-3352, STORM-3042)
- Send activation email page has been simplified. (STORM-3367, STORM-2720)
- Billing emails updated. (STORM-3305, STORM-3021)
- A bug preventing a user from creating more than two projects has been fixed. (STORM-3300)
- The REST API input endpoint's SSL certification has been fixed. (STORM-3309)
- REST API input endpoint was returning an extra error code (403) when Storm returned a 500; now returns only a 503. (STORM-3265)

3/6/2012
- Billing is now integrated into Storm. While Storm is still in Beta, any invoice you receive should be for $0.
- A user can now input data to Storm using a public REST API endpoint, using basic authentication.
- Storm now sends alerts when a user is about to max out their plan.
- Storm now has two graphs to help users track the amount of data being indexed and how much space they have left in their plan.
- Storm has a new UI across all features, and many new or redesigned pages throughout the product.
- Users can file a case with Support.

2/8/2012
- A user can now have up to four projects instead of two. Note that the number of allowed free projects (and the size of free projects) will change post-Beta.

1/24/2012
- All users now have deletion policies, which by default store data indefinitely. Check your data plan and usage and upgrade your plan (still free for Beta users) if you need it. Read about deleting data in "Delete data from a project" in this manual.
- A user can now upgrade (STORM-2616) or downgrade (STORM-2643, STORM-2617) a data plan (STORM-2550, STORM-2548, STORM-1948).
- A user can now choose the amount of time to store data (deletion policy; STORM-2527, STORM-1948, STORM-2677). Data will be deleted if it's older than the time you set in your retention policy. Read about deletion policies in "Choose a data storage plan" in this manual.
- Users can now see how much data they have sent in the past hour, day, or week. (STORM-2528, STORM-2002, STORM-2001, STORM-1973)
- Storm has changed the way it counts data. It now counts the uncompressed data, before it's indexed.
- The workflow for creating a dashboard from a saved search has been improved. (STORM-2597)

12/23/2011
- Fixed link to support forum. (STORM-2610)
- In Splunk Web, fixed inconsistencies on the "Define report content > Format report" page. (STORM-2452)
- Fixed "Open" link on the "Dashboards" page. (STORM-2449)

12/22/2011
- A bug in search that prevented keywords without wildcards from being found has been fixed. (STORM-2535)

12/20/2011
- Now when a user enters the Storm UI, they are told which project they're in. (STORM-1552)
- A bug showing default event types in Manager has been fixed. (STORM-2433)

12/13/2011
- Users can now send invites for Storm Beta. See "Share your project" in this manual for more information.

12/12/2011
- A bug concatenating multiple syslog events into one event, present only with data sent over UDP, has been fixed. (STORM-2518)

12/5/2011
- Storm has been upgraded to Splunk 4.3. (STORM-2276) Restart your browser and/or do a shift + reload in your browser for the UI changes to take effect.
- Storm backup process can now handle Amazon SimpleDB going down. (STORM-1278)
- A bug preventing Storm users from saving searches has been fixed. (STORM-2288)
- An Apache log event concatenation bug has been fixed. (STORM-2285)
- If external network conditions create a long transmission delay (e.g., due to packet loss), users might have seen a log line being split into two events. This has been made far less likely. (STORM-2247)
- A bug creating unnecessary user status errors in splunkd.log has been fixed. (STORM-2377)
- The upload file progress bar now displays correctly in Chrome. (STORM-2244)
- A user invited to a Storm project now has correct permissions. (STORM-1392)
- Input proxy logging and connectivity has been improved. (STORM-2318)

9/22/2011
- "Leave project" now works correctly. (STORM-1815, STORM-1814, STORM-1803)
- There is now filename validation when uploading a file. (STORM-1752)

9/14/2011
- Display issues on the Data Inputs page have been fixed. (STORM-1248, STORM-1246)
- Display issues on IE7 with custom source type box and form elements have been fixed. (STORM-1236, STORM-1237, STORM-1238)
- File upload form now defaults to the project's timezone. (STORM-1253)
- The correct error is now displayed when deleting a project. (STORM-1501, STORM-1474)


Splunk concepts
Inputs and projects
The first step in using Splunk Storm is to send it data. Once Storm gets some data, it indexes it and makes it immediately available for searching. Storm turns your data into a series of individual events, consisting of searchable fields. There's lots you can do to massage the data before and after Splunk indexes it, but you don't usually need to. In most cases, Storm can determine what type of data you're feeding it and handle it appropriately. Basically, you send Storm data and it does the rest. In moments, you can start searching the data and use it to create charts, reports, alerts, and other interesting outputs.

What's a Storm input?


Storm calls a single added data source an input. For example, a Storm input can be a file you upload once, or a stream of network data. Read more about adding data to Storm.

What's a Storm project?


Storm allows you to create projects to organize your data. You can have multiple projects, and multiple inputs per project. You can share a project with other users or keep it private. Learn more about projects.

About source types


In Splunk Storm, your data's source type, along with its host and source, is a default field created for every event that is indexed by Storm. These fields are defined as follows:

- host - An event's host value is typically the hostname, IP address, or fully qualified domain name of the network host from which the event originated. The host value enables you to easily locate data originating from a specific device.
- source - The source of an event is the name of the file, stream, or other input from which the event originates.
- source type - The source type of an event is the format of the data input from which it originates, such as ruby_on_rails or log4j. Use a source type to categorize your data, and refer to those categories when creating searches, field extractions, tags, or event types.

To change the source type for data you've already sent to Storm, you'll need to assign the data the corrected source type in your data input method and then resend the data. That is, you cannot edit the source type (or any default field) in your data once Storm has already indexed it.

Predefined source types


The following source types are predefined in Splunk Storm for you to use when uploading a file or manually authorizing a network input. The Display name is the descriptive name in Storm; it's what appears in the dropdown when you're uploading a file through the UI (in Projects > Inputs > Upload a file), for example. The Internal name is what you'll use when searching. It's also what you'll use when inputting data through the REST API. The expected syntax is what Storm expects your data to look like for a particular source type.

- Apache error logs (apache_error): Standard Apache web server error logs.
- Apache web access logs (access_combined): Standard NCSA combined-format access logs, as generated by Apache and other web servers.
- Catalina Java J2EE server logs (catalina): For Java Catalina logs. Checks the event for a timestamp. Looks for a timestamp of format %d %b, %Y %H:%M:%S %p.
- CSV Data (csv): Comma-separated values data.
- Email server logs with no date (mail_no_date): Looks for a timestamp of format %d %H:%M:%S. Breaks line only before a timestamp.
- Facebook (facebook): Looks for a timestamp prefixed with created_time":". Does not merge lines.
- Foursquare (foursquare): Looks for a timestamp prefixed with created":". Does not merge lines.
- Generic single-line data (generic_single_line): Expects each event to be a single line that begins with an ISO 8601 timestamp of the format %Y-%m-%dT%H:%M:%S.%3N (for example, 2011-08-10T04:20:55.432). Does not merge lines.
- Google+ (googleplus): Looks for a timestamp prefixed with published":". Does not merge lines.
- IIS (iis): Works for standard Microsoft IIS web server logs.
- JSON, pre-defined timestamp (json_predefined_timestamp): Looks for a timestamp prefixed with timestamp":" of format %Y-%m-%dT%H:%M:%S.%3N, for example { "timestamp":"2013-10-24...", "otherfield":"value" }. Breaks line only before an open curly brace, {.
- JSON, auto timestamp (json_auto_timestamp): Tries to find a timestamp in the event. Breaks line only before an open curly brace, {.
- JSON, no timestamp (json_no_timestamp): Does not look for a timestamp; events get the time that Storm receives them (in the project time zone). Breaks line only before an open curly brace, {.
- Log4j (log4j): Works for standard Log4j output.
- Log4php (log4php)
- Multi-line data, begins with timestamp (storm_multi_line): Expects multiline events that each begin with an ISO 8601 timestamp of the format %Y-%m-%dT%H:%M:%S.%3N (for example, 2011-08-10T04:20:55.432). Breaks line before a new timestamp.
- Mysql_slow (mysql_slow): For MySQL slow query logs. Looks for a timestamp of format Time: %y%m%d %k:%M:%S. Breaks line only before a timestamp. Merges lines, with MAX_EVENTS = 512.
- Ruby on Rails (ruby_on_rails): Expects events that each start with the string "Processing" and contain a timestamp of the format %Y-%m-%d %H:%M:%S.
- Syslog (syslog): Works for data coming from syslog. Looks for a timestamp of format %b %d %H:%M:%S. Does not merge lines.
- Syslog no host (syslog_nohost): Classic syslog format, except the host is not included in the event; instead, the host comes from the server IP, forwarder, or inputs.conf (depending on how the data is sent). Looks for a timestamp of format %b %d %H:%M:%S.
- Twitter (twitter): Looks for a timestamp of the format %Y-%m-%dT%H:%M:%S.%3NZ prefixed with created_at":". Does not merge lines.
- custom source type: Choose your own source type name. Breaks line before a timestamp; if Storm can't find a timestamp, it treats the data as multiline data.

Learn more
To learn more about how source types work, read "Why source types matter" in the core Splunk product documentation. Note that the core documentation refers to some features that are not available in Splunk Storm.


Add value to your application: conscious logging


Why care about your logs?
Logs can be a hassle to deal with: they don't have a real structure, you might not know what's in them if you didn't create them yourself, and the sheer number of files and types of logs can be overwhelming. On the other hand, logs are really useful: they contain a gold mine of analytic information, they can help you find problems, they can tell you about your IT infrastructure and the behavior of your users, and they can help you identify potential attackers. Chances are, you're already using logs for debugging purposes. But you can do more with your log files. Keep reading to learn how to get the most from your logs and how to set them up right. The results are worth it.

What you can learn from logs


Typically, IT organizations develop systems that focus on specific silos of technologies, functions, people, or departments. Individually, these silos provide a limited view. But they all generate data: data that is nonstandard, unstructured, high-volume, and growing every day. Whatever you're logging, every environment has some set of IT data to leverage:

- Logs contain a record of activity for IT components, such as applications, servers, and network devices.
- Logs contain a record of customer activity and behavior, product and service usage, and transactions.

Often, companies must write their own applications from scratch to perform statistical analysis on the data contained in these events. Similarly, companies build dashboards by hand to display that analysis for historical comparisons, future projections, performance tuning, billing, security, and so on.

We can see from the above that there are two types of logging:

Debug logging

As an application developer, you can use Storm to look at your own application logs for debugging rather than hunting and pecking through files. System operators can use information from debugging logs to identify trends in system stability and performance.

Semantic logging

Semantic logging entails purposefully logging specific data that exposes the state of business processes (web clicks, financial trades, cell phone connections and dropped calls, audit trails, and so on). Strategic planners within your organization can leverage the power of semantic-level events to create dashboards and presentations that reflect how the business is performing (as opposed to how the application is performing).

How should I create these logs?


Storm doesn't care about the format or schema of your data. Queries and searches can be ad hoc, and your data can come from any textual source. With Storm, you can use the logs generated from your application to achieve the same powerful analytics without building special-purpose software. Logging semantic data in addition to debugging data can significantly enrich virtually all applications. The following pseudo-code shows a purchase transaction:

public boolean submitPurchase(int purchaseId) {
    // Initialize "gold mine" logging fields to defaults
    String login = "unknown";
    int cid = 0;
    float total = 0.0;
    try {
        // Debug logging entry point
        log.debug("action=submitPurchaseStart, purchaseId=%d", purchaseId);

        // Retrieve "gold mine" information for semantic logging later
        PurchaseInfo purchase = getPurchaseInfo(purchaseId);
        total = purchase.getTotal();
        Customer customer = purchase.getCustomer();
        cid = customer.getId();
        login = customer.getLogin();

        submitToCreditCard(purchaseId);
        generateInvoice(purchaseId);
        generateFullfillmentOrder(purchaseId);

        // Semantic logging for revenue dashboards
        log.info("action=PurchaseCompleted, purchaseId=%d, customerId=%d, login=%s, total=%.2f",
                 purchaseId, cid, login, total);

        // Debug logging exit point
        log.debug("action=submitPurchaseCompleted, purchaseId=%d", purchaseId);
        return true;
    } catch ( Exception ex ) {
        // Exception logging at the public interface level
        log.exception("action=submitPurchaseFailed, purchaseId=%d, error=%s",
                      purchaseId, ex.getMessage());

        // Semantic logging for failures dashboard
        log.info("action=PurchaseFailed, purchaseId=%d, customerId=%d, login=%s, total=%.2f, error=%s",
                 purchaseId, cid, login, total, ex.getMessage());
        return false;
    }
}

private void submitToCreditCard(int purchaseId) throws PurchaseException {
    try {
        // Debug logging entry point
        log.debug("action=submitToCreditCardStart, purchaseId=%d", purchaseId);

        // Submit transaction to credit card gateway
        // ...

        // Debug logging exit point
        log.debug("action=submitToCreditCardComplete, purchaseId=%d", purchaseId);
    } catch ( GatewayException ex ) {
        // Error logging at the private method level
        log.error("action=submitToCreditCardFailed, purchaseId=%d, error=%s",
                  purchaseId, ex.getMessage());
        throw new PurchaseException(ex.getMessage(), ex);
    }
}

Here's a quick breakdown of the reasoning behind the patterns presented above:

For debugging

1. All methods log methodNameStart before doing anything. This gives a live stack trace in the logs as all sub-operations are executed.
2. All methods log methodNameComplete before exiting in a successful state.
3. All methods log methodNameFailed before exiting from an expected failure state.
4. All private methods handle only the specific types of exceptions they expect to occur and log at error level (no catch-alls, no stack traces are logged).
5. The top-level (public) entry point catches ALL types of exceptions and logs the stack trace by using the log.exception() method.

The debug logs contain live entry and exit logs for each level of the stack. This can be used for live profiling of the application. Execution times at every level of the stack can be graphed and used to find bottlenecks. (All your log lines contain timestamps, don't they?)

ONLY the public interface method logs at exception level. Therefore only one stack trace will be logged per error. Have you ever seen a log file with a stack trace logged twice? Logging the stack more than once makes it ambiguous how many errors actually occurred.

System operators can use this data to:

- Find out how long purchases take during different times of the day and days of the week.
- Find out how long purchases are taking now compared to a previous period.
- Find out how many purchases are failing, and graph these failures over time.
- Summarize error messages and group them according to type to identify which systems are the most unstable.

For business analysis

1. The public interface method (business transaction) logs a single log line after the purchase completes successfully or fails.
2. Semantic logging tries to include extra "gold mine" information about the specific purchase and the customer. These fields are initially set to default values, and the real values are filled in as they are obtained. In this way, in the case of a failure, the values are logged if they were retrieved, but the defaults are logged if the error occurred while retrieving them. Revenue total is logged for graphing. Customer login details are added for customer support troubleshooting or proactive follow-up.

Business operators can use this data to:

- Graph purchase volume over time (day, week, month, quarter, year).
- Group purchases by customer to find out who the most valuable customers are and reward them for their loyalty.
- Group purchases by customer to find out who hasn't come back for repeat purchases and contact them to find out how you could serve them better.
- Graph purchase revenue over time.
- Graph purchase failures over time.
- Find specific purchase failures, to help a customer who calls in about a failure, or to proactively contact customers by other means to complete purchases that would otherwise have been lost to a competitor.

Each of these items is a simple search in Storm that you can save, schedule, and share by email. And there is no need to architect a complex solution -- the more data you log, the more you get out of it. It's really that simple. Use Storm instead of complex and inflexible architectures like RDBMS/SQL.

Conclusion
It is important to plan your logging so that you get the maximum benefit from your application. Start by thinking about how other people will view your logs. Events might get separated from the logs. Certain events might get copied and passed along to a variety of people inside and outside of an IT organization. Without robust logging techniques, you can lose meaningful data, making it harder to derive insight from the logs. The logging best practices that follow can help make your debugging logs more relevant and powerful for you and others.

About logging
Splunk Storm can index any text data. There are things you can do, however, to help Splunk extract more information, and provide more value, from your data. Getting developers to change the way they do things is like herding cats. But if you want to save a huge amount of time and get the most out of your logging, heed these tips.

If you scour the web, you can find many "logging best practices" documents. Most of them are language specific, discuss performance implications, and explain event levels like DEBUG, INFO, etc. The purpose of this document is not to duplicate existing efforts, but to show you how to log effectively so you can use robust analytics in Splunk Storm. Storm can reveal a tremendous amount of actionable information from within your logs, especially if you follow these guidelines.

What (not) to log


Don't log binary information, as it cannot be meaningfully searched and analyzed. If you must log binary data, place textual metadata in the event so that you can still search on this data. For example, don't log the binary data of a JPG file, but do log the metadata, such as the image size, creation tool, username, camera, GPS location, etc.

Try not to use XML if at all possible. First, it's very difficult for humans to read, and second, it requires more work for Splunk Storm to parse. Occasional XML is fine for dumping the value of something that exists in your code, but don't make a habit out of it.
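As an illustration of that guideline, here is a small Python sketch that logs searchable key=value metadata about an image file instead of its binary contents. The function and field names are made up for this example.

import logging
import os

logging.basicConfig(format="%(asctime)s %(message)s", level=logging.INFO)
log = logging.getLogger("uploads")

def log_image_upload(path, username):
    # Log metadata (name, size, owner) as key=value pairs; never the raw bytes.
    log.info('action=image_uploaded, filename="%s", bytes=%d, user="%s"',
             os.path.basename(path), os.path.getsize(path), username)

# Example call (the path is hypothetical):
# log_image_upload("/tmp/photo.jpg", "bob smith")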

The right time


The correct time is critical to understanding the proper sequence of events. Most logging systems do this for you automatically, but if you are able to configure your log output, use whatever time granularity makes sense for your application. Timestamps are critical for debugging, analytics, and deriving transactions. Storm will automatically timestamp events that don't include them, using a number of techniques, but it's best that you do it. Here are some examples of timestamps that Storm handles well by default:

2011-05-09 10:27:50,013
05-09-2011 00:00:00.789
05-09-2011 10:43:26.023 -0700
09/May/2011:10:43:07.040 -0700
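One way to produce such timestamps from an application you control is to override the formatter in Python's standard logging module, as in this sketch. It is just one possible approach; the format shown is ISO 8601 with microseconds and a UTC offset.

import logging
from datetime import datetime, timezone

class ISOFormatter(logging.Formatter):
    # Render the event time as ISO 8601 with microseconds and a UTC offset,
    # for example 2011-08-10T04:20:55.432123-07:00, at the start of the line.
    def formatTime(self, record, datefmt=None):
        return datetime.fromtimestamp(record.created, tz=timezone.utc).astimezone().isoformat()

handler = logging.StreamHandler()
handler.setFormatter(ISOFormatter("%(asctime)s level=%(levelname)s %(message)s"))
log = logging.getLogger("app")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("action=heartbeat status=ok")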

Garbage in, garbage out


Put semantic meaning in events so you can get much more out of your data. Log audit trails, what users are doing, transactions, timing information -- anything that can add value when aggregated, charted, or further analyzed.

One of the most powerful features of Storm is its ability to extract fields from events at search time. This is how Storm creates structure out of unstructured event data. Events can be created in such a way to guarantee that the field extractions will work as intended. Simply use the following string syntax (spaces or commas are fine):
key=value,key2=value2,key3=value...

If your values contain spaces, wrap them in quotes (for example, username="bob smith"). This may be a bit more verbose than you are used to, but the automatic field extraction will be well worth the size difference. Take the following two events as an example:

Log.debug("error user %d", userId)
Log.debug("orderstatus=error user=%d", userId)

Searching for the word "error" will probably bring up many types of events, but searching for "orderstatus=error" will retrieve the precise events of interest. Additionally, you can then query Splunk Storm for reports that use orderstatus, such as asking for the distribution of orderstatus (for example, completed=78%, aborted=21%, error=1%) -- something you couldn't do if you only had the keyword "error" in your log event.

Things like transaction IDs, user IDs, and so on are tremendously helpful when debugging, and even more helpful when gathering analytics. Unique IDs can point you to the exact transaction. Without them, many times all you have to go on is a time range. Don't change the format of these IDs between modules if you can help it. If you keep them the same, then a transaction can be tracked through the system, or multiple transactions if that's the case. You can track transactions with multiple IDs as well, as long as there remains a transitive connection between them. For instance:

Event A contains ID 12345
Event B contains ID 12345 and UID ABCDE
Event C contains UID ABCDE

You can associate A with B with C because there is closure between the two IDs in event B.
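To make the idea concrete, here is a hedged Python sketch of semantic, key=value logging that carries both identifiers in one event so it can play the role of event B above. The logger name, field names, and values are invented for the example.

import logging

logging.basicConfig(format="%(asctime)s %(message)s", level=logging.INFO)
log = logging.getLogger("orders")

def record_payment(event_id, uid, total):
    # One key=value event that carries both identifiers, which is what lets you
    # associate events that contain only one of them.
    log.info('action=PaymentRecorded, id=%s, uid=%s, total=%.2f', event_id, uid, total)

record_payment("12345", "ABCDE", 42.50)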


Learn more
For more information about writing log files that can easily produce intelligence with Storm, read the three topics about conscious logging in this manual.

Best practices
You can help Storm get more out of your logs by following these best practices.

Logging best practices


Keep these guidelines in mind when deciding how to output information to logs.

Use clear key-value pairs

One of the most powerful features of Storm is its ability to extract fields from events when you search, creating structure out of unstructured data. To make sure field extraction works as intended, use the following string syntax (using spaces and commas is fine):

key1=value1, key2=value2, key3=value3 . . .

If your values contain spaces, wrap them in quotes (for example, username="bob smith"). This might be a bit more verbose than you are used to, but the automatic field extraction is worth the size difference.

Create human-readable events

Avoid using complex encoding that would require lookups to make event information intelligible.

Use human-readable timestamps for every event

The correct time is critical to understanding the proper sequence of events. Timestamps are critical for debugging, analytics, and deriving transactions. Storm will automatically timestamp events that don't include them, using a number of techniques, but it's best that you do it.

Use the most verbose time granularity possible. Put the timestamp at the beginning of the line: the farther you place a timestamp from the beginning, the more difficult it is to tell it's a timestamp and not other data. Include a time zone, preferably a GMT/UTC offset. Time should be rendered in microseconds in each event. The event could become detached from its original source file at some point, so having the most accurate data about an event is ideal.

Use unique identifiers (IDs)

Unique identifiers such as transaction IDs and user IDs are tremendously helpful when debugging, and even more helpful when you are gathering analytics. Unique IDs can point you to the exact transaction. Without them, you might only have a time range to use. When possible, carry these IDs through multiple touch points and avoid changing the format of these IDs between modules. That way, you can track transactions through the system and follow them across machines, networks, and services. You might also find it helpful to give this ID a consistent name (something more descriptive than "ID").

Log in text format

Avoid logging binary information because Storm cannot meaningfully search or analyze binary data. Binary logs might seem preferable because they are compressed, but this data requires decoding and won't segment. If you must log binary data, place textual metadata in the event so that you can still search through it. For example, don't log the binary data of a JPG file, but do log its image size, creation tool, username, camera, GPS location, and so on.

Avoid using XML and JSON

Avoid formats with multidepth nesting because they aren't human-readable and require more work to parse. Occasional XML is fine for dumping the value of something that exists in your code, but don't make a habit out of it.

Log more than just debugging events

Put semantic meaning in events to get more out of your data. Log audit trails, what users are doing, transactions, timing information, and so on. Log anything that can add value when aggregated, charted, or further analyzed. In other words, log anything that is interesting to the business.

Use categories

For example, use DEBUG, INFO, WARN, ERROR, EXCEPTION, and define a specific case when each level should be used (see the sketch after this list):

- DEBUG level for application debugging
- INFO level for semantic logging
- WARN level for recoverable errors or automatic retry situations
- ERROR level for errors that are reported but not handled
- EXCEPTION level for errors that are safely handled by the system

Keep multi-line events to a minimum

Multiline events generate a lot of segments, which can affect indexing and search speed, as well as disk compression. Consider breaking multiline events into separate events.
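The sketch below shows one way to apply those categories with Python's standard logging module; the names are illustrative. Python has no separate EXCEPTION level, but logger.exception() logs at ERROR level and attaches the stack trace, which matches the "one stack trace per error, logged where it is handled" pattern described earlier.

import logging

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("payments")

def charge_card(purchase_id):
    log.debug("action=chargeCardStart, purchaseId=%s", purchase_id)       # DEBUG: application debugging
    try:
        # ... submit the transaction to the gateway here ...
        log.info("action=PurchaseCompleted, purchaseId=%s", purchase_id)  # INFO: semantic logging
    except TimeoutError:
        log.warning("action=chargeCardRetry, purchaseId=%s", purchase_id) # WARN: recoverable / retry
    except Exception:
        # EXCEPTION role: logged once, with the stack trace, where the error is handled.
        log.exception("action=chargeCardFailed, purchaseId=%s", purchase_id)

charge_card("p-1001")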

Operational best practices


These operational best practices apply to the way you do logging:

- Log locally to files. If you log to a local file, it provides a local buffer and you aren't blocked if the network goes down.
- Use rotation policies. Logs can take up a lot of space. Maybe compliance regulations require you to keep years of archival storage, but you don't want to fill up your file system on your production machines. So, set up good rotation strategies and decide whether to destroy or back up your logs. (See the sketch after this list.)
- Collect events from every single machine. The more data you capture, the more visibility you have. Collect these when you can:
  - Application logs
  - Database logs
  - Network logs
  - Configuration files
  - Performance data (iostat, vmstat, ps, etc.)
  - Anything that has a time component
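Here is a minimal Python sketch of the "log locally and rotate" advice above, using the standard library's RotatingFileHandler. The file name and size limits are arbitrary choices for the example.

import logging
from logging.handlers import RotatingFileHandler

# Write to a local file (a buffer if the network goes down) and rotate so the
# logs can't fill the disk: keep up to 5 backups of roughly 10 MB each.
handler = RotatingFileHandler("app.log", maxBytes=10 * 1024 * 1024, backupCount=5)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
log = logging.getLogger("app")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("action=startup status=ok")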

Examples


Use clear key-value pairs


In this example of what not to do, the term "error" is too vague, and because no keys are provided, you have to guess which value is which:
BAD: Log.debug("error %d 454 - %s", userId, transId)

In this improved version, the event is easier to parse because the key-value pairs are clearly provided. Searching on "orderstatus=error" will retrieve exactly the events you want. Also, you can query Splunk for reports that use orderstatus, such as requesting its distribution (e.g., completed=78%, aborted=21%, error=1%), which is something you couldn't do if you only had the keyword "error" in your log event.
GOOD: Log.debug("orderstatus=error, errorcode=454, user=%d, transactionid=%s", userId, transId)

Break up multivalue information


Parsing this multivalue event is difficult, and so is adding data for each value of app:
BAD: <TS> phonenumber=333-444-4444, app=angrybirds,facebook

This improved version breaks multivalue information into separate events, so the key-value pairs are more clear:
GOOD: <TS> phonenumber=333-444-4444, app=angrybirds, installdate=xx/xx/xx <TS> phonenumber=333-444-4444, app=facebook, installdate=yy/yy/yy
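If your application builds the event, one way to produce separate events like the GOOD example is simply to emit one log call per value. This is a hedged sketch in Python; the logger setup is assumed to be configured elsewhere (for example, as in the earlier logging sketch), and the data values are illustrative:

import logging

logger = logging.getLogger("shop")  # assumes a handler/formatter is already configured

# One event per installed app, instead of a single comma-separated multivalue field.
installed_apps = [("angrybirds", "01/02/13"), ("facebook", "02/15/13")]  # illustrative data
for app, installdate in installed_apps:
    logger.info('phonenumber=333-444-4444 app=%s installdate=%s', app, installdate)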

Use headings as keys


You can use headings as keys, as shown in the following example. Storm can interpret the column headers as keys and each line as values (although this does break the rule about avoiding multiline events):
<TS> USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
Root 41 21.9 1.7 3233968 143624 ?? Rs 7Jul11 48:09.67 /System/Library/foo
Dras 790 4.5 0.4 4924432 32324 ?? S 8Jul11 9:00.57 /System/Library/baz

Use multiple unique IDs to connect events


If you can't use one unique identifier, use a transitive connection from one event to another. For example, here are three separate events:
Event A: transid=abcdef Event B: transid=abcdef, otherid=qrstuv Event C: otherid=qrstuv

You can associate Event A with Events B and C, because of the connection between the two IDs in Event B.


Storm Tutorial
Welcome to the Storm tutorial
This tutorial walks you through some of the main functionality of the Splunk Storm interface, and gets you started with searching and creating reports. This tutorial uses the same sample dataset and general use cases as the Splunk user tutorial in the Splunk core product documentation, but has been adapted for the Storm interface and feature set.

Create a Storm account and log in


If you haven't already followed the instructions in "Create and activate an account," do so before proceeding. Once you're logged in, you'll create a project to use for the tutorial.

Create the tutorial project


This topic assumes you've created an account in Splunk Storm and logged in to it. If you haven't yet done this, follow the steps in "Create and activate an account" before proceeding.

The tutorial project


We'll first create a project for the tutorial data to live in. You can use an existing project if you like, but if there's already data in that project, what you see when you run the searches in this tutorial might differ slightly from the screen shots. If you can create a new project for the tutorial (which you can delete later), that is simpler. To create a new project 1. Click Add project. The Add project panel is displayed. 2. Give your project a name. The screen shots in this tutorial will show that our project is named "tutorial," but you can call yours whatever you like.

3. Specify the time zone for your project. The time zone you set here is applied to all the data that you send to this project. 4. Click Continue. The "Choose a plan" panel is displayed. 5. For the purposes of the tutorial, choose the Free plan, which is preselected. Refer to "Choose a data storage plan" for more information about data plans and how to choose them. 6. Click Continue. The Confirmation panel is displayed. 7. Ensure the details are correct and click Confirm. After a moment, the project data page for your project is displayed, with the Inputs tab selected. Next, we'll download the tutorial data set and upload it to your project.

Add data to the tutorial project


This topic assumes you've already logged in to Splunk Storm and created a project for your tutorial.

About the data set


This tutorial uses the same sample data set as the core Splunk tutorial. This data set contains example logs from an online store, the Flower & Gift shop. The sample data includes: Apache Web server logs MySQL database logs

Download and uncompress the sample data file


1. Download the sample data from here: sampledata.zip. This sample data file is updated daily, so be sure to download the data set on the day you plan to run through the tutorial. 2. Create a folder in a location on your local machine that you can find again, and uncompress the zip file into it. You'll see three Apache folders, each with an "access_combined" log file inside, and one MySQL folder with a MySQL log file inside it.


Add the data to your project


Next, you'll upload each of the logfiles to Storm. 1. Click Inputs if you're not already there. Perform these steps for each of the separate log files: 2. Click File upload. 3. Under "Upload a file to the project," click +Upload. 4. Next to "File," click Browse. The file browser is displayed. 5. Browse to the access_combined.log file in apache1.splunk.com and choose it. 6. For each of the log files, choose a source type. For the Apache access logs, choose Apache web access logs. For the MySQL log file, choose Generic single-line data. Specifying a source type tells Storm how to parse your data, and allows you to group all the data of a certain type together when searching. When you add your own data to Storm, you'll want to specify the right source type so that Storm extracts timestamps and linebreaks your data correctly. Read "Sources and source types" in this manual for more information about your choices. 7. For the tutorial, leave the default value for Time zone (which should be the time zone you specified when you created your project). Refer to "How does the time zone affect things?" for more information.


8. Click Upload. 9. Repeat steps 3-8 for each of the files. There's one in each of the three Apache folders, and one in the MySQL folder. Next, we'll check out what the Storm user interface looks like once you've added some data.

Introduction to the Storm UI


Now that you've uploaded the three Apache access logs and the MySQL log from the sample data set to your Storm project, it's time to explore the data in the Storm UI. This topic assumes you've just added the sample data for the online Flower & Gift shop. If you haven't, go back to "Add data to the tutorial project" to get it before proceeding. Once you have data in Storm, you're ready to start searching. This topic introduces you to Home, Storm's default interface for searching and analyzing data. If you're already familiar with the Splunk search interface, you can skip ahead and start searching. You are a member of the Customer Support team for the online Flower & Gift shop. This is your first day on the job. You want to learn some more about the shop. Some questions you want answered are: What does the store sell? How much does each item cost?

How many people visited the site? How many bought something today? What is the most popular item that is purchased each day? Storm already has the data in it--let's take a look at it.

Click Explore data (it might take a few minutes for the button to appear). The Storm Home dashboard is displayed.

Storm Home
The Storm Home dashboard displays information about the data that you just uploaded to Storm and gives you the means to start searching this data.

What's in this dashboard?

Storm includes many different dashboards and views. For now, you really only need to know about two of them:

Home, where you are now.
Search, where you will do most of your searching.

Use the Search navigation menus to locate and access the different views in Storm. When you click the links, Storm takes you to the respective dashboard, or refreshes the page if you're already there.

Other things in the Storm UI:

Searches & Reports: lists all of your saved searches and reports.
Search bar and Time range picker: enable you to type in your search and select different time ranges over which to retrieve events.
Sources panel: displays the top sources from the data in your Storm project.
Sourcetypes panel: displays the top source types in your Storm project.
Hosts panel: displays the top hosts in your Storm project.

If you're using a new, empty project for this tutorial, you'll only see the sample data files that you just uploaded. Because it's a one-time upload of a file, this data will not change. When you add more data, there will be more information on this dashboard. If you add data inputs that point to sources that are not static (such as log files that applications are actively writing to), the numbers on the Home page will change as more data comes in from your sources.

Kick off a search


1. Take a closer look at the bottom half of the Home dashboard.

In the Sources panel, you should see the Apache Web server logs for the online Flower & Gift shop data that you just uploaded. If you're familiar with Apache Web server logs, you might recognize the access_combined source type as one of the log formats associated with Web access logs. All the data for this source type should give you information about people who access the Flower & Gift shop website. Searching in Storm is very interactive. Although you have a search bar in the Home dashboard, you don't need to type anything into it just yet. Each of the sources, source types, and hosts listed in the Home dashboard is a link that will kick off a search when you click on them. 2. In the Sourcetypes panel, click access_combined. Splunk takes you to the Search dashboard, where it runs the search and shows you the results:


There are a lot of components to this view, so let's take a look at them before continuing to search.
Storm paused my search?

If you are searching in a Storm project that has more data in it than just this tutorial's sample data, your search might take a bit longer. If your search takes longer than 30 seconds, Storm will automatically pause it. If autopause pops up, click Resume search. You can read more about autopause in the core Splunk platform Knowledge Manager Manual.

What's in this Search dashboard?

The search bar and time range picker should be familiar to you -- they were also in the Home dashboard. But now you also see a count of events, the timeline, the fields menu, and the list of retrieved events or search results.

Search actions: Use these buttons to save a search, create a report or dashboard, export results, print, and more.

Count of matching and scanned events: As the search runs, Storm displays two running counts of the events as it retrieves them: one is a matching event count and the other is the count of events scanned. When the search completes, the count that appears above the timeline displays the total number of matching events. The count that appears below the timeline and above the events list tells you the number of events during the time range that you selected. As we'll see later, this number changes when you drill down into your investigations.

Timeline of events: The timeline is a visual representation of the number of events that occur at each point in time. As the timeline updates with
your search results, you might notice clusters or patterns of bars. The height of each bar indicates the count of events. Peaks or valleys in the timeline can indicate spikes in activity or server downtime. Thus, the timeline is useful for highlighting patterns of events or investigating peaks and lows in event activity. The timeline options are located above the timeline. You can zoom in, zoom out, and change the scale of the chart.

Fields sidebar: We mentioned before that when you index data, Storm by default automatically recognizes and extracts information from your data that is formatted as name and value pairs, which we call fields. When you run a search, Storm lists all of the fields it recognizes in the Fields menu next to your search results. You can select other fields to show in your events. Selected fields are fields that are set to be visible in your search results. By default, host, source, and source type are shown. Interesting fields are other fields that Storm has extracted from your search results. Field discovery is an on/off switch at the top of the Fields menu. Storm's default setting is Field discovery on. If you want to speed up your search, you can turn Field discovery off, and Storm will extract only the fields required to complete your search.

Results area: The results area displays the events that Storm retrieves to match your search. It's located below the timeline. By default the events are displayed as a list, but you can also choose to view them as a table. When you select the event table view, you will only see the Selected fields in your table.

When you're ready, proceed to the next topic to start searching and find out what's up at the flower shop.

Start searching
This topic walks you through simple searches using the Search interface. If you're not familiar with the search interface, go back to the introduction to the Storm UI before proceeding. It's your first day of work with the Customer Support team for the online Flower & Gift shop. You're just starting to dig into the Web access logs for the shop, when you receive a call from a customer who complains about trouble buying a gift for his girlfriend--he keeps hitting a server error when he tries to complete a purchase. He gives you his IP address, 10.2.1.44.


Typeahead for keywords


Everything in Storm is searchable. You don't have to be familiar with the information in your data because searching in Storm is free-form and as simple as typing keywords into the search bar and hitting Enter (or clicking that blue arrow at the end of the search bar). In the previous topic, you ran a search from the Home dashboard by clicking on the Web access source type (access_combined). Use that same search to find this customer's recent access history at the online Flower & Gift shop. 1. Type the customer's IP address into the search bar:
sourcetype=access_combined 10.2.1.44

As you type into the search bar, Splunk's search assistant opens. Search assistant shows you typeahead, or contextual matches and completions for each keyword as you type it into the search bar. These contextual matches are based on what's in your data. The entries under matching terms update as you continue to type because the possible completions for your term change as well. Search assistant also displays the number of matches for the search term. This number gives you an idea of how many search results Splunk will return. If a term or phrase doesn't exist in your data, you won't see it listed in search assistant. What else do you see in search assistant? For now, ignore everything in the right panel next to the contextual help. Search assistant has more uses once you start learning the search language, as you'll see later. And, if you don't want search assistant to open, click "turn off auto-open" and close the window using the double up-arrow below the search bar.

More keyword searches


2. If you didn't already, run the search for the IP address. (Hit Enter.) Storm retrieves the customer's access history for the online Flower & Gift shop.


Each time you run a search, Storm highlights in the search results what you typed into the search bar. 3. Skim through the search results. You should recognize words and phrases in the events that relate to the online shop (flower, product, purchase, etc.).

The customer mentioned that he was in the middle of purchasing a gift, so let's see what we find by searching for "purchase". 4. Type purchase into the search bar and run the search:
sourcetype=access_combined 10.2.1.44 purchase

When you search for keywords, your search is not case-sensitive and Storm retrieves the events that contain those keywords anywhere in the raw text of the event's data.


Among the results that Splunk retrieves are events that show each time the customer tried to buy something from the online store. Looks like he's been busy!

Use Boolean Operators


If you're familiar with Apache server logs, in this case the access_combined format, you'll notice that most of these events have an HTTP status of 200, or Successful. These events are not interesting for you right now, because the customer is reporting a problem.

5. Use the Boolean NOT operator to quickly remove all of these Successful page requests. Type in:
sourcetype=access_combined 10.2.1.44 purchase NOT 200

You notice that the customer is getting HTTP server (503) and client (404) errors.

But, he specifically mentioned a server error, so you want to quickly remove events that are irrelevant.

Storm supports the Boolean operators: AND, OR, and NOT. When you include Boolean expressions in your search, the operators have to be capitalized.

The AND operator is always implied between search terms. So the search in Step 5 is the same as:
sourcetype=access_combined AND 10.2.1.44 AND purchase NOT 200

Another way to add Boolean clauses quickly and interactively to your search is to use your search results. 6. Mouse over an instance of "404" in your search results and ALT-click it. This updates your search string with "NOT 404" and filters out all the events that contain the term.

From these results, you see each time that the customer attempted to complete a purchase and received the server error. Now that you have confirmed what the customer reported, you can continue to drill down to find the root cause.

Interactive searching


Storm lets you highlight and select any segment from within your search results to add, remove, and exclude them quickly and interactively using your keyboard and mouse: To add more search terms, highlight and click the word or phrase you want from your search results. To remove a term from your search, click a highlighted instance of that word or phrase in your search results. To exclude events from your search results, ALT-click on the term you don't want Storm to match. When you're ready to proceed, go to the next topic to learn how to investigate and troubleshoot interactively using the timeline in Storm.

Use the timeline


This topic assumes that you're comfortable running simple searches to retrieve events. If you're not sure, go back to the last topic where you searched with keywords, wildcards, and Booleans to pinpoint an error. Back at the Flower & Gift shop, let's continue with the customer (10.2.1.44) you were assisting. He reported an error while purchasing a gift for his girlfriend. You confirmed his error, and now you want to find the cause of it. Continue with the last search, which showed you the customer's failed purchase attempts. 1. Search for:
sourcetype=access_combined 10.2.1.44 purchase NOT 200 NOT 404

In the last topic, you really just focused on the search results listed in the events viewer area of this dashboard. Now, let's take a look at the timeline.


The location of each bar on the timeline corresponds to an instance when the events that match your search occurred. If there are no bars at a time period, no events were found then. 2. Mouse over one of the bars. A tooltip pops up and displays the number of events that Splunk found during the time span of that bar (1 bar = 1 hr).

The taller the bar, the more events occurred at that time. Spikes in the number of events, or gaps with no events, are often a good indication that something has happened. 3. Click one of the bars, for example the tallest bar. This updates your search results to show you only the events in that time span. Splunk does not run the search when you click the bar. Instead, it gives you a preview of the results zoomed in on that time range. You can still select other bars at this point.


4. Double-click on the same bar. Splunk re-runs your search to retrieve only events during that one hour span you selected.

You should see the same search results in the Event viewer, but, notice that the search overrides the time range picker and it now shows "Custom time". (You'll see more of the time range picker later.) Also, each bar now represents one minute of time (1 bar = 1 min). One hour is still a wide time period to search, so let's narrow the search down more. 5. Double-click another bar. Once again, this updates your search to now retrieve events during that one minute span of time. Each bar represents the number of events for one second of time.


Now, you want to expand your search to see what else, if anything, happened during this minute. 6. Without changing the time range, replace your previous search in the search bar with:
*

Splunk supports using the asterisk (*) wildcard to search for "all" or to retrieve events based on parts of a keyword. Up to now, you've just searched for Web access logs. This search tells Splunk that you want to see everything that occurred at this time range:

This search returns events from all the logs on your server. You expect to see other users' Web activity--perhaps from different hosts. But instead you see a cluster of MySQL database errors. These errors were causing your customer's purchases to fail. Now, you can report this issue to someone in the IT Operations team.

What else can you do with the timeline?

To show all the results for the timeline again, click deselect above the timeline. To lock in the selected span of events for your search, click zoom in.
To expand the timeline view to show more events, click zoom out. When you're ready, proceed to the next topic to learn about searching over different time ranges.

Change the time range


This topic assumes that you're familiar with running ad hoc searches and using the timeline. If you're not sure, review the previous topics on searching and using the timeline. This topic shows you how to narrow the scope of your investigative searching over any past time range. If you have some knowledge about when an event occurred, use it to target your search to that time period for faster results. It's your second day of work with the Customer Support team for the online Flower & Gift shop. You just got to your desk. Before you make yourself a cappuccino, you decide to run a quick search to see if there were any recent issues you should be aware of. 1. Return to the Search dashboard and type in the following search over all time:
error OR failed OR severe OR (sourcetype=access_* (404 OR 500 OR 503))

This search uses parentheses to group together expressions for more complicated searches. When evaluating Boolean expressions, Storm performs the operations within the innermost parentheses first, followed by the next pair out. When all operations within parentheses are completed, Storm evaluates OR clauses, then AND or NOT clauses.

Also, this search uses the wildcarded shortcut, "access_*", to match the Web access logs. If you have different source types for your Apache server logs, such as access_common and access_combined, this will match them all. This searches for general errors in your event data over the course of the last week. Instead of matching just one type of log, this searches across all the logs in your index. It matches any occurrence of the words "error", "failed", or "severe" in your event data. Additionally, if the log is a Web access log, it looks for HTTP error codes, "404", "500", or "503".


This search returns a significant number of errors. You're not interested in knowing what happened over All time, even if it's just the course of a week. You just got into work, so you want to know about more recent activity, such as overnight or the last hour. But, because of the limitations of this dataset, let's look at yesterday's errors. 2. Drop down the time range picker and change the time range to Other > Yesterday.

By default, Storm searches across all of your data; that is, the default time range for a search is across "All time". If you have a lot of data, searching on this time range when you're investigating an event that occurred 15 minutes ago, last night, or the previous week just means that Storm will take a long time to retrieve the results that you want to see.


3. Selecting a time range from this list automatically runs the search for you. If it doesn't, just hit Enter.

This search returns events for general errors across all your logs, not just Web access logs. (If your sample data file is more than a day old, you can still get these results by selecting Custom time and entering the last date for which you have data.) Scroll through the search results. There are more MySQL database errors and some 404 errors. You ask the intern to get you a cup of coffee while you contact the Web team about the 404 errors and the IT Operations team about the recurring server errors.

Storm also provides options for searching a continuous stream of incoming events:

Real-time enables searching forward in time against a continuous stream of live incoming event data. Because the sample data is a one-time upload, running a real-time search will not give us any results right now. We will explore this option later. Read more about real-time searches and how to run them in "Search and report in real-time" in the core Splunk product documentation. Note that any references in the core Splunk product documentation to the CLI and configuration files are not relevant for Splunk Storm. For more information about your time range options, see "Change the time range of your search" in the core Splunk product documentation. Up to now, you've run simple searches that matched the raw text in your events. You've only scratched the surface of what you can do in Storm. When you're ready to proceed, go on to the next topic to learn about fields and how to search with fields.

Use fields to search


This topic assumes you know how to run simple searches and use the time range picker and timeline. If you're not sure, review the previous topics, beginning with Start searching. You can learn a lot about your data from just running ad hoc searches, using nothing more than keywords and the time range. But you can't take full advantage of Storm's more advanced searching and reporting features without understanding what fields are and how to use them. This part of the tutorial will familiarize you with: default fields and other fields that Storm automatically extracts using the fields menu and fields dialog to find helpful fields searching with fields Let's return to the happenings at the online Flower and Gift shop. It's your second day as a member of the Customer Support team. You spent the morning investigating some general issues and reporting the problems you found to other teams. You feel pretty good about what you've learned about the online shop and its customers, but you want to capture this and share it with your team. When you ask a coworker how you can do this, he recommends that you use fields.

What are fields? Fields are searchable name/value pairings in event data. All fields have names and can be searched with those names. Some examples of fields are clientip for IP addresses accessing your web server, _time for the timestamp of an event, and host for domain name of a server. Fields distinguish one event from another because not all events will have the same fields and field values. Fields enable you to write more tailored searches to retrieve the specific events that you want. Fields also enable you to take advantage of the search language, create charts, and build reports. Most fields in your data exist as name and value pairs where there is one single value to each field name. But, you'll also see fields that appear more than once

in an event and have a different value for each appearance. One of the more common examples of multivalue fields is email address fields. While the "From" field will contain only a single email address, the "To" and "Cc" fields may have one or more email addresses associated with them. For more information, read About fields in the Knowledge Manager manual in the core Splunk product documentation. Note that any references in the core Splunk product documentation to the CLI and configuration files are not relevant for Splunk Storm.

The fields menu and fields dialog


1. Go back to the Search dashboard and search for web access activity. Select Other > Yesterday from the time range picker:
sourcetype="access_*"

2. Scroll through the search results. If you're familiar with the access_combined format of Apache logs, you will recognize some of the information in each event, such as: IP addresses for the users accessing the website. URIs and URLs for the page request and referring page. HTTP status codes for each page request. Page request methods.

As Storm retrieves these events, the Fields menu updates with selected fields and interesting fields. These are the fields that Storm extracted from your data.

Default and automatically extracted fields



Storm extracts fields from event data twice. It extracts default and other indexed fields during event processing when that data is indexed. And it extracts a different set of fields at search time, when you run a search. Read more about "Index time versus search time" in the Managing Indexers and Clusters manual in the core Splunk product documentation. Note that any references in the core Splunk product documentation to the CLI and configuration files are not relevant for Splunk Storm. At index time, Storm automatically finds and extracts default fields for each event it processes. These fields include host, source, and sourcetype (which you should already be familiar with). For a complete list of the default fields, see "Use default fields" in the User manual in the core Splunk product documentation. Storm also extracts certain fields at search time--when you run a search. You'll see some examples of these searches later. For more information, read the "Overview of search-time field extractions" in the Knowledge Manager manual in the core Splunk product documentation. Note that any references in the core Splunk product documentation to the CLI and configuration files are not relevant for Splunk Storm. Notice that default fields host, source, and source type are selected fields and are included in your search results:

3. Scroll through interesting fields to see what else Storm extracted. You should recognize the field names that apply to the web access logs. For example, there's clientip, method, and status. These are not default fields; they have (most likely) been extracted at search time. 4. Click Edit in the fields menu.


The Fields dialog opens and displays all the fields that Storm extracted. Available fields are the fields that Storm identified from the events in your current search (some of these fields were listed under Other interesting fields). Selected fields are the fields you picked (from the available fields) to show in your search results (by default, host, source, and sourcetype are selected).

5. Scroll through the list of Available fields. You're already familiar with the fields that Storm extracted from the web access logs based on your search. You should also see other default fields that Storm defined--some of these fields are based on each event's timestamp (everything beginning with date_*), punctuation (punct), and location (index). But, you should also notice other extracted fields that are related to the online store. For example, there are action, category_id, and product_id. From conversations with your coworker, you may know that these fields are:

field name - description
action - what a user does at the online shop.
category_id - the type of product a user is viewing or buying.
product_id - the catalog number of the product the user is viewing or buying.


6. From the Available Fields list, select action, category_id, and product_id.

7. Click Save. When you return to the Search view, the fields you selected will be included in your search results if they exist in that particular event. Different events will have different fields.

The fields menu doesn't just show you what fields Storm has captured from your data. It also displays how many values exist for each of these fields. For the fields you just selected, there are 2 for action, 5 for category_id, and 9 for product_id. This doesn't mean that these are all the values that exist for each of the fields--these are just the values that Storm knows about from the results of your search. What are some of these values? 8. Under selected fields, click action for the action field.


This opens the interactive menu for the action field.

This window tells you that, in this set of search results, Storm found two values for action and they are purchase and update. Also, it tells you that the action field appears in 71% of your search results. This means that nearly three-quarters of the web access events are related to the purchase of an item or an update (of the item quantity in the cart, perhaps). 9. Close this window and look at the other two fields you selected, category_id (what types of products the shop sells) and product_id (specific catalog names for products). Now you know a little bit more about the information in your data relating to the online Flower and Gift shop. Let's use these fields to see what people are buying. For example, the online shop sells a selection of flowers, gifts, plants, candy, and balloons.

Use fields to run more targeted searches


These next two examples illustrate the difference between searching with keywords and using fields. Example 1: Return to the search you ran to check for errors in your data. Select Other > Yesterday from the time range picker:
error OR failed OR severe OR (sourcetype=access_* (404 OR 500 OR 503))


Run this search again, but this time, use fields in your search.

To search for a particular field, just type the field name and value into the search bar: fieldname=fieldvalue

The HTTP error codes are values of the status field. Now your search looks like this:
error OR failed OR severe OR (sourcetype=access_* (status=404 OR status=500 OR status=503))

Notice the difference in the count of events between the two searches--because it's a more targeted search, the second search returns fewer events. When you run simple searches based on arbitrary keywords, Storm matches the raw text of your data. When you add fields to your search, Storm looks for events that have those specific field/value pairs.

Also, you were actually using fields all along! Each time you searched for sourcetype=access_*, you told Storm to only retrieve events from your web access logs and nothing else.

Example 2: Before you learned about the fields in your data, you might have run this search to see how many times flowers were purchased from the online shop:
sourcetype=access_* purchase flower*

As you type in "flower", search assistant shows you both "flower" and "flowers" in the typeahead. Since you don't know which one you want, you use the wildcard to match both.

If you scroll through the (many) search results, you'll see that some of the events have action=update and category_id that have a value other than flowers. These are not events that you wanted! Run this search instead. Select Other > Yesterday from the time range picker:
sourcetype=access_* action=purchase category_id=flower*

For the second search, even though you still used the wildcarded word "flower*", there is only one value of category_id that it matches (FLOWERS). Notice the difference in the number of events that Storm retrieved for each search; the second search returns significantly fewer events. Searches with fields are more targeted and retrieve more exact matches against your data. As you run more searches, you'll want to be able to save them and reuse them or share them with your teammates. When you're ready, proceed to the next topic to learn how to save your search and share it with others.

Save a search
This topic assumes you're comfortable running searches with fields. If you're not, go back to the previous topic and review how to Use fields to search. This topic walks you through the basics of saving a search and how you can use that search again later. Back at the Flower & Gift shop, you just ran a search to see if there were any errors yesterday. This is a search you will run every morning. Rather than type it in manually every day, you decide to save this search. Example 1. Run the search for all errors seen yesterday:
error OR failed OR severe OR (sourcetype=access_* (status=404 OR status=500 OR status=503))

1. Click Save below the search bar.

2. Select Save search... from the drop down list. The Save search dialog box opens.


At a minimum, a saved search includes the search string and the time range associated with the search, as well as the name of the search.

3. Name the search Errors (Yesterday) 4. Leave the Search string as it is. 5. Leave the Time range as it is. 6. Leave Share set as it is. 7. Click Finish. Storm confirms that your search was saved:

8. Find your saved search in the Searches & Reports list:


Because the saved search's name contained the word "Error," Storm lists it in the saved search submenu for Errors. Right now you are the only one that is authorized to access this saved search. Since this is a search that others on your team may want to run, you can set it as a shared saved search that they can access. To do this, read more about saving searches and sharing search results in the core Splunk product documentation. Note that any references in the core Splunk product documentation to the CLI and configuration files are not relevant for Splunk Storm.

Manage searches and reports

If you want to modify a search that you saved, use the Searches & Reports menu to select Manage Searches & Reports. This takes you to the Storm Manager page for all the searches and reports you're allowed to access. From here you can select your search from the list. This takes you to the search's edit window, where you can change or update the search string, description, time range, and schedule options.

Save and share search results

Saving the results of a search is different from saving the search itself. You do this when you want to be able to review the outcome of a particular run of a search at a later time. Read more about this in saving searches and sharing search results in the core Splunk product documentation. Note that any references in the core Splunk product documentation to the CLI and configuration files are not relevant for Splunk Storm. Now, you can save your searches after you run them. When you're ready, proceed to the next topic to learn more ways to search.

Use Splunk's search language


This topic assumes that you are familiar with running simple searches using keywords and field/value pairs. If you're not sure, go back and read "Use fields to search". Back at the online Flower & Gift shop Customer Support office, the searches you've run to this point have only retrieved matching events from your Storm
index. In a previous topic, when you ran this search for purchases of flowers:
sourcetype=access_* action=purchase category_id=flowers

The search results told you approximately how many flowers were bought. But this doesn't help you answer questions such as:

What items were purchased most at the online shop?
How many customers bought flowers?
How many flowers did each customer buy?

To answer these questions, you need to use Splunk Storm's search language, which includes an extensive library of commands, arguments, and functions that enable you to filter, modify, reorder, and group your search results. For this tutorial you'll only use a few of them.

More search assistant


Example 1. What items were purchased most at the online shop? 1. Return to the search dashboard and restrict your search to purchases over Yesterday:
sourcetype=access_* action=purchase

As you type in the search bar, search assistant opens with syntax and usage information for the search command (on the right side). If search assistant doesn't open, click the blue double arrow under the left side of the search bar.

You've seen before that search assistant displays typeahead for keywords that you type into the search bar. Search assistant shows you information about the search command because every search you have run until now used the search command--it's implied. It
also helps you construct your search string by suggesting other search commands you may want to use next (common next commands). 2. Type a pipe character into the search bar. This causes the search assistant to show you some common next commands.
The pipe indicates to Splunk that you want to take the results of the search to the left of the pipe and use that as the input to the command after the pipe. You can pass the results of one command into another command in a series, or pipeline, of search commands.

You want Storm to give you the most popular items bought at the online store--the top command looks promising.

Construct a search pipeline


3. Under common next commands, click top. Splunk appends the top command to your search string. According to search assistant's description and usage examples, the top command "displays the most common values of a field"--exactly what you wanted. You wanted to know what types of items were being bought at the online shop, not just flowers. 4. Complete your search with the category_id field:
sourcetype=access_* action=purchase | top category_id

and press Enter. This gives you a table of the top or most common values of category_id. By default, the top command returns ten values, but you only have five different types of items. So, you should see all five, sorted in descending order by the count of each type:

The top command also returns two new fields: count is the number of times each value of the field occurs, and percent is how large that count is compared to the total count. Read more about the top command in the Search reference manual in the core Splunk product documentation. Note that any references in the core Splunk product documentation to the CLI and configuration files are not relevant for Splunk Storm.

Drill down into search results


The last search returned a table that showed you what items the online shop sells and how many of those items were purchased. But you want to know more about an individual item, for example, flowers. Example 2: How many flowers were bought? 1. Click the row in the result table for Flowers. This kicks off a new search. Storm updates your search to include a filter for the field/value pair category_id=flowers, which was the row you clicked in the result table from the search in Example 1.


Splunk's drilldown actions enable you to delve deeper into the details of the information presented to you in the tables and charts that result from your search. Read more about drilldown actions in the User manual in the core Splunk product documentation. Note that any references in the core Splunk product documentation to the CLI and configuration files are not relevant for Splunk Storm.

The number of events returned tells you how many times flowers were purchased, but you want to know how many different customers bought the flowers.

Example 3: How many different customers purchased the flowers? 1. You're looking specifically for the purchase of flowers, so continue with the search from the previous example:
sourcetype=access_* action=purchase category_id=flowers

The customers who access the Flower & Gift shop are distinguished by their IP addresses, which are values of the clientip field. 2. Use the stats command and the distinct_count() or dc() function:
sourcetype=access_* action=purchase category_id=flowers | stats dc(clientip)

You piped the search results into the stats command and used the distinct_count() function to count the number of unique clientip values that it finds in those events. This returns a single value:



This tells you that there were approximately 300 different people who bought flowers from the online shop. Example 4: In the last example, you calculated how many different customers bought flowers. How do you find the number of flowers that each customer bought? 1. Use the stats command:
sourcetype=access_* action=purchase category_id=flowers | stats count

The count() function returns a single value, the count of your events. (This should match your result from Example 2.) Now, break this count down to see how many flowers each customer bought. 2. Add a by clause to the stats command:
sourcetype=access_* action=purchase category_id=flowers | stats count BY clientip

This search gives you a table of the different customers (clientip) and the number of flowers purchased (count).


Reformat the search results


You might know what the header for this table represents, but anyone else wouldn't know at a glance. You want to show off your results to your boss and other members of your team. Let's reformat it a little: 3. First, let's rename the count field:
sourcetype=access_* action=purchase category_id=flowers | stats count AS "# Flowers Purchased" by clientip

The syntax for the stats command enables you to rename the field inline using an "AS" clause. If your new field name is a phrase, use double quotes. The syntax for the stats command doesn't allow field renaming in the "by" clause.

4. Use the rename command to change the clientip name:
sourcetype=access_* action=purchase category_id=flowers | stats count AS "# Flowers Purchased" by clientip | rename clientip AS Customer

This formats the table to rename the headers, clientip and count, with Customer and # Flowers purchased:


For more information about the stats command and its usage, arguments, and functions, see the stats command in the Search reference manual and the list of stats functions. For more information about the rename command, see the rename command in the Search reference manual in the core Splunk product documentation. Note that any references in the core Splunk product documentation to the CLI and configuration files are not relevant for Splunk Storm.

In this last search, you found how many flowers each customer to the online shop bought. But what if you were looking for the one customer who buys the most items on any given day? When you're ready, continue on to the next topic to learn another way to search, this time using subsearches.

Use a subsearch
The last topic introduced search commands, the search pipeline, and drilldown actions. If you're not familiar with them, review more ways to search. This topic walks you through another search example and shows you two approaches to getting the results that you want. Back at the Flower & Gift shop, your boss asks you to put together a report that shows the customer who bought the most items yesterday and what he or she bought.

Part 1: Break the search down. Let's see which customer accessed the online shop the most yesterday.


1. Use the top command and limit the search to Yesterday:


sourcetype=access_* action=purchase | top limit=1 clientip

You limit the top command to return only one result for the clientip.

If you wanted to see more than one "top purchasing customer", you would change this limit value.

Now, use the clientip value to complete your search. 2. Use the stats command to count the VIP customer's purchases:
sourcetype=access_* action=purchase clientip=10.192.1.39 | stats count by clientip

This only returns the count of purchases for the clientip. You also want to know what he bought. 3. One way to do this is to use the values() function:
sourcetype=access_* action=purchase clientip=10.192.1.39 | stats count, values(product_id) by clientip

This adds a column to the table that lists what he bought by product ID.


The drawback to this approach is that you have to run two searches each time you want to build this table. The top purchaser is not likely to be the same person for every time range.

Part 2: Let's use a subsearch instead.

A subsearch is a search with a search pipeline as an argument. Subsearches are contained in square brackets and evaluated first. The result of the subsearch is then used as an argument to the primary search. Read more about "How subsearches work" in the User manual in the core Splunk product documentation. Note that any references in the core Splunk product documentation to the CLI and configuration files are not relevant for Splunk Storm.

1. Use a subsearch to run the searches from Part 1 inline. Type or copy/paste in:
sourcetype=access_* action=purchase [search sourcetype=access_* action=purchase | top limit=1 clientip | table clientip] | stats count, values(product_id) by clientip

Because the top command returns count and percent fields as well, you use the table command to keep only the clientip value.

These results should match the previous result, if you run it on the same time range. But, if you change the time range, you might see different results because the top purchasing customer will be different!

2. Reformat the results so that it's easier to read:


sourcetype=access_* action=purchase [search sourcetype=access_* action=purchase | top limit=1 clientip | table clientip] | stats count, values(product_id) as product_id by clientip | rename count AS "How much did he buy?", product_id AS "What did he buy?", clientip AS "VIP Customer"


For more information about the usage and syntax of the commands used here, see the Search Reference manual in the core Splunk product documentation. Note that any references in the core Splunk product documentation to the CLI and configuration files are not relevant for Splunk Storm. When you're ready, continue on to the next topic to review more search examples.

More search examples


This topic walks you through more searches using what you learned from previous topics.

The Search reference manual

These examples use only a handful of the search commands and functions available to you. For complete syntax and usage descriptions of all the search commands, see the Search reference manual:

The complete list of search commands
The list of functions for the eval command
The list of functions for the stats command

All of these topics are located in the core Splunk product documentation. Note that any references in the core Splunk product documentation to the CLI and configuration files are not relevant for Splunk Storm.

Back at the Flower & Gift shop, you're running searches to gather information to build a report for your boss about yesterday's purchase records:

How many page views were requested?
How many page views were there compared to the number of purchases made?
What was purchased and how much was made?
How many purchase attempts failed?

Example 1
How many times did someone view a page on the website, yesterday? 1. Start with a search for all page views. Select the time range, Other > Yesterday:
sourcetype=access_* method=GET

Next you want to count the number of page views (characterized by the method field). 2. Use the stats command:
sourcetype=access_* method=GET | stats count AS Views

Here, you use the stats command's count() function to count the number of "GET" events in your Web access logs. This is the total number of events returned by the search, so it should match the count of retrieved events. This search essentially captures that count and saves it into a field that you can use.


Here, renaming the count field as Views isn't necessary, but you're going to use it again later and this helps to avoid confusion. 3. Save this search as Pageviews (Yesterday).

Example 2
From Example 1, you have the total number of views. How many visitors who viewed the site purchased an item? What is the percentage difference between views and purchases? 1. Start with the search from Example 1. Select the Other > Yesterday from the time range picker:
sourcetype=access_* method=GET | stats count AS views

2. Use stats to count the number of purchases (characterized by the action field):
sourcetype=access_* method=GET | stats count AS Views, count(eval(action="purchase")) AS Purchases

You also use the count() function again, this time with an eval() function, to count the number of purchase actions and rename the field as Purchases.

Here, the renaming is required--the syntax for using an eval() function with the stats command requires that you rename the field.

Now you just need to calculate the percentage, using the total views and the purchases. 3. Use the eval command and pipe the results to rename:
sourcetype=access_* method=GET | stats count AS Views, count(eval(action="purchase")) AS Purchases | eval percentage=round(100-(Purchases/Views*100)) | rename percentage AS "% Difference"

The eval command enables you to evaluate an expression and save the result into a field. Here, you use the round() function to round the calculated percentage of Purchases to Views to the nearest integer.
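For instance, if the search counted 200 Views and 30 Purchases (illustrative numbers, not taken from the sample data), the eval expression would compute round(100 - (30/200*100)) = round(100 - 15) = 85, and the "% Difference" column would show 85.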

4. Save your search as "% Difference Purchases/Views".

Example 3
In the previous examples you searched for successful purchases, but you also want to know the count of purchase attempts that failed! 1. Run the search for failed purchase attempts, selecting Yesterday from the time range picker:
sourcetype=access_* action=purchase status=503

(You should recognize this search from the Start searching topic, earlier in this tutorial.) This search returns the events list, so let's count the number of results. 2. Use the stats command:
sourcetype=access_* action=purchase status=503 | stats count

This returns a single value:


This means that there were no failed purchases yesterday! 3. Save this search as Failed purchases (Yesterday). Now you should be comfortable using the search language and search commands. When you're ready, proceed to the next topic to learn how to create reports.

Create reports
This topic builds on the searches that you ran and saved in the previous search examples and walks you through creating charts and building reports.

Storm can dynamically update generated charts as it gathers search results. When you initiate a search, you can start building your report before the search completes. You can use the fields menu to quickly build simple pre-defined reports, or use the Report Builder, which lets you define, generate, and fine-tune the format of your report, from the type of chart you want to create to the contents you want to display on this chart. To learn more about using the Report Builder to define basic report parameters, format charts, and export or print finished reports, see "Define reports with the Report Builder" in the core Splunk product documentation. Note that any references in the core Splunk product documentation to the CLI and configuration files are not relevant for Splunk Storm.

Back at the Flower & Gift shop, you're still building your reports. The previous searches you ran returned either a single value (for example, a count of failed purchases) or a table of results (a table of products that were purchased). Now, you also want to add some charts to your reports for yesterday's activities:

The count of products purchased over time
The count of purchases and views for each product category

Chart of purchases and views for each product


In this example, you want to build a chart of the number of views and number of purchases for each type of product. Recall that you saved a similar search in the previous topic.


Let's modify it a little. 1. Run this search over the time range, Yesterday:
sourcetype=access_* method=GET | chart count AS views, count(eval(action="purchase")) AS purchases by category_id | rename views AS "Views", purchases AS "Purchases", category_id AS "Category"

Here, you use the chart command instead of the stats command. The chart command enables you to create charts and specify the x-axis with the by clause.

2. Click Create > Report. Because you use the chart command and have already defined your report, this opens the Format report page of the Report Builder.


If you see something different in this window, for example a different chart type, it's probably because you're not looking at the default settings. You don't need to worry about this though.

If your search string includes reporting commands, you access the Report Builder by clicking Show report. Storm will jump you directly to the formatting stage of the report-building process, since your reporting commands have already defined the report. The referenced documentation is in the core Splunk product documentation. Note that any references in the core Splunk product documentation to the CLI and configuration files are not relevant for Splunk Storm.

You don't need a strong understanding of reporting commands to use the Report Builder, but if you do have this knowledge, the range of things you can do with it increases. 3. Under Formatting options, leave the chart type set to column and name the chart Purchases and Views by Product Type.


Because you're using the chart command, you have to define the axes of the chart. 4. Under General, leave the settings as they are.

5. Under Format, click X-axis: Type in "Product type" for the X-axis title.

6. Under Format, click Y-axis: Type in "Count of events" for the y-axis title.


7. Click Apply.

Now you should see your chart of purchases and views formatted as a column chart with the types of products on the X-axis.

8. Click Save and select Save report... The Save report dialog window opens:


Name your report Purchases & Views (Yesterday). Click Finish >>.

There are alternate ways to access the Report builder:

- Click Build report in the Actions dropdown menu after you initiate a new search or run a saved search.
- Click a field in the search results sidebar to bring up the interactive menu for that field. Depending on the type of field you've clicked, you'll see links to reports in the interactive menu such as average over time, maximum value over time, and minimum value over time (if you've selected a numerical field) or top values over time and top values overall (if you've selected a non-numerical field). Click one of these links, and Splunk opens the Format report page of the Report Builder, where it generates the chart described by the link.

Access saved reports


After you save a report, go << back to Search. Storm lists all your saved reports in the Searches & Reports menu on the search dashboard:

Save and share reports


When you're happy with the report you've created, you have a number of options for saving it and sharing it with others. Read more about saving your reports in "Save reports and share them with others" in the core Splunk product documentation. Note that any references in the core Splunk product documentation to the CLI and configuration files are not relevant for Splunk Storm.


Build and share a dashboard


Before you proceed with this topic you should review "Create reports" in this tutorial, where you have already built and saved a few reports. This topic walks you through creating simple dashboards that use the same searches and reports that you saved in the previous topics. Back at the Flower & Gift Shop, your boss asks you to put together a dashboard to show metrics on the products sold at the shop. You also decide to build yourself a dashboard to help you find and troubleshoot any problems with the shop.

Flower and Gift Shop Products


The first dashboard will show metrics related to the day-to-day purchase of different products at the Flower & Gift shop.

1. To start, run the search for purchases and views over the time range Yesterday:
sourcetype=access_* method=GET | chart count AS views, count(eval(action="purchase")) AS purchases by category_id | rename views AS "Views", purchases AS "Purchases", category_id AS "Category"

2. Under the search bar, click Create > Dashboard panel.

This opens the Add to Dashboard dialog.

3. Name your search.


Click Next.

4. Name your new dashboard.

Click Next.

5. Name your new panel.

Click Finish. When your dashboard appears, it has one panel in it.

6. To start defining panels for it, toggle the "Edit" button to On.


From here you can add a new panel to your dashboard, edit your dashboard's XML, or edit permissions to share your dashboard. You can also edit properties of your panel: edit the search, edit the visualization, or delete the panel.

Share the dashboard

Select Edit permissions to expand or restrict the permissions for the dashboard. The dashboard can be:
- A private view available only to yourself.
- Read-only for all your project's members.

Congratulations! You've completed the Storm tutorial. Now you can get started Storming your own castle:
- Create a new project for your data.
- Add your own data.


Get started
Create and activate an account
This topic discusses how to create and activate your Splunk Storm account. When you use Splunk Storm, you log in with a splunk.com user account. This account is the owner of any projects you create in Storm.

If you already have a splunk.com account


If you already use the core Splunk product, you probably already have a splunk.com user account. If you've forgotten your password, you can go to http://splunk.com, click login in the upper right corner, and use the password reminder link to get your password sent to you via email. Then, use that account info to log in to Splunk Storm at http://splunkstorm.com.

If you do not already have a splunk.com account


If you do not already have a splunk.com account, just sign up for a Splunk Storm account.
1. Navigate to http://splunkstorm.com and click Sign up.
2. Fill in the fields and click Sign up.
3. Review and accept the Terms of Service. An email is sent to the address you used to sign up.
4. When you've received the email, click the validation link in the email. This takes you to Splunk Storm. You can use the same account login at http://splunk.com.

Create a project
This topic discusses how to create a project in Splunk Storm.


Once you've activated your account in Splunk Storm, you can create your first project.

What's a project?
Splunk Storm allows you to create projects to organize your data. A project can contain just one input or several. One main reason to organize your data into projects is that you can choose to share a project with other users. If you've got some data you want to share and some you want to keep private, create two projects: one that's just for you, and one to share with other users.

Create your first project


1. Click Add project.
2. Give your project a name. (You can change its name later.)
3. Specify the time zone for your project. The time zone you set here is applied to all the data that you send to this project. Read about time zones below.
4. Click Continue.
5. Choose a data storage plan for your project. Storm displays the amount you'll be charged per month for your data storage plan. You can change this plan later if you find you need more or less storage. Read about data storage in "Choose a data storage plan."
6. Click Continue.
7. If you choose a paid plan (rather than a free one), Storm prompts you for your credit card and billing information. The credit card you provide will be used for this project as well as for any future projects you create. You will be charged taxes based on your billing address. Read about Storm billing in this manual.
8. Click Save.
9. Review the confirmation page. Correct anything you've entered incorrectly (click the Edit link for that section). Once you click Confirm you will be charged the first month's bill.

After a moment, Storm displays your project's data page, with the Inputs tab selected for your project.

From here, you can read "About adding data" to define the data inputs for your project. Or you might want to change your project's deletion policy from its default of storing data indefinitely. Read about deletion policies in "Choose a data storage plan."

How does the time zone affect things?


Setting the time zone here tells Storm that all the data from this project comes from that time zone. This means that if you are in San Francisco and have a teammate in London, and you're working together on an incident that occurred in a particular time frame, you both see the data the same way, in the same time zone. Right now, you can only set one time zone for all the network data in a given project. This means that if you have network data from different time zones, you must put each time zone's data into its own project. You can, though, specify a time zone for individual files uploaded to your project, or for data that you are sending via a forwarder. Storm recognizes zoneinfo TZ IDs. Refer to the zoneinfo (TZ) database for all permissible TZ values.

Time zone precedence

If your data has a recognizable time stamp with a time zone, Storm respects that. If it has no time stamp, Storm timestamps the events with the time the data is indexed, using the project time zone. If you want to specify a time zone, use the REST API or forwarder.

Choose a data storage plan


Splunk Storm allows you to scale your projects as you need to. Storm data storage plans charge you for the total amount of raw, uncompressed data you store in Storm.

Which plan should I choose?


Choose a plan that supports the amount of data you need to have available for searching at any given time given your deletion policy. Here's an example:


If you send 2 GB of data per day to Storm, and set the data deletion policy to 30 days (the shortest allowed deletion policy), you will store 60 GB of data in Storm. Your corresponding storage level (the smallest one that accommodates your storage needs) would be 100 GB. Storm provides two graphs to help you determine how much data storage you'll need. Find these graphs under your project's Storage tab and read about them in "How much data am I sending to Storm?"

What happens if I go over my data storage plan?


When the total amount of data you are storing in Storm exceeds the amount specified for your plan, Storm will stop indexing any new data you send. To start receiving data again, either upgrade your storage plan (under your project's Plan tab) or delete data. For information on deleting data, see "Delete data from a project" in this manual.

What's a deletion policy?


Your deletion policy sets the amount of time Storm keeps your data. Once your data is older than the time span you've set in your deletion policy, Storm deletes the old data. Once your data has been deleted, you can no longer access it in Storm.

Important: Storm measures the age of your data based on the time stamps on the data, not from when the data arrives in Storm. So, for example, if your retention period is set to delete data older than 40 days, and you index data that is time stamped two months ago, Storm will immediately delete that data.

Change your deletion policy in your project's Settings. The default deletion policy is for Storm to store your data indefinitely.

Minimum deletion policy

The minimum length of time you can specify for data to be kept in Storm is 30 days. This means that you can specify a minimum of 30 days in your deletion policy, and you cannot delete data manually that is less than 30 days old. The exception to this is when you manually delete the whole project.


How much data am I sending to Storm?


Storm provides two graphs to help you choose a storage plan. Find these graphs under your project's Storage tab. The first graph shows the amount of data that Storm receives from you. Storm measures the raw, uncompressed data you send it. This graph is based on the data's time of arrival and shows your data's volume over time.

In this example, we've uploaded files on two different days. The second graph shows total data stored. This graph uses the time stamp on the data, which is also what your deletion policy uses. This graph shows how storage is being consumed. Again, Storm counts the raw, uncompressed data you send it. It also shows the average data stored per day, week, or month (in two places: as an overlay and in one of the three panels below the graph). Change the time range over which the average storage usage is shown by clicking day, week, or month at the top right.


In this example, the project has some recent data and some from July, and the average weekly data storage is 2.5 MB.

About billing
What happens when I modify my plan?
When you modify your storage plan, the changes go into effect right away. If your new plan costs more than your old plan, we send you an invoice and charge your credit card for the upgrade the next morning (Pacific time) during our daily bill runs. If your new plan is less expensive, we issue you an account credit. Your account credit should appear on an invoice the next morning (Pacific time). We will use all available account credit on your next payment(s) before charging your credit card. You might want to change your deletion policy to make more room once you downgrade your storage plan. Read about deletion policies in "Choose a data storage plan" in this manual.

What happens if my credit card expires or is declined?


If your credit card expires

Storm will notify you when the expiration date is approaching for the credit card you have on file. Make sure that your email address is correct and that you can receive notification emails from Storm. You might want to add splunk-storm@splunk.com to any of your email whitelists. Keep your payment information current to prevent a disruption in service (as described below).

If your credit card is declined

If Storm is unable to charge your credit card, we email you a notification. Storm retries your card three times over the next several days until the payment is successful. Storm also retries your card when you update your payment information. After 10 days of unsuccessful payment attempts, Storm emails you that all the projects you own are now "delinquent" and Storm is no longer receiving any new data you send to your projects. On the 15th day of unsuccessful payment attempts (that is, five days after Storm stops receiving new data), Storm deletes all your delinquent projects. Any data in the deleted projects will be gone. Make sure that you can receive emails from splunk-storm@splunk.com so that you can be notified if there's a problem with your credit card.


Add data
About adding data
You have four basic ways to get data into Splunk Storm, which you can read about on these individual pages:
- Add data using the Storm REST API (Beta). This is one of two Storm data input methods that can use SSL.
- About forwarding data to Storm. Forwarding data is the other SSL-enabled data input method. This method involves installing a small instance of Splunk (a forwarder) on your server. Using a forwarder is a more robust solution than sending network data.
- Send network data over a TCP or UDP port. This includes sending syslog, syslog-ng, rsyslog, Snare, netcat, or Heroku data.
- Upload a file. This is a one-time file upload.

When you're ready to add data, be sure to read "About source types" in this manual.

Send data (including syslog) over a TCP/UDP port


Splunk Storm allows you to send data to your project from a remote host, over the network. To do this, you specify the type of authorization you would like to use and then tell your remote host where to send the data. Your Storm project is automatically assigned a TCP/UDP address and port to which you can send data. Use this method for *nix syslog, or set up netcat, bind to a port, and send data. TCP is the protocol underlying Storm's data distribution. We recommend TCP over UDP for sending data from any remote machine to your Storm project.


Authorize a remote data source


There are currently two ways to authorize a network data source: automatic authorization or manual authorization. Storm also supports secure authentication through the REST API and forwarders.

Use automatic authorization

Automatic authorization is the easiest method to send data to your Storm project from any machine. When you select automatic authorization, your Storm project will accept data from any IP that connects to it during a 15-minute window that starts when you enable it. Any IPs that connect during that time period remain authorized to send data to your project indefinitely. After 15 minutes, automatic authorization will turn itself off and will block all IP addresses from connecting to the port besides those that sent data to that port during the 15-minute period it was open. This ensures that other people cannot send data to your deployment. You can turn automatic authorization on again whenever you like to authorize new IPs. Automatic authorization is especially useful if you do not know the IP address of the host sending data, or if you are sending data from multiple IP addresses and you can't easily determine all of the IP addresses. Every data source added via auto authorization is assigned a source type of "syslog". You can modify this setting later if you want.

Caution: Because automatic authorization opens your project to accept data from any host that sends to it during the 15-minute window, please be aware that in theory someone else could add data to your project during this time. One way this can happen is if another Storm user mistypes a hostname or port number. Because of this, Splunk recommends that you turn off automatic authorization when you are finished with it rather than letting the 15 minutes run out.

To enable automatic authorization:
1. Navigate to your project's Inputs > Network data page.
2. Click Authorize your IP address.
3. Highlight Automatically and click. The 15-minute countdown starts, and Storm shows you the unique address and port you've been assigned.
4. Configure your data source(s) to send to the address and port on the screen.
5. Reload the page. Once Storm receives data from a source, you'll see its IP listed.

Authorize an IP address manually

You can specify an IP address to authorize if you know the IP of your data source and want to ensure that your Storm project receives data from only that IP. To add a remote data source manually:
1. Navigate to your project's Inputs > Network data page.
2. Click Authorize your IP address.
3. Highlight Manually and click.
4. Enter the IP address of your remote data source. If you're accessing Storm from that host, you can click What is my IP? and Storm will look it up for you.
5. Specify a source type. If you don't specify a source type, Storm automatically assigns your data a source type of syslog. For more information about source type, refer to "Sources and source types" in this manual.

Note that the "Data last received" column shows NA; this is a known issue.

Choices around sending network data


Storm accepts data on any authorized TCP or UDP port, but we recommend sending data to Splunk via syslog-ng or rsyslog using TCP rather than UDP. TCP is the protocol underlying Storm's data distribution and is the recommended method for sending data from any remote machine to your Storm project. UDP is generally undesirable as a transport because:
- It doesn't enforce delivery.
- It's not encrypted.
- There's no accounting for lost datagrams.

Send data via syslog

For help setting up your remote source, refer to "Set up syslog", "Set up syslog-ng", or "Set up rsyslog" in this manual.


Set up syslog
This topic talks about how to configure syslog so that you can get your syslog data into Splunk.

What is syslog?
Syslog (syslogd) is a standard for forwarding log messages for a system, often over an IP network. The term "syslog" refers to the syslog protocol, which sends a small message to a syslog daemon (syslogd) using either TCP or UDP in clear text. Syslog can be used for managing computer systems and security auditing for servers and applications. It is supported by a wide variety of devices and platforms.

Important note
Because plain syslog supports only the UDP transport protocol, and not the more reliable TCP protocol, we recommend that customers use rsyslog or syslog-ng instead of plain syslog. If syslogd is your only option (as is the case with some router or network devices), first ensure that your version of syslog supports sending data to a custom port number (other than UDP port 514). If it doesn't, you'll need to use another method for getting data to Splunk. If it does support custom port numbers, read on...

1. Authorize your TCP/UDP data source


To learn how to do this, read "Send data (including syslog) over a TCP/UDP port".

2. Configure syslogd.conf to send data to your port


Edit your syslogd.conf file, usually found in /etc/syslogd.conf, and add the following line at the bottom of the file:
*.* @logsX.splunkstorm.com:[PORT #]

Be sure you use the correct hostname and port from the input you created! After you've saved the configuration file, you'll need to restart syslog. A simple cross-platform way to do this is by getting a process list, then sending a HUP signal to the process ID:

sh-3.2# ps -ax | grep syslog
   15 ??         0:00.49 /usr/sbin/syslogd
sh-3.2# kill -HUP 15

3. Test your syslog configuration


You should now be able to test sending events to Splunk by using the command line tool logger:
logger -t test "my little pony"

Then log into the Splunk search interface (by going to your project then clicking Explore data) and search for your events, starting with "my little pony".

Set up syslog-ng
This topic talks about how to set up syslog-ng so that you can get syslog-ng data into Splunk.

What is syslog-ng?
Syslog-ng is an open-source *nix implementation of the syslog logging standard. The original syslog protocol allows messages to be sorted based only on priority/facility pairs; syslog-ng adds the ability to filter based on message content using regular expressions. Most importantly, syslog-ng supports transport over TCP. Syslog-ng is available as a free download at Balabit's website and is included in many *nix distributions by default.

1. Authorize your TCP/UDP data source


To learn how to do this, read "Send data (including syslog) over a TCP/UDP port".

2. Configure syslog-ng to send data to Splunk


Edit your syslog-ng.conf file, usually found in /opt/syslog-ng/etc/syslog-ng.conf, and make sure you have a source called s_all:


source s_all {
    internal();
    unix-stream("/dev/log");
    file("/proc/kmsg" program_override("kernel: "));
};

Next configure the protocol and port, and put them in a destination entry, being sure to specify TCP.

destination d_splunk { tcp("logsX.splunkstorm.com" port(20000)); };

Be sure to replace the logsX.splunkstorm.com and port(20000) with the address and port that is shown under Data Sources on your Inputs page. Next, tell syslog-ng to forward the s_all source to the d_splunk destination:

log { source(s_all); destination(d_splunk); };

Next, specify which files you want syslog-ng to monitor: Let's say you want to monitor the error.log and web.log files in the /var/log/myapp/ directory. Specify this in the source directive:

file("/var/log/myapp/error.log" follow_freq(1) flags(no-parse)); file("/var/log/myapp/web.log" follow_freq(1) flags(no-parse));

Your configuration should now look as follows:

source s_all {
    internal();
    unix-stream("/dev/log");
    file("/proc/kmsg" program_override("kernel: "));
    file("/var/log/myapp/error.log" follow_freq(1) flags(no-parse));
    file("/var/log/myapp/web.log" follow_freq(1) flags(no-parse));
};

destination d_splunk {
    tcp("logsX.splunkstorm.com" port(20000));
};

log {
    source(s_all);
    destination(d_splunk);
};

Now, restart syslog-ng:


/etc/init.d/syslog-ng restart

3. Test your syslog configuration


Test sending events to Splunk by using the command line tool logger:
logger "my little pony"

Log into the Splunk search interface (by going to your project then clicking Explore data). Search for your events, starting with "my little pony".

Set up rsyslog
This topic talks about how to set up rsyslog so that you can get rsyslog data into Splunk.

What is rsyslog?
Rsyslog is an open-source *nix implementation of the syslog protocol. It supports reliable syslog transport over TCP, local buffering, SSL/TLS/RELP, logging to databases, and email alerting.

1. Authorize your TCP/UDP input into your project


To learn how to do this, read "Send data (including syslog) over a TCP/UDP port".

2. Configure rsyslog to send data to Splunk


Edit your rsyslog.conf file (usually in /etc/):

$ModLoad imfile
$InputFileName /var/log/nginx/error.log
$InputFileTag nginx:
$InputFileStateFile stat-nginx-error
$InputFileSeverity error
$InputRunFileMonitor
$InputFilePollingInterval 10
*.* @@logsX.splunkstorm.com:20000

Be sure to replace the logsX.splunkstorm.com and port of 20000 with the address and port that is shown under your project's Inputs > Network data page. This configuration will make rsyslog send all of your logs to Splunk. If you do not like this behavior, add this first line:

& ~

If you want to send data over UDP instead of TCP (although we do recommend TCP), the last line of your rsyslog.conf edit should be:

*.* @logsX.splunkstorm.com:[PORT #]

The InputFileTag line tells rsyslog what to add as the tag in the log records. The InputFileStateFile is the file that will keep track of how much of that file you have already sent in. Make this unique for each file that you are using.

3. Test your rsyslog configuration


You should now be able to test sending events to Splunk by using the command line tool logger:
logger -t test "my little pony"

Then log into the Splunk search interface (by going to your project then clicking Explore data) and search for your events, starting with "my little pony".

Set up Snare (for Windows)


This topic talks about how to set up Snare on Windows so you can get Windows Event Logs into Splunk.

What is Snare?
Snare for Windows is a service that interacts with the underlying Windows Event Log subsystem to facilitate remote, real-time transfer of event log information.

Snare for Windows is compatible with the following operating systems:
- Windows NT
- Windows XP
- Windows Server 2000
- Windows Server 2003

If you want to capture Windows events, like those in your event logs, currently the Snare EventLog Agent is the easiest way to do this. You can download the free agent from Intersect Alliance. If you're running Windows Vista, Windows 8, or Windows Server 2008, be sure to download the "Snare for Windows Vista" package. (From: http://www.intersectalliance.com/projects/SnareWindows)

1. Authenticate your TCP/UDP data source


To learn how to do this, read "Send data (including syslog) over a TCP/UDP port".

2. Configure Snare to send data to Splunk


Once you install the software, go to the "Network Configuration" section. Configure the "Destination Snare Server Address" and "Destination Port" to point to your assigned Storm address and port, found under Data Sources on your Inputs page. Make sure you select "Enable Syslog Header" in the configuration to ensure all the events include a timestamp and host.

Then restart the Snare service in your Service Control manager to make sure the configuration is enabled.


You should now see Windows event log events in Splunk. You can increase or decrease the logging levels by editing the "Objective Configuration" in Snare.

Send data via netcat


You can test that data sources are authorized against your Storm project by sending a file or tailing a directory via netcat, which is available on most Unix distributions by default and as a download for Windows. Example of sending a file with netcat:
cat foo.txt | nc [youraddress] [yourport]

Example of tailing log files in a directory with netcat :


tail -f /var/log/*.log | nc [youraddress] [yourport]

Note: Avoid tailing compressed files. This tailing example does not recurse to subdirectories. Be sure to replace "[youraddress]" and "[yourport]" with your own assigned values.

Send data from Heroku


You can easily send data from a Heroku application via Heroku's syslog drain feature. For general information on Heroku's logging functionality visit their documentation. 1. From your Heroku application directory, turn logging to DEBUG level to increase the amount of data that is logged by Heroku:
heroku config:add LOG_LEVEL=DEBUG

2. Go to your Project in your splunkstorm.com account, and under "Inputs", click Network data. Click Authorize then Automatically. Take note of (or copy) the IP address and port assigned to your project. 3. From the Heroku application directory, create a syslog drain to your Splunk Storm project:
heroku drains:add syslog://logs2.splunkstorm.com:99999

Note: Be sure to replace logs2.splunkstorm.com:99999 with your assigned IP address and port. 4. Generate an action on your application that will generate logs. Be sure to do this during the 15-minute automatic authorization window, or you will have to repeat step 2. Refresh the Inputs tab page on your Splunk Storm project and you should see Heroku's IP addresses added to your "Authenticated IP Addresses" list (there may be 5 IP addresses or more). 5. Click Explore Data from your project to see your data. Your data will be indexed as the "syslog" source type, which is how Heroku formats its logs. Note: If your logs stop flowing or you believe not everything is reaching Storm, try turning Storm's automatic authorization on again to ensure that all IP addresses are authorized. Heroku might occasionally change the servers your application is hosted on.

What if Heroku changes the IP address of my app?


Heroku occasionally relocates user deployments to new IP space. If this happens to the application from which you're sending data to Splunk Storm, the authentication will stop working (because your data is now coming from an unrecognized IP). You have a couple of options:


You can re-authenticate the new IP(s) as described in step 2 above. In this situation, you risk losing some data from your application until you notice it is no longer coming in to your Storm project. You can configure your application to log to syslog on an intermediate server and then configure a Splunk universal forwarder as described in the forwarding chapter of this manual. This way, syslog will always receive your data, and the Splunk forwarder will handle getting it into Storm.

Upload a file
You can add data to your Storm deployment by uploading text files of up to 100 MB in size each. Binary, any compressed file format (such as zip, tar, or gz), executable, audio/video, and image files are not supported. To upload a file to Storm:
1. From your project, click Inputs.
2. Click File upload.
3. Click +Upload. The Upload a file panel is displayed.
4. Browse to your file and choose it.
5. Give your file a source type. Specifying a source type tells Splunk how to parse your data, and allows you to group all the data of a certain type together when searching. You can choose a predefined source type from the list, or select custom source type. If you choose to specify a custom source type, give it a name. If you don't specify a source type, the data will be assigned a type of "syslog". Read about Storm source types in this manual. If you choose to specify a custom source type, Storm will linebreak multi-line events, find the timestamps in your data, and extract some default fields as described in these topics from the core Splunk product documentation: "How Splunk extracts timestamps" and "Overview of default fields".
6. Choose the time zone for the data. If this file was generated in a different time zone from your project's default time zone, specify it here.

7. Click Upload.


Send data with forwarders


About forwarding data to Storm
This topic gives you an overview of forwarding data to Splunk Storm. For step-by-step instructions on setting up forwarding for Storm, read "Set up a universal forwarder on Windows" or "Set up a universal forwarder on *nix" in this manual. For information about forwarders in core Splunk, read "About forwarding and receiving data" in core Splunk documentation.

About forwarding data


You can forward data from a Splunk instance into a Storm project. The Splunk instance that performs the forwarding is a small-footprint version of Splunk, called a forwarder. Forwarders represent a much more robust solution for data forwarding than raw network feeds, with their capabilities for:
- Tagging default fields (source, source type, and host)
- Configurable buffering
- Data compression
- SSL security
- Using any available network ports

The universal forwarder


The universal forwarder is a type of Splunk forwarder. A forwarder is software that is installed on a remote server. It monitors files and other data sources, and securely sends this data in real time to a Storm project. Use the universal forwarder to gather data from a variety of inputs and forward the data to a Splunk server for indexing and searching. The universal forwarder is a separate executable from Splunk. Separate instances of Splunk and universal forwarders can coexist on the same system.


Tips and tricks for forwarding data to Storm


For details on setting up forwarding to Storm, see the topics on installing a universal forwarder on Windows or *nix.

Forward data to a specific Storm project. When you install a forwarder, you also need the project's credentials package. This file contains the authentication credentials for the Storm project and is linked to the user who downloaded it. As a result, a forwarder sends data only to one specific project.

Have a permanent project member set up the forwarder. If the member whose credentials package was used to configure one or more forwarders leaves the project, those forwarders will fail to authenticate. You can work around this, but if you want to save yourself some hassle, don't have the temp or intern set up all your forwarders.

Forward data to multiple Storm projects. More than one instance of the universal forwarder can coexist on one server. If you want to forward data from one server into two projects, for the time being you need to download the credentials package from each project and install them on separate forwarders according to the setup instructions in the next two topics.

Forward data from multiple locations into your Storm project. You can download the forwarder and credentials package once, then copy them to any server you want to send data from and go through the installation steps on the second server. For help deploying on many servers, contact Storm Support.

Forward data using scripted inputs. You can install Python separately if you want to forward data to Storm using scripted inputs.

Copy a configured forwarder. Each forwarder needs a unique GUID to authenticate. If you want to clone an existing forwarder, clean the files $SPLUNK_HOME/etc/system/local/server.conf and $SPLUNK_HOME/etc/instance.cfg, and update the host name in $SPLUNK_HOME/etc/system/local/inputs.conf on the copy before the first start.
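As a rough sketch of that last tip, assuming that "cleaning" those files means deleting them so the forwarder regenerates them (and a fresh GUID) on its first start, the steps on a *nix clone might look like this:

# On the cloned forwarder, before its first start
rm $SPLUNK_HOME/etc/system/local/server.conf
rm $SPLUNK_HOME/etc/instance.cfg
# Then edit $SPLUNK_HOME/etc/system/local/inputs.conf and set the host value, for example:
#   host = <hostname-of-the-new-machine>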


Set up a universal forwarder on Windows


This topic tells you how to install a universal forwarder on a Windows system to send data to Splunk Storm. For an introduction to forwarding data to Storm, read "About forwarding data to Storm" in this manual. To set up a forwarder on a *nix system, read "Set up a universal forwarder on *nix" in this manual. If you are sending data to a standard Splunk deployment (not Storm), go to "Distributed Splunk overview" in the core Splunk documentation.

Download and install the universal forwarder for Windows


1. In the Storm UI, navigate to the project you want to forward data into. Click Inputs, then Forwarders.
2. Download forwarder credentials for this project by clicking credentials package. This package contains the authentication credentials and configuration that allow sending data to this project only. Do not skip this step. Note: Do not share this information with anyone else. It contains your access token.
3. Follow the link to the universal forwarder downloads page and download the package of your choice.
4. Copy both files to the server that will send data to Storm.
5. Install the universal forwarder package in one of two ways. You can run
msiexec /i splunkforwarder-<version>-<build>-<architecture>-release.msi

at a command prompt. You can also double-click the forwarder MSI file in the Explorer window. This installation directory will from now on be denoted %SPLUNK_HOME%. Note: If you are using the MSI installer wizard, do not specify a destination server. If you did, clean out the resulting outputs.conf file at
%SPLUNK_HOME%\etc\system\local\outputs.conf


6. Start the universal forwarder:


%SPLUNK_HOME%\bin\splunk start

7. Install the forwarder credentials. From the directory containing the credentials package (using the default forwarder password), type:
%SPLUNK_HOME%\bin\splunk install app stormforwarder_<project_id>.spl -auth admin:changeme

This command copies the file to %SPLUNK_HOME%\etc\apps and uncompresses it. 8. Log into the forwarder using the default credentials, admin/changeme:
%SPLUNK_HOME%\bin\splunk login -auth admin:changeme

9. (Optional) Change the default admin password:


%SPLUNK_HOME%\bin\splunk edit user admin -password foo

10. Add files or a directory for the forwarder to monitor, using either the CLI or configuration files. 11. Restart the forwarder so the changes take effect:
%SPLUNK_HOME%\bin\splunk restart

Add data using the CLI


For a complete list of available forwarder CLI commands, read "CLI commands for input". Example: Monitor the Windows Update log (where Windows logs automatic updates). Add C:\Windows\windowsupdate.log as a data input. From the forwarder's splunk\bin directory, type

splunk.exe add monitor C:\Windows\windowsupdate.log


Add data using configuration files


Open the file %SPLUNK_HOME%\etc\apps\search\local\inputs.conf, or create the file if it doesn't currently exist. Add or edit entries to inputs.conf. For help with this step, see "Edit inputs.conf".
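For example, a minimal monitor stanza might look like the following. The path and source type here are only placeholders; see "Edit inputs.conf" for the full syntax and available attributes:

# Example stanza; adjust the path and sourcetype for your own data
[monitor://C:\Windows\windowsupdate.log]
sourcetype = windowsupdate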

More advanced setup procedures


Change your authentication credentials when your access token changes

You need to do this only if you have generated a new access token through the Storm UI. Edit the file

%SPLUNK_HOME%\etc\apps\stormforwarder_<project_id>\local\inputs.conf

and change the value of the key _storm_api_token.

Change which project to send data to

Edit the file


%SPLUNK_HOME%\etc\apps\stormforwarder_<project_id>\local\inputs.conf

Change the value of the key _storm_project_id to the ID of the project you would like to send to. Find the project ID by logging into your Storm account, navigating to the project you want to send data to, and then to Inputs > API. Note: Your access token must have Admin permission for the specified project. See "Share your project" for information about project permission.

Change the default time zone of the data

Edit the file
%SPLUNK_HOME%\etc\apps\stormforwarder_<project_id>\local\inputs.conf.

Change the value of the key _tzhint to the time zone you want applied to the data (and uncomment it). Note: By default, this time zone is set to the value of your project's time zone at the time the credentials package was downloaded.
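As a rough illustration, the relevant keys in that inputs.conf look something like the following. The values are placeholders; each key replaces the corresponding entry already present in the file:

# %SPLUNK_HOME%\etc\apps\stormforwarder_<project_id>\local\inputs.conf
_storm_api_token = <your_access_token>
_storm_project_id = <your_project_id>
_tzhint = US/Pacific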


Remove the default network throughput limit

Forwarders have a default limit of 256KBps. If you plan to forward a large volume of data, you can increase the limit or remove it. To do this:
1. Create or edit the file limits.conf in an app, for example %SPLUNK_HOME%\etc\apps\search\local\limits.conf.
2. Copy the new settings into it:
[thruput]
# zero means unlimited; the default was 256
maxKBps = 0

3. Restart the forwarder to apply the change.

Uninstall the universal forwarder


To uninstall the universal forwarder, use the Add or Remove Programs option in the Control Panel.

Set up a universal forwarder on *nix


This topic tells you how to install a universal forwarder on Linux, Unix, or Mac OS X to send data to Splunk Storm. For an introduction to forwarding data to Storm, read "About forwarding data to Storm" in this manual. To set up a forwarder on a Windows system, read "Set up a universal forwarder on Windows". If you are sending data to a standard Splunk deployment (not Storm), go to "Distributed Splunk overview" in the core Splunk documentation.

Download and install the universal forwarder for *nix


In eleven easy steps!

Get the forwarder and credentials package

1. In the Storm UI, navigate to the project you want to forward data into. Click Inputs, then Forwarders.

2. Download forwarder credentials for this project by clicking credentials package. This package contains the authentication credentials and configuration that allow sending data to this project only. Do not skip this step. Note: Do not share this information with anyone else, as it contains your access token.
3. Follow the link to the universal forwarder downloads page and download the package of your choice.

Install the forwarder

4. Copy both files to the server that will send data to Storm.
5. Install the universal forwarder package by either running the installer or decompressing the file. For example, if installing the universal forwarder onto Red Hat Linux, use this command to install the package by default in /opt/splunkforwarder:
rpm -i splunkforwarder_<package_name>.rpm

This installation directory will from now on be denoted $SPLUNK_HOME. 6. Start the universal forwarder.
$SPLUNK_HOME/bin/splunk start

7. Install the forwarder credentials (using the default forwarder user/password)


$SPLUNK_HOME/bin/splunk install app <path>/stormforwarder_<project_id>.spl -auth admin:changeme

This command copies the file to $SPLUNK_HOME/etc/apps and uncompresses it.

Configure the forwarder inputs

8. Log into the forwarder using the default credentials "admin/changeme".
$SPLUNK_HOME/bin/splunk login -auth admin:changeme

9. (Optional) Change the default admin password.


$SPLUNK_HOME/bin/splunk edit user admin -password foo


10. Add files or a directory for the forwarder to monitor, using either the CLI or configuration files as discussed below. 11. Restart the forwarder so the changes take effect:
$SPLUNK_HOME/bin/splunk restart

Add data using the command line interface (CLI)


1. Run the following command to monitor files or directories.
$SPLUNK_HOME/bin/splunk add monitor <path to file>

Within seconds of running this command an entry will appear on Storm's Inputs > Forwarders page. 2. Run the following command to see a list of all monitored files. The files or directories that you've added should appear in the results of this command.
$SPLUNK_HOME/bin/splunk list monitor

For information on the CLI commands, read "CLI commands for input".

Examples: Add data using the CLI

Example 1: Monitor sends data to Storm as it is added to the file (real time).
$SPLUNK_HOME/bin/splunk add monitor /var/log -sourcetype syslog

Example 2: Oneshot is a one-time upload.


$SPLUNK_HOME/bin/splunk add oneshot /var/log/applog

Add data using configuration files


Open the file $SPLUNK_HOME/etc/apps/search/local/inputs.conf, or create it if it doesn't currently exist. Add or edit entries to inputs.conf. For help with this step, see "Edit inputs.conf".
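For example, a minimal monitor stanza might look like the following. The path and source type are only placeholders; see "Edit inputs.conf" for the full syntax and available attributes:

# Example stanza; adjust the path and sourcetype for your own data
[monitor:///var/log/myapp/web.log]
sourcetype = access_combined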


More advanced setup procedures


Change your authentication credentials when your access token changes

If you have generated a new access token through the Storm UI, you must do the following: Edit the file

$SPLUNK_HOME/etc/apps/stormforwarder_<project_id>/default/inputs.conf

and change the value of the key _storm_api_token.

Change which project to send data to

Edit the file


$SPLUNK_HOME/etc/apps/stormforwarder_<project_id>/default/inputs.conf.

Change the value of the key _storm_project_id to the ID of the project you would like to send to. Find the project ID by logging into your Storm account, navigating to the project you want to send data to, and then to Inputs > API. Note: Your access token must have Admin permission for the specified project. See "Share your project" for information about project permission.

Change the default time zone of the data

Edit the file
$SPLUNK_HOME/etc/apps/stormforwarder_<project_id>/local/inputs.conf.

Change the value of the key _tzhint to the time zone you want applied to the data (and uncomment it). Note: By default, this time zone is set to the value of your project's time zone at the time the credentials package was downloaded.

Remove the default network throughput limit

Forwarders have a default limit of 256KBps. To determine if you are hitting this limit, look for events in splunkd.log like
8-21-2012 10:10:40.563 -0400 INFO BatchReader - Could not send data to output queue (parsingQueue), retrying..

If you plan to forward a large volume of data, you can increase the limit or remove it. To do this:

1. Create or edit the file limits.conf in an app's directory structure, for example in $SPLUNK_HOME/etc/apps/search/local/limits.conf. 2. Copy the new settings into limits.conf:
[thruput]
# zero means unlimited; the default was 256
maxKBps = 0

3. Restart the forwarder to apply the change.

Edit inputs.conf
This topic tells you about editing inputs.conf on a universal forwarder sending data to a Splunk Storm project. For core Splunk's forwarder documentation, see the Distributed Deployment Manual in core Splunk documentation. To add an input, add a stanza to inputs.conf in $SPLUNK_HOME/etc/apps/. If you have not worked with Splunk's configuration files before, read "About configuration files" in the core Splunk documentation before you begin. You can set multiple attributes in an input stanza. If you do not specify a value for an attribute, the forwarder uses the default that is predefined in $SPLUNK_HOME/etc/system/default/. Note: To ensure that new events are indexed when you copy over an existing file with new contents, set CHECK_METHOD = modtime in props.conf for the source. This checks the modtime of the file and re-indexes it when it changes. Be aware that the entire file will be re-indexed, which can result in duplicate events.
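As a sketch of that note (the monitored path is hypothetical), the corresponding props.conf entry would look like this:

# props.conf on the forwarder; the source path is only an example
[source::/var/log/myapp/nightly_dump.log]
CHECK_METHOD = modtime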

Configuration settings
The following are attributes that you can use in the monitor input stanzas. See the sections that follow for attributes that are specific to each type of input.
host = <string>

Sets the host key/field to a static value for this stanza. Sets the host key's initial value. The key is used during parsing/indexing, in particular to set the host field. It is also the host field used at search time.

The <string> is prepended with "host::". If not set explicitly, this defaults to the IP address or fully qualified domain name of the host where the data originated.
sourcetype = <string>

Sets the sourcetype key/field for events from this input. Explicitly declares the source type for this data, as opposed to allowing it to be determined automatically. This is important both for searchability and for applying the relevant formatting for this type of data during parsing and indexing. Sets the sourcetype key's initial value. The key is used during parsing/indexing, in particular to set the source type field during indexing. It is also the source type field used at search time. The <string> is prepended with "sourcetype::". If not set explicitly, Splunk picks a source type based on various aspects of the data. There is no hard-coded default. For more information about source types, see "source type".

Specifying a time zone

Forwarding data using a universal forwarder allows you to specify a time zone value that will be applied to all data coming from that forwarder. To specify a time zone setting for this forwarder, add a value (such as "US/Pacific") for the _tzhint attribute under the [default] stanza. This setting will be applied to all the inputs defined on this forwarder and will override any setting defined for the project to which the data is sent. For more information about setting a time zone for other types of data inputs, refer to "How does the time zone affect things?" in this manual.
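For example, to apply the US/Pacific time zone to everything this forwarder sends, the inputs.conf entry described above would look like this:

[default]
_tzhint = US/Pacific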

Syntax and examples for "monitor" inputs


Stanzas in inputs.conf that start with monitor tell Splunk to watch all files in the <path> you have specified (or just the <path> itself if it represents a single file). You must specify the input type and then the path, so put three slashes in your path if you're starting at root. You can use wildcards for the path. For more information, read how to "Specify input paths with wildcards" in the core Splunk documentation.

[monitor://<path>]
<attribute1> = <val1>
<attribute2> = <val2>


...

The following are additional attributes you can use when defining monitor input stanzas:
source = <string>

Sets the source key/field for events from this input. Note: Overriding the source key is generally not recommended. Typically, the input layer will provide a more accurate string to aid in problem analysis and investigation, accurately recording the file from which the data was retrieved. Consider use of source types, tagging, and search wildcards before overriding this value. The <string> is prepended with "source::". Defaults to the input file path.
crcSalt = <string>

Use this setting to force Splunk to consume files that have matching CRCs (cyclic redundancy checks). (Splunk only performs CRC checks against the first few lines of a file. This behavior prevents Splunk from indexing the same file twice, even though you may have renamed it -- as, for example, with rolling log files. However, because the CRC is based on only the first few lines of the file, it is possible for legitimately different files to have matching CRCs, particularly if they have identical headers.) If set, string is added to the CRC. If set to <SOURCE>, the full source path is added to the CRC. This ensures that each file being monitored has a unique CRC. Be cautious about using this attribute with rolling log files; it could lead to the log file being re-indexed after it has rolled. Note: This setting is case sensitive.
ignoreOlderThan = <time window>

Causes the monitored input to stop checking files for updates if their modtime has passed the <time window> threshold. This improves the speed of file tracking operations when monitoring directory hierarchies with large numbers of historical files (for example, when active log files are co-located with old files that are no longer being written to). Note: A file whose modtime falls outside <time window> when monitored for the first time will not get indexed. Value must be: <number><unit>. For example, "7d" indicates one week. Valid units are "d" (days), "m" (minutes), and "s" (seconds). Defaults to 0 (disabled).

followTail = 0|1

If set to 1, monitoring begins at the end of the file (like tail -f). This only applies to files the first time they are picked up. After that, Splunk's internal file position records keep track of the file. Defaults to 0.
whitelist = <regular expression>

If set, files from this path are monitored only if they match the specified regex.
blacklist = <regular expression>

If set, files from this path are NOT monitored if they match the specified regex.
alwaysOpenFile = 0 | 1

If set to 1, Splunk opens a file to check if it has already been indexed. Only useful for files that don't update modtime. Should only be used for monitoring files on Windows, and mostly for IIS logs. Note: This flag should only be used as a last resort, as it increases load and slows down indexing.
time_before_close = <integer>

Modtime delta required before Splunk can close a file on EOF. Tells the system not to close files that have been updated in past <integer> seconds. Defaults to 3.
recursive = true|false

If set to false, Splunk will not go into subdirectories found within a monitored directory. Defaults to true.
followSymlink

If false, Splunk will ignore symbolic links found within a monitored directory. Defaults to true.
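Pulling several of these attributes together, an illustrative monitor stanza (all paths and values are placeholders, not part of the original examples) might look like this:

# Monitor *.log files under /var/log/myapp, skipping debug logs and files older than a week
[monitor:///var/log/myapp]
sourcetype = myapp_logs
whitelist = \.log$
blacklist = debug
ignoreOlderThan = 7d
followTail = 0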

Example 1. To load anything in /apache/foo/logs or /apache/bar/logs, etc.

[monitor:///apache/.../logs]

Example 2. To load anything in /apache/ that ends in .log.

[monitor:///apache/*.log]

CLI commands for input


This topic tells you how to monitor files and directories via the universal forwarder's Command Line Interface (CLI). To use these CLI commands, navigate to the $SPLUNK_HOME/bin/ directory and use the ./splunk command from the UNIX or Windows command prompt. If you get stuck, the Splunk forwarder CLI has built-in help. Access the main CLI help by typing splunk help. Individual commands have their own help pages as well -- type ./splunk help <command>.

CLI commands for input configuration


The following commands are available for input configuration via the CLI:

Command: add monitor
Syntax: add monitor <source> [-parameter value] ...
Action: Monitor inputs from <source>.

Command: edit monitor
Syntax: edit monitor <source> [-parameter value] ...
Action: Edit a previously added monitor input for <source>.

Command: remove monitor
Syntax: remove monitor <source>
Action: Remove a previously added monitor input for <source>.

Command: list monitor
Syntax: list monitor
Action: List the currently configured monitor inputs.

Command: add oneshot
Syntax: add oneshot <source> [-parameter value] ...
Action: Copy the file <source> directly into Splunk. This uploads the file once, but Splunk does not continue to monitor it.

Change the configuration of each data input type by setting additional parameters. Parameters are set via the syntax: -parameter value. Note: You can only set one -hostname, -hostregex or -hostsegmentnum per command.

Parameter: <source>
Required? Yes
Description: Path to the file or directory to monitor/upload for new input. Note: Unlike the other parameters, the syntax for this parameter is just the value itself and is not preceded by a parameter flag: "<source>", not "-source <source>".

Parameter: sourcetype
Required? No
Description: Specify a sourcetype field value for events from the input source.

Parameter: hostname or host
Required? No
Description: Specify a host name to set as the host field value for events from the input source. Note: These are functionally equivalent.
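For example, combining the parameters above in a single command (the path, source type, and host name are placeholders):

./splunk add monitor /var/log/myapp/web.log -sourcetype access_combined -hostname webserver01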

Example 1: Monitor files in a directory


The following example shows how to monitor files in /var/log/. Add /var/log/ as a data input:

./splunk add monitor /var/log/

Example 2: Monitor windowsupdate.log


The following example shows how to monitor the Windows Update log (where Windows logs automatic updates). Add C:\Windows\windowsupdate.log as a data input:

./splunk add monitor C:\Windows\windowsupdate.log

Example 3: Monitor IIS logging


This example shows how to monitor the default location for Windows IIS logging. Add C:\windows\system32\LogFiles\W3SVC as a data input:

./splunk add monitor C:\windows\system32\LogFiles\W3SVC


Example 4: Upload a file


This example shows how to upload a file into Splunk. Unlike the previous examples, Splunk only consumes the file once; it does not continuously monitor it. Upload /var/log/applog directly into Splunk with the add oneshot command:

./splunk add oneshot /var/log/applog

You can also upload a file via the sinkhole directory with the spool command:

./splunk spool /var/log/applog

The result is the same with either command.


Send data with Storm's REST API


Use Storm's REST API
This topic provides an overview of how to make REST API calls to add data to a Storm project. Use the /1/inputs/http endpoint to add raw data to a Storm project with the ability to change the source and host default fields. For an input example using Python, see the Storm data input endpoint Python example. For an input example using Ruby, see the Storm data input endpoint Ruby example.

REST API call overview


When accessing this endpoint, specify the following:
- Authentication
- Splunk default fields (indexed fields that Splunk Storm automatically recognizes in your event data at search time)
- Request body (raw event text to be indexed)

Here is an example call using cURL:

curl -u x:<Token> \
    "https://api.splunkstorm.com/1/inputs/http?index=<ProjectID>&sourcetype=<type>" \
    -H "Content-type: text/plain" \
    -d "<Request body>"

Authentication

Storm uses an API token authentication scheme over HTTP Basic Authentication with TLS data encryption (HTTPS). For each user, Storm creates an access token that you use as the "password" for access. Any value for username can be used with this token. Storm ignores the username field. You can use the same token for all Storm projects for which you are an administrator.


Splunk default fields

These are parameters to the endpoint that must be embedded as query parameters in a URL.

Required

The following parameters are required. Storm responds with an HTTP 400 error if either of the required fields is missing.
- index: Specifies the project ID.
- sourcetype: Identifies the incoming data. See About source types for information on using source types with Splunk Storm.

Optional

You can optionally specify the following parameters. The data input API allows you to override the values normally associated with host and source if you need to use these for data classification.
- tz: Time zone to use when indexing events.
- host: The host emitting the data (defaults to the reverse DNS for the source IP).
- source: The data source (defaults to the source IP of the data).

Request body

The raw event text to input. You can send the raw event text as plain text (text/plain) or as URL-encoded text (application/x-www-form-urlencoded). In either case, Storm correctly interprets the raw input event text.
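For instance, a call that sets the optional fields as well as the required ones might look like the following; the host, source, and tz values here are only placeholders:

curl -u x:<Token> \
    "https://api.splunkstorm.com/1/inputs/http?index=<ProjectID>&sourcetype=syslog&host=web01.example.com&source=/var/log/myapp/web.log&tz=US/Pacific" \
    -H "Content-type: text/plain" \
    -d "<Request body>"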

Build REST API calls to input data


There are several ways to input data into a Storm project using the REST API. The method you choose depends on the type and amount of data you want to send, the amount of physical memory you have on the server sending data, and the destination in your Storm projects for the data.

Basic REST API call

The most basic call sends a single event over a single connection in a single HTTP request. For example, consider a source type that handles log data. Open a connection to the project. For each logging event, send the single logging event as the complete body of the request. You can keep the connection open to send additional logging events. This trivial use case is typically inefficient. You often send and receive more data in the headers of the request than in the body.

Send multiple events over a single call

In this use case, you buffer multiple events that you send in a single call. Again, consider a source type that handles log data. You may want to buffer many events locally until you reach a threshold, such as the size of the data, a time period to send data, a specific log level (for example, ERROR logs), or some other factor. If you have collections of events that need different Splunk default fields, such as source, send them in different HTTP requests. You specify these fields in the URL query string, which necessitates the separate requests. Once you reach the threshold, send the buffered events as the body of a single call. You can also pipeline requests: keep the TCP connection open, and send multiple HTTP requests. The advantage of this use case is that you limit the SSL/TLS and HTTP overhead in sending and receiving data. However, you have to consider factors such as:
- Local memory available for buffering data
- Risk of losing data should a server go down
- Timeliness of making the new events available in your project

Send compressed data

You may send data more efficiently by uploading a gzip compressed stream. You must supply a "Content-Encoding: gzip" header if sending gzipped data for it to be correctly decompressed on receipt. Example:

echo 'Sun Apr 11 15:35:15 UTC 2011 action=download_packages status=OK pkg_dl=751 elapsed=37.543' \
    | gzip \
    | curl -u x:$ACCESS_TOKEN \
    "https://api.splunkstorm.com/1/inputs/http?index=$PROJECT_ID&sourcetype=generic_single_line" \
    -H "Content-type: text/plain" \
    -H "Content-Encoding: gzip" \
    -v --data-binary @-

Example: Input data using cURL


Before sending data to a Storm project, obtain an access token and the Storm project ID from your Storm account. Then build a query to Splunk Storm to input data. When making a REST API call, the access token is paired with a username. However, Storm ignores the username, using only the access token for authentication.

1. Log in to your Storm project. Navigate to <Project_Name> > Inputs > API.

2. Copy the access token and project ID into your environment along with a choice of source type and whatever event you'd like to send. From a *nix terminal window, these values can be exported as environment variables. The following example specifies generic_single_line for a source type. The full list of source types is available from the About source types documentation.

export ACCESS_TOKEN=<access_token>
export PROJECT_ID=<project_id>
export SOURCETYPE=generic_single_line
export EVENT="Sun Apr 11 15:35:15 UTC 2011 action=download_packages status=OK pkg_dl=751 elapsed=37.543"

3. Use curl to run the HTTP request using the above exported environment variables:

curl -u x:$ACCESS_TOKEN \
    "https://api.splunkstorm.com/1/inputs/http?index=$PROJECT_ID&sourcetype=$SOURCETYPE" \
    -H "Content-Type: text/plain" \
    -d "$EVENT"

An actual response looks something like this:

Response: {"bytes":99,"index":"fdf492f01ee111e28f18123139332c16","host":"example.com","source":"8.

Sending a file

You can also use cURL to send an entire file instead of a single event. However, to monitor a file (or multiple files) and send updates to Storm as they happen, you should instead use a Splunk forwarder for reliable transmission.

To send a single one-off file:

1. As with sending a single event in the previous example, export your access token and project ID into your *nix terminal with the source type and name of the file you want to send:

export ACCESS_TOKEN=<access_token>
export PROJECT_ID=<project_id>
export SOURCETYPE=access_combined
export FILENAME=/var/log/apache2/access.log

2. Use curl to run the HTTP request using the above exported environment variables:

curl -u x:$ACCESS_TOKEN \
    "https://api.splunkstorm.com/1/inputs/http?index=$PROJECT_ID&sourcetype=$SOURCETYPE" \
    -H "Content-Type: text/plain" \
    --data-binary @$FILENAME

Depending on the size of the file, curl may take several minutes to transfer the data to Storm.

Note: Although you can transfer large files this way, for files of more than a few hundred megabytes use a Splunk forwarder instead. A forwarder automatically resumes the upload after a transmission error partway through. For more details, see the documentation about forwarding data to Storm.

More examples
For examples using Python and Ruby, read "Examples: input data with Python or Ruby." We'll be adding more (and longer) code examples in our GitHub repository.

Storm data input endpoint


/1/inputs/http
Stream data to a Splunk Storm project.


Storm supports HTTP/1.1 connections in Keep-Alive mode. If your client sets the Connection: Keep-Alive HTTP request header, Storm keeps the TCP connection open after the first request, waiting for additional data. This allows you to send data with different metadata on the same TCP connection.

Caution: Make sure that you send some complete events in each request. This applies whether you make only one HTTP request in the TCP connection or many.

POST /1/inputs/http/

Stream events from the contents contained in the HTTP body to a project. When sending data, indicate the source type to apply to the events. See About source types for information on using source types with Splunk Storm.
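As a hedged illustration of the Keep-Alive behavior described above, the following Python 2 sketch reuses one HTTPS connection for several POSTs, assuming the server keeps the connection open between requests. The placeholder credentials, host names, and event strings are made up for the example.

import base64
import httplib
import urllib

ACCESS_TOKEN = '<access_token>'   # placeholder
PROJECT_ID = '<project_id>'       # placeholder

# httplib speaks HTTP/1.1, so the TCP connection can stay open between
# requests as long as the server honors Keep-Alive.
conn = httplib.HTTPSConnection('api.splunkstorm.com')
auth_header = 'Basic ' + base64.b64encode('x:' + ACCESS_TOKEN)

def post_events(event_text, sourcetype):
    # Each request carries one or more complete events in its body.
    params = urllib.urlencode({'index': PROJECT_ID, 'sourcetype': sourcetype})
    headers = {'Authorization': auth_header,
               'Content-Type': 'text/plain',
               'Connection': 'Keep-Alive'}
    conn.request('POST', '/1/inputs/http?' + params, event_text, headers)
    response = conn.getresponse()
    return response.read()

# Two requests with different metadata reuse the same TCP connection.
print post_events('Apr 1 2012 18:47:23 UTC host57 action=supply_win', 'syslog')
print post_events('127.0.0.1 - - [01/Apr/2012:18:47:26] "GET / HTTP/1.1" 200 1043',
                  'access_combined')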

Request

The body of the request contains the raw event text you are streaming to your Storm project.

<request_body> (String)
    Raw event text. This is the entirety of the HTTP request body.

Specify parameters to the request as query parameters in the URL.

index (String, required)
    Default: the Project ID of your first project.
    The Project ID to which events from this input are sent.

sourcetype (String, required)
    The source type to apply to events from this input. See About source types for information on using source types with Splunk Storm.

host (String, optional)
    Default: the DNS PTR name for the source IP address.
    The value to populate in the host field for events from this data input.

source (String, optional)
    Default: the IP address of the sending host.
    The authenticated IP address from which events are sent.

tz (String, optional)
    Default: the default time zone of the project.
    The time zone to apply to events from this input. Storm recognizes zoneinfo TZ IDs. Refer to the zoneinfo (TZ) database for all permissible tz values.

Response status

200: Data accepted.
400: Request error. See response body for details.
403: Not authorized to write to the project, or project does not exist.

For a successful response, the fields are:

bytes (Number): Number of bytes received, if any.
index: The project ID.
host: The host field the data was indexed with.
source: The source field the data was indexed with.
sourcetype: The source type field the data was indexed with.
tz: The time zone field the data was indexed with.

For an error response, the fields are:

status (Number): HTTP status code (for example, 200).
type (String): The string "ERROR."
text (String): The reason for the error.

Example

Send a single event to the project f75b3a9abc with a sourcetype of syslog and the host set to my.example.com. Specify URL-encoding for the POST data. This example assumes that ACCESS_TOKEN is an environment variable specifying the Storm access token. x represents the required username string, which Storm ignores.

Request

curl -k -u x:$ACCESS_TOKEN \
    "https://api.splunkstorm.com/1/inputs/http?index=f75b3a9abc&sourcetype=syslog&host=my.example.com" \
    --data-urlencode "Sun Apr 11 15:35:15 UTC 2011 action=download_packages status=OK pkg_dl=751 elapsed=37.543"

Response

{"status": 200, "bytes_received": 90}

Examples: input data with Python or Ruby


This topic presents a few code examples using the Storm REST API data input endpoint. These and other examples are available for download from Storm's GitHub repository. For conceptual information about the Storm input endpoint, read "Use Storm's REST API." For syntax details, see "Storm data input endpoint."

Example: Input data with Python


The following example shows a basic Python script that sends syslog sourcetype data to a Storm project.

Note: This script highlights the basic Python calls to Splunk Storm. It is not a model for programming in Python. For example, the project ID and access token are visible in plain text in the script.

import urllib
import urllib2


class StormLog(object):
    """A simple example class to send logs to a Splunk Storm project.

    Your ``access_token`` and ``project_id`` are available from the Storm UI.
    """

    def __init__(self, access_token, project_id, input_url=None):
        self.url = input_url or 'https://api.splunkstorm.com/1/inputs/http'
        self.project_id = project_id
        self.access_token = access_token
        self.pass_manager = urllib2.HTTPPasswordMgrWithDefaultRealm()
        self.pass_manager.add_password(None, self.url, 'x', access_token)
        self.auth_handler = urllib2.HTTPBasicAuthHandler(self.pass_manager)
        self.opener = urllib2.build_opener(self.auth_handler)
        urllib2.install_opener(self.opener)

    def send(self, event_text, sourcetype='syslog', host=None, source=None):
        params = {'project': self.project_id, 'sourcetype': sourcetype}
        if host:
            params['host'] = host
        if source:
            params['source'] = source
        url = '%s?%s' % (self.url, urllib.urlencode(params))
        try:
            req = urllib2.Request(url, event_text)
            response = urllib2.urlopen(req)
            return response.read()
        except (IOError, OSError):
            # An error occurred during URL opening or reading
            raise


# Example
# Set up the example logger
# Arguments are your access token and the project ID
log = StormLog('abcdefghi...', '198ahb3280...')

# Send a log; will pick up the default value for ``source``.
log.send('Apr 1 2012 18:47:23 UTC host57 action=supply_win amount=5710.3',
         sourcetype='syslog', host='host57')

# Will pick up the default value for ``host``.
log.send('Apr 1 2012 18:47:26 UTC host44 action=deliver from=foo@bar.com to=narwin@splunkstorm.com',
         sourcetype='syslog')

Example: Input data with Ruby


#!/usr/bin/env ruby
#
# = Usage
# 1. Download this script to your local system.
# 2. Obtain your Project ID and Access Token:
#    Log in to your Storm project and navigate to *<Project_Name> > Inputs > API*.
# 3. Open this script in a text editor.
# 4. In the _User Options_ section, set *PROJECT_ID* and *ACCESS_TOKEN*.
# 5. Also in _User Options_, set +event_params+'s *sourcetype* and *host*.
# 6. Pipe your data into this script. For example:
#    $ ruby storm-rest-ruby.rb < system.log
#
# = Definitions
# [PROJECT_ID]   Splunk Storm Project ID.
# [ACCESS_TOKEN] Splunk Storm REST API Access Token.
# [sourcetype]   Format of the event data, e.g. syslog, log4j.
# [host]         Hostname, IP or FQDN from which the event data originated.
#
# = Requires
# * {rest-client gem}[https://github.com/archiloque/rest-client]

require 'rest-client'

# User Options
PROJECT_ID = 'xxx'
ACCESS_TOKEN = 'yyy'
event_params = {:sourcetype => 'syslog', :host => 'gba.example.com'}

# Nothing to change below
API_HOST = 'api.splunkstorm.com'
API_VERSION = 1
API_ENDPOINT = 'inputs/http'
URL_SCHEME = 'https'

# Actual code
event_params[:project] = PROJECT_ID
api_url = "#{URL_SCHEME}://#{API_HOST}"
api_params = URI.escape(event_params.collect { |k, v| "#{k}=#{v}" }.join('&'))
endpoint_path = "#{API_VERSION}/#{API_ENDPOINT}?#{api_params}"

request = RestClient::Resource.new(
  api_url, :user => 'x', :password => ACCESS_TOKEN)
response = request[endpoint_path].post(ARGF.read)
puts response

More examples in GitHub


We'll be adding more (and longer) code examples in our GitHub repository.


Explore your data


About Splunk Web
Splunk Web is the graphical user interface Splunk Storm provides to search, analyze, and report on your data. If you're familiar with the core Splunk product, you'll recognize the Storm version of Splunk Web. If you're new to both Splunk and Splunk Storm, check out the Storm tutorial.

Once you've added some data to your Storm project, access the Storm web interface (which is very similar to Splunk Web) by clicking the big Explore data button (it can take a few minutes to appear after you've added data). You'll be able to search, analyze, and report on all the data in that project.

Note: If you've got data in another project, you must select that project and click Explore data from within it to see its data.

Search your data


Splunk Storm supports most of the search commands available in the core Splunk product. Refer to the Search Reference Manual in the core Splunk product documentation for details on all the available search commands. If you're new to Splunk, check out the Storm tutorial for a walkthrough of some of what's possible. See the Search language quick reference in this chapter for search commands and example searches.

Note: Some of the features that are available in the core Splunk product are not available in Storm. In particular, the Search Reference discusses searching via the CLI, but this feature is not available in Storm.


Search language quick reference


This topic is a quick reference to search commands available in Splunk Storm, with examples. For an introduction to the search language, follow the Storm tutorial. For detailed information about each search command, see the core Splunk Search Reference Manual.

This topic is also represented as two PDF files. Note that both PDFs have a few sections of commands that are applicable only in core Splunk, not in Splunk Storm (for example, administrative search commands). The Search Command Cheat Sheet (for core Splunk) is a quick command reference complete with descriptions and examples; it is also available for download as an eight-page PDF file. The core Splunk Search Language Quick Reference Card, available only as a PDF file, is a six-page reference card that provides fundamental search concepts, commands, functions, and examples.

Note: In the examples on this page, a leading ellipsis (...) indicates that there is a search before the pipe operator. A leading pipe | prevents Storm from prepending the "search" operator on your search.

Answers
Have questions about search commands? Check out Splunk Answers to see what questions other Splunk users have asked and answered about the search language. Now, on to the cheat sheet!

fields
add
Save the running total of "count" in a field called "total_count".
    ... | accum count AS total_count

Add information about the search to each event.
    ... | addinfo

Search for "404" events and append the fields in each event to the previous search results.
    ... | appendcols [search 404]

For each event where "count" exists, compute the difference between count and its previous value and store the result in "countdiff".
    ... | delta count AS countdiff

Set velocity to distance / time.
    ... | eval velocity=distance/time

Extract field/value pairs and reload field extraction settings from disk.
    ... | extract reload=true

Extract field/value pairs that are delimited by "|;", and values of fields that are delimited by "=:".
    ... | extract pairdelim="|;", kvdelim="=:", auto=f

Add location information (based on IP address).
    ... | iplocation

Extract values from "eventtype.form" if the file exists.
    ... | kvform field=eventtype

Extract the "COMMAND" field when it occurs in rows that contain "splunkd".
    ... | multikv fields COMMAND filter splunkd

Set range to "green" if the date_second is between 1-30; "blue", if between 31-39; "red", if between 40-59; and "gray", if no range matches (for example, if date_second=0).
    ... | rangemap field=date_second green=1-30 blue=31-39 red=40-59 default=gray

Calculate the relevancy of the search and sort the results in descending order.
    disk error | relevancy | sort -relevancy

Extract "from" and "to" fields using regular expressions. If a raw event contains "From: Susan To: Bob", then from=Susan and to=Bob.
    ... | rex field=_raw "From: (?<from>.*) To: (?<to>.*)"

Extract the "author" field from XML or JSON formatted data about books.
    ... | spath output=author path=book{@author}

Add the field "comboIP". Values of "comboIP" = "sourceIP" + "/" + "destIP".
    ... | strcat sourceIP "/" destIP comboIP

Extract field/value pairs from XML formatted data. "xmlkv" automatically extracts values between XML tags.
    ... | xmlkv

Extract the name value from _raw XML events.
    sourcetype="xml" | xpath outfield=name "//bar/@name"

convert
Convert every field value to a number value except for values in the field "foo" (use the "none" argument to specify fields to ignore).
    ... | convert auto(*) none(foo)

Change all memory values in the "virt" field to Kilobytes.
    ... | convert memk(virt)

Change the sendmail syslog duration format (D+HH:MM:SS) to seconds. For example, if "delay="00:10:15"", the resulting value will be "delay="615"".
    ... | convert dur2sec(delay)

Convert values of the "duration" field into number values by removing string values in the field value. For example, if "duration="212 sec"", the resulting value will be "duration="212"".
    ... | convert rmunit(duration)

Separate the value of "foo" into multiple values.
    ... | makemv delim=":" allowempty=t foo

For sendmail events, combine the values of the senders field into a single value; then, display the top 10 values.
    eventtype="sendmail" | nomv senders | top senders

filter
Keep the "host" and "ip" fields, and display them in the order: "host", "ip". Remove the "host" and "ip" fields. ... | fields + host, ip ... | fields - host, ip

modify
sourcetype="web" | Build a time series chart of web events by host and fill all timechart count by host | empty fields with NULL. fillnull value=NULL Rename the "_ip" field as "IPAddress". Change any host value that ends with "localhost" to "localhost". ... | rename _ip as IPAddress ... | replace *localhost with localhost in host

read
There is a lookup table specified in a stanza named 'usertogroup' in transforms.conf. This lookup table contains (at least) two fields, 'user' and 'group'. For each event, look up the value of the field 'local_user' in the table; for any entries that match, the value of the 'group' field in the lookup table is written to the field 'user_group' in the event.
    ... | lookup usertogroup user as local_user OUTPUT group as user_group

formatting
Show a summary of up to 5 lines for each search result.
    ... | abstract maxlines=5

Map a single numerical value against a range of colors that may have particular business meaning or business logic.
    ... | stats count as myCount | gauge myCount 5000 8000 12000 15000

Highlight the terms "login" and "logout".
    ... | highlight login,logout

Output the "_raw" field of your current search into "_xml".
    ... | outputtext


reporting
Calculate the sums of the numeric fields of each result, and put the sums in the field "sum".
    ... | addtotals fieldname=sum

Analyze the numerical fields to predict the value of "is_activated".
    ... | af classfield=is_activated

Return events with uncommon values.
    ... | anomalousvalue action=filter pthresh=0.02

Return results associated with each other (that have at least 3 references to each other).
    ... | associate supcnt=3

For each event, copy the 2nd, 3rd, 4th, and 5th previous values of the 'count' field into the respective fields 'count_p2', 'count_p3', 'count_p4', and 'count_p5'.
    ... | autoregress count p=2-5

Bucket search results into 10 bins, and return the count of raw events for each bucket.
    ... | bucket size bins=10 | stats count(_raw) by size

Return the average "thruput" of each "host" for each 5 minute time span.
    ... | bucket _time span=5m | stats avg(thruput) by _time host

Return the average (mean) "size" for each distinct "host".
    ... | chart avg(size) by host

Return the maximum "delay" by "size", where "size" is broken down into a maximum of 10 equal sized buckets.
    ... | chart max(delay) by size bins=10

Return the ratio of the average (mean) "size" to the maximum "delay" for each distinct "host" and "user" pair.
    ... | chart eval(avg(size)/max(delay)) by host user

Return max(delay) for each value of foo split by the value of bar.
    ... | chart max(delay) over foo by bar

Return max(delay) for each value of foo.
    ... | chart max(delay) over foo

Build a contingency table of "datafields" from all events.
    ... | contingency datafield1 datafield2 maxrows=5 maxcols=5 usetotal=F

Calculate the co-occurrence correlation between all fields.
    ... | correlate type=cocur

Return the number of events in the project.
    | eventcount

Compute the overall average duration and add 'avgdur' as a new field to each event where the 'duration' field exists.
    ... | eventstats avg(duration) as avgdur

Make "_time" continuous with a span of 10 minutes.
    ... | makecontinuous _time span=10m

Remove all outlying numerical values.
    ... | outlier

Return the least common values of the "url" field.
    ... | rare url

Remove duplicates of results with the same "host" value and return the total count of the remaining results.
    ... | stats dc(host)

Return the average for each hour, of any unique field that ends with the string "lay" (for example, delay, xdelay, relay, etc).
    ... | stats avg(*lay) BY date_hour

For each event, add a count field that represents the number of events seen so far (including that event): 1 for the first event, 2 for the second, and so on.
    ... | streamstats count

Graph the average "thruput" of hosts over time.
    ... | timechart span=5m avg(thruput) by host

Create a timechart of average "cpu_seconds" by "host", and remove data (outlying values) that may distort the timechart's axis.
    ... | timechart avg(cpu_seconds) by host | outlier action=tf

Calculate the average value of "CPU" each minute for each "host".
    ... | timechart span=1m avg(CPU) by host

Create a timechart of the count of events from "web" sources by "host".
    ... | timechart count by host

Compute the product of the average "CPU" and average "MEM" each minute for each "host".
    ... | timechart span=1m eval(avg(CPU) * avg(MEM)) by host

Reformat the search results.
    ... | timechart avg(delay) by host | untable _time host avg_delay

Return the 20 most common values of the "url" field.
    ... | top limit=20 url

Search the access logs, and return the number of hits from the top 100 values of "referer_domain".
    sourcetype=access_combined | top limit=100 referer_domain | stats sum(count)

Reformat the search results.
    ... | xyseries delay host_type host

results
append
Append the current results with the tabular results of "fubar".
    ... | chart count by bar | append [search fubar | chart count by baz]

Joins previous result set with results from "search foo", on the id field.
    ... | join id [search foo]


filter
Return only anomalous events.
    ... | anomalies

Remove duplicates of results with the same host value.
    ... | dedup host

Combine the values of "foo" with ":" delimiter.
    ... | mvcombine delim=":" foo

Keep only search results whose "_raw" field contains IP addresses in the non-routable class A (10.0.0.0/8).
    ... | regex _raw="(?<!\d)10.\d{1,3}\.\d{1,3}\.\d{1,3}(?!\d)"

Join results with itself on the 'id' field.
    ... | selfjoin id

Return "physicsobjs" events with a speed greater than 100.
    sourcetype=physicsobjs | where distance/time > 100

generate
All daily time ranges from Oct 25 till today.
    | gentimes start=10/25/12

Loads the events that were generated by the search job with id=1233886270.2.
    | loadjob 1233886270.2 events=t

Create new events for each value of multi-value field, "foo".
    ... | mvexpand foo

Run the "mysecurityquery" saved search.
    | savedsearch mysecurityquery

group
Cluster events together, sort them by their "cluster_count" values, and then return the 20 largest clusters (in data size).
    ... | cluster t=0.9 showcount=true | sort cluster_count | head 20

Group search results into 4 clusters based on the values of the "date_hour" and "date_minute" fields.
    ... | kmeans k=4 date_hour date_minute

Group search results that have the same "host" and "cookie", occur within 30 seconds of each other, and do not have a pause greater than 5 seconds between each event into a transaction.
    ... | transaction host cookie maxspan=30s maxpause=5s

Have Splunk automatically discover and apply event types to search results.
    ... | typelearner

Force Splunk to apply event types that you have configured (Splunk Web automatically does this when you view the "eventtype" field).
    ... | typer

order
Return the first 20 results.
    ... | head 20

Reverse the order of a result set.
    ... | reverse

Sort results by "ip" value in ascending order and then by "url" value in descending order.
    ... | sort ip, -url

Return the last 20 results (in reverse order).
    ... | tail 20

search
search
Keep only search results that have the specified "src" or "dst" values.
    src="10.9.165.*" OR dst="10.9.165.8"

subsearch
Get top 2 results and create a search from their host, source, and source type, resulting in a single search result with a _query field: _query=( ( "host::mylaptop" AND "source::syslog.log" AND "sourcetype::syslog" ) OR ( "host::bobslaptop" AND "source::bob-syslog.log" AND "sourcetype::syslog" ) ).
    ... | head 2 | fields source, sourcetype, host | format

Search the time range of each previous result for "failure".
    ... | localize maxpause=5m | map search="search failure starttimeu=$starttime$ endtimeu=$endtime$"

Return values of "URL" that contain the string "404" or "303" but not both.
    | set diff [search 404 | fields url] [search 303 | fields url]

Print a PDF
You can:

Generate PDFs of dashboards, views, searches, or reports with a click of a button.
COMING SOON: Arrange to have PDFs of searches, reports, and dashboards sent to a set of recipients that you define, on a regular schedule.
COMING SOON: Arrange to have PDFs of searches and reports sent to a set of recipients that you define when specific alert conditions are met.

There are exceptions involving forms, dashboards that are built with advanced XML, and simple XML dashboards that have panels rendered in Flash rather than JavaScript. See the "Exceptions" section, below, for more information.

Generate dashboard PDFs


When you are viewing a dashboard in Splunk, you can immediately generate a PDF of it by clicking the Generate PDF button at the top of the dashboard. The resulting PDF appears in your browser window, displaying results that are accurate up to the moment that the button was clicked, and you can view it through your browser or a PDF viewer application.

If you use a browser that does not display graphics in the PDF format (such as IE8), install a PDF viewer application (if your OS does not already provide one) to view the PDFs that Splunk generates.

Exceptions: Advanced XML, Forms, and Flash


Integrated PDF generation cannot:

Generate PDFs from dashboards that are built with advanced XML.
Generate PDFs from forms.
Make use of chart customization parameters that are only available for Flash (rather than JavaScript) in PDFs.

Dashboards and forms that use advanced XML won't be printed

If you want to print a dashboard that uses advanced XML, create a simple XML version of the dashboard. Use charting library properties that are supported by the JSChart charting library, which renders chart graphics with JavaScript rather than Flash. There are advanced dashboard features (such as search postprocessing) that can currently only be achieved with advanced XML. Take note, however, that in Splunk 5.x, two dashboard features that were previously available only with advanced XML, dynamic drilldown and form searches, have been made available in simple XML. For more information about these features, refer to "Dynamic drilldown in dashboards and forms" and "Build and edit forms with simple XML" in the Data Visualizations Manual in core Splunk product documentation.

Forms won't be printed

Integrated PDF generation cannot print forms at this point, whether they have been constructed with simple or advanced XML.

Flash-only chart customizations will be ignored

As we detail in the "About JSChart" topic of the Data Visualizations manual in core Splunk product documentation, Splunk uses the JSChart charting library to render dashboard panels for viewing through your browser, except in cases where chart customizations that aren't supported by JSChart are implemented in the underlying simple XML. Panels that have JSChart-unsupported customizations are instead rendered for browser display by the Flash charting library.

Integrated PDF generation relies solely on the JSChart charting library to render images of dashboard panels in PDFs. Because of this, when Splunk uses integrated PDF generation to generate a PDF of a dashboard, it renders all dashboard panels with JSChart. When dashboard panels include simple XML chart customizations that are unsupported by JSChart, Splunk ignores those customizations. This means that panels with JSChart-unsupported customizations may appear different in PDF format than they do in your browser. In the browser they are rendered with Flash and have the customizations, while in the PDF they are rendered with JSChart and do not include the customizations.

To see which charting library parameters are supported by JSChart, go to the custom charting configuration reference in the Data Visualizations manual in core Splunk product documentation. Each topic in the reference contains tables of chart customization parameters, and each table has a Supported by JSChart? column.

Note: Over time we will be increasing the number of chart customization parameters that are compatible with JSChart, which means that you'll be able to make a wider range of customizations to your dashboard panels and still have them print correctly through integrated PDF generation.


Manage and share a project


About projects
Splunk Storm allows you to create projects to organize your data. A project can contain just one input or several.

Manage users' access by project


One main reason to organize your data into projects is that you can choose to share a project with other users. If you've got some data you want to share and some you want to keep private, create two projects: one that's just for you, and one to share with other users. For more information about sharing projects, refer to "Share your project" in this manual.

Use a project as a sandbox


Another reason you might want more than one project is to have a project that acts as your sandbox. When you want to add new data to Storm, it's always a good idea to try it out in a sandbox first to make sure that the data is parsed and indexed by Storm in the way you want. Once you've tested it out in your sandbox, you can delete the sandbox (or just leave it) and add the same data to your "real" project.

Share your project


Splunk Storm lets you share your data with other users. When you create a project, you are that project's owner. As the project's owner, you can invite up to 5 other users (10 users for a paid project) to join your project and designate which of the two available Storm roles those users belong to.

Project members
Invite users to your project, or view and manage current members of a project, by clicking the Members tab for that project.


To invite a user, click +Add member. Select a project role (User or Admin) for your invited member. Click Send invite.

To change a member's role, find the member's name in the list and click Edit.

Note: You can invite up to 5 users to join a free project. You can invite up to 10 users to join a paid project.

Project roles
Storm users can belong to one of three roles: Owner, Admin, or User.

Users can:

view the project data
run searches on the project data

Admins can do everything a User can, plus:

add data to that project
invite other users to view the project
delete data from the project

Owners can do everything an Admin can, plus:

create a project
delete the project
update and change billing information
request that a project be transferred to another project member

Project owner

The user who creates a given project is given the role of Owner for that project. A project can have only one owner (whereas a project can have multiple admins and users). The project owner is responsible for project payment.

Project owner gets billed for the project

The user who is listed as the project owner receives all bills for that project. Read about Storm billing in this manual.


Transfer ownership of a project


You can transfer ownership of a Storm project to another member of your team. This will eventually be a self-service process, but currently you must file a case with Storm Support to initiate the transfer. Refer to "Contact Storm Support" in this manual for instructions on filing a case.

As you file your support case


When you file the case, select the project you want to transfer from the drop-down list. Specify the project member to whom you want to transfer ownership of the project. If this member is not currently an Admin user in the project, he or she will be converted to an Admin user during the transfer. Storm Support will send an email to you and the designated project member confirming the transfer. You both must confirm the transfer before it can be completed.

About billing
If the project is a paid project, the user to whom you transfer the project must have a credit card on file in Splunk Storm before the project can be transferred. If the project is a paid project, any credit remaining in the Storm account balance of the original project owner will be transferred with the project to the new project owner. The new project owner's credit card will be charged immediately for any outstanding balance due for the current month.

About data inputs


If you are sending data to the project using the API or forwarders, pay attention to which user set up the forwarding or API input. The API token and the forwarder credentials package are specific to that user, and if the member who set up the inputs leaves the project, authentication will fail. A good practice when setting up inputs is to use the credentials package or API authentication token of a permanent member, such as the project owner. But if you find yourself in this situation, here is how to fix it.


For API inputs

Update the API access token in your code. You can find your API access token at <Project_name> > Inputs > API.

For Splunk forwarders

Update the credentials package (the stormforwarder_xxxxx app) using the CLI. Download the correct package, stormforwarder_xxxxx.spl, then use:

./splunk remove app stormforwarder_xxxxx
./splunk install app /path/to/my/new/creds/stormforwarder_xxxxx.spl
./splunk restart

Because the old and new credentials packages have the same folder and file name, it's not easy to tell which member a credentials package belongs to. One method is to rename the package stormforwarder_xxxxx.spl to .tar.gz, untar it, and check the contents of its inputs.conf (the API token is in there, along with the project ID). You can compare the forwarder's inputs.conf to the default file in $SPLUNK_HOME/etc/apps/stormforwarder_xxxxx/default/inputs.conf.
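If you prefer not to rename and untar the package by hand, a short script can peek inside it. This is a minimal Python sketch that assumes the .spl unpacks as a gzipped tarball (as the renaming step above implies); the package file name is a placeholder.

import tarfile

# Placeholder file name; point this at the actual stormforwarder package.
PACKAGE = 'stormforwarder_xxxxx.spl'

# Assuming the package is a gzipped tarball, tarfile can read it directly
# without renaming it to .tar.gz first.
spl = tarfile.open(PACKAGE, 'r:gz')
for name in spl.getnames():
    if name.endswith('default/inputs.conf'):
        # Print inputs.conf to check which API token and project ID it holds.
        print spl.extractfile(name).read()
spl.close()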

Troubleshoot a project
This topic suggests various troubleshooting options for you to use if you're having difficulty with Storm.

I'm sending data, but why am I not seeing data in my project?


If you are using TCP/UDP, this can happen if your IP address changes, or if you have multiple IP addresses sending data and some are not authorized. Try turning on auto-authorization and see if the data appears.

If you are using other input methods, make sure you are using the correct authentication token.

Make sure you have your time zone set correctly in your "My Account" settings.

If you're still not seeing data, it's possible that the time is set incorrectly on the data source sending data. You can quickly diagnose this by searching for "*" and selecting Real-time -> All-time in the time picker dropdown. This shows you all data coming into your Storm project regardless of timestamp.


Delete data from a project


You have three ways to delete your data: delete the entire project that contains the data, delete old data according to your deletion policy, or manually delete all data older than an age you select. Once you delete your data, you will not be able to access it in Storm.

Delete the entire project


From the Projects page, click Settings, then Delete project. It sometimes takes a while to delete data.

Use your deletion policy


Your deletion policy sets the amount of time Storm keeps your data. Access your deletion policy through your project's storage settings: from the Projects page, click Storage, scroll down to the Deletion policy section, and click Change deletion policy. Set a longer or shorter deletion policy by entering your desired dates. Read about deletion policies in "Choose a data storage plan" in this manual.

Minimum deletion policy

The minimum length of time you can specify for data to be kept in Storm is 1 month (30 days). You can specify a minimum of 30 days in your deletion policy, and you cannot manually delete data that is less than 30 days old.


Delete data manually


In your project's Storage tab, scroll down to the Deletion policy section. Click Delete data now.

On the Delete data page, set the number of days you wish to retain data. Confirm by clicking Delete. It sometimes takes a while to delete data.

Note: You cannot delete data that is less than 30 days old.

About inactive projects


If you leave a free project inactive for a long time (by not sending it any data or logging into it and searching), we'll send a warning email to the project's owner at 60 days and then automatically delete the project at 90 days. In your list of projects, the inactive project appears in red. Mouse over its name for details.

To keep your project, you can do any of the following:

log into splunkstorm.com, click "explore data" in your project, and run a search;
send new data to your project; or
upgrade your project to a paid plan.

You can also delete your inactive project yourself. Note that we won't delete any paid projects. And we won't delete your Splunk Storm user account, even if you have no projects at all.


Alerts - Coming soon!


Alerting overview - coming soon
Note: Alerting is coming soon and is currently in Private Beta.

The addition of alerts can turn Splunk Storm into an invaluable monitoring tool. You can configure a variety of alerting scenarios for your searches. You can have your searches run automatically on regular schedules, and you can set up searches so they send alert messages to you and others when their results meet specific conditions. You can base these alerts on a wide range of threshold- and trend-based scenarios, including empty shopping carts, brute force firewall attacks, and server system errors.

Get started with alert creation


If you run a search, like the results it's giving you, and decide that you'd like to base an alert on it, then click the Create button that appears above the search timeline.

Select Alert... to open the Create alert dialog on the Schedule step. Give the alert a Name and then select the alert Schedule.

Next, set triggering conditions.


Need guidance on alerts?


Splunk's core product documentation gives thorough guidance on the types of alerts available to you and some examples of occasions appropriate for each type of alert. Start reading at Define per-result alerts.

When reading the core Splunk product documentation, bear in mind that a few features exist in Splunk Enterprise but not in Storm (yet):

Flexible scheduling. Storm lets you choose from several preset schedules.
Throttling based on field values. In Storm you can throttle only on the basis of time.
Real-time searches. You can approximate a real-time search in Storm by scheduling a search with a time range that goes back a little bit further than the previous scheduled run of the search.

Use Manager to expand alert functionality


The section above steps you through the Alerting Wizard. More configuration options are available to you in Manager > Searches and reports. Read about this functionality in the core Splunk product documentation on using Manager to update and expand alert functionality.

