Table of Contents
Introduction
    About Splunk Storm
    Storm FAQ
    Learn more and get help
    Known issues and changelog
Splunk concepts
    Inputs and projects
    About source types
Add value to your application: conscious logging
    Why care about your logs?
    About logging
    Best practices
    Examples
Storm Tutorial
    Welcome to the Storm tutorial
    Create a Storm account and log in
    Create the tutorial project
    Add data to the tutorial project
    Introduction to the Storm UI
    Start searching
    Use the timeline
    Change the time range
    Use fields to search
    Save a search
    Use Splunk's search language
    Use a subsearch
    More search examples
    Create reports
    Build and share a dashboard
Get started
    Create and activate an account
    Create a project
    Choose a data storage plan
    How much data am I sending to Storm?
    About billing
Add data
    About adding data
    Send data (including syslog) over a TCP/UDP port
    Set up syslog
    Set up syslog-ng
    Set up rsyslog
    Set up Snare (for Windows)
    Send data via netcat
    Send data from Heroku
    Upload a file
Send data with forwarders
    About forwarding data to Storm
    Set up a universal forwarder on Windows
    Set up a universal forwarder on *nix
    Edit inputs.conf
    CLI commands for input
Send data with Storm's REST API
    Use Storm's REST API
    Storm data input endpoint
    Examples: input data with Python or Ruby
Explore your data
    About Splunk Web
    Search your data
    Search language quick reference
    Print a PDF
Manage and share a project
    About projects
    Share your project
    Transfer ownership of a project
    Troubleshoot a project
    Delete data from a project
    About inactive projects
Alerts - Coming soon!
    Alerting overview - coming soon
Introduction
About Splunk Storm
What is Splunk Storm?
Splunk Storm is a cloud-based service that turns machine data into valuable insights. Machine data is generated by web sites, applications, servers, networks, mobile devices, and the like. Splunk Storm consumes machine data and allows users to search and visualize it to monitor and analyze everything from customer clickstreams and transactions to network activity to call records.
thresholds, critical system errors, and load. Security analysts and incident response teams can investigate activity of flagged users and access to sensitive data, and use sophisticated correlation via search to find known risk patterns such as brute force attacks, data leakage, and even application-level fraud.
Storm FAQ
The following is a list of frequently asked questions about Splunk Storm:
Get help, give feedback, and file bugs for Splunk Storm
There is a dedicated support and feedback forum. To access the forum, log into your www.splunkstorm.com account. Click Help at the top right corner of the user interface, and then click Discussion forums. Your project ID, which you can find on your project's Settings page, is included automatically with your posting.
- The core Splunk documentation
- Splunk Answers
- The #splunk IRC channel on EFNET
Known issues
This section lists known issues in Splunk Storm. As bugs get fixed, they are moved from known issues to the changelog.

Data inputs

- On a TCP or UDP input only (not on forwarder or API inputs), if you change the source type of an input (on the project's Network data > Edit by IP address page), the source type is not updated automatically. With TCP, wait about 1 minute (after selecting the new source type and clicking Update) and then reconnect. With UDP, wait about 3 minutes and reconnect. (STORM-1626)
- Events may be indexed with incorrect source and host, and with source type "tcp-raw". (STORM-4115)
- The timezone is not recognized with two source types: generic single-line data and Storm multi-line data. (STORM-6312)
- With TCP/UDP inputs, the Inputs > Network data > Data last received column shows NA. (STORM-6343/STORM-6375)
- In exceptional cases, some events may be missing after a file upload. (STORM-6555)
- Column headers for CSV files are not automatically detected. To work around this, define them manually in Manager > Fields > Field extractions using a regular expression. (STORM-5195)

Exploring data

- The count of events on the summary page (after you click "Explore Data") may not update immediately after an upload. Solution: refresh the page. (STORM-4394/SPL-51502)
- A new project might not be searchable and might display banners showing an error like "Reached end-of-stream" or "Search process did not exit cleanly, exit_code=254". Please open a support ticket in Help > Report an issue. (STORM-6344)
- The summary page may show the error "Streamed search execute failed because: Error in 'metadata': The metadata search command cannot be run because global metadata generation has been disabled on index XXX". This issue does not impact search, only the counters on the summary page. (STORM-5887)
- Multiple projects on the same search head (that is, search1, search2, or search3 in the URL while exploring data) cannot coexist in the same browser. Workaround: explore one project at a time, use different browsers, or use anonymous browsing windows. (STORM-5712)
- The "iplocation" search command fails with the error "exited with code 255". (STORM-6715)

Other

- Total data received over the past week and month does not show in Project > Storage. (STORM-5065)
Changelog
This section lists the issues resolved in each production update of the Splunk Storm service, by date.

July 10, 2013

- Implemented new UI tools. (STORM-6850, STORM-6851, STORM-6852)
- Orchestration bugfixes.
- The Tenderapp help forum has been deprecated in favor of Splunk Answers (answers.splunk.com). Answers is more fun. (STORM-6576)
- Backend changes to support yearly billing through sales, in addition to self-service monthly billing (which is also still available). (STORM-6740, STORM-6784)
- The count of yearly invoices, if any exist, is viewable on the user Account page. (STORM-6934)

June 5, 2013

- When a customer downgrades a project to a cheaper plan, the change now goes into effect immediately (instead of with the new billing cycle). (STORM-6567)
- Upgraded the Splunk package to improve Splunk stability.
- Fixed a bug with the Windows eventlog source type. (STORM-6600)
- Fixed a cosmetic issue in which the spinning wheel during searches appeared to never finalize. (STORM-6529)

May 8, 2013

- Spring cleaning: after today's maintenance, you might get email about inactive free projects. Read more at "About inactive projects".
- Added sample code for API inputs to the UI. (STORM-6447)
- UI bugfixes: removed an errant "unit tests" button. (STORM-6550)
- Assorted UI bugfixes. (STORM-6493, STORM-6492, STORM-6482)
- Upgraded the Splunk package (same version, just a few Splunk-side bugfixes). For example, got rid of annoying app deletion messages in yellow banners. (SPL-65784)
- Fixed a bug with timezone recognition in source types. (STORM-6312)

April 24, 2013

- Alerting is now in Private Beta. (STORM-6070, STORM-6346, STORM-5396)
- Fixed a bug in which some invoices were not sorting correctly. (STORM-6401)
- Fixed a bug that was preventing some data from uploaded files (not any other inputs) from being received. (STORM-6430)
- File upload now checks for source type earlier in the workflow. (STORM-5789)

April 18, 2013

- Upgraded Storm to Splunk 5.0.3. Restart your browser and/or do a shift + reload in your browser for the UI changes to take effect. Features of interest in Splunk 5.x include integrated PDF generation, which allows you to create PDF files from your simple XML dashboards, views, searches, or reports; dynamic drilldown, which lets you create custom drilldown behavior for any simple XML table or chart; and JSCharts enhancements.

April 11, 2013

- The API input endpoint now recognizes the timezone parameter tz. (STORM-6300)
- Storm now recognizes the log4net_xml layout source type. (STORM-6051)
- Storm now recognizes the search commands gauge, gentimes, iplocation, xmlkv, and xpath. (STORM-4699)
- Alerting infrastructure work. (STORM-6071)
- Alerting workflow planning. (STORM-6070, STORM-6331, STORM-6342)
- Fixed an issue in which the IFX field extraction test failed to open a new window. (STORM-3827)

March 19, 2013

- Alerting backend work. (STORM-6071, STORM-6191)
- UI styling for signup pages. (STORM-6202)
- Improvements to project deletion. (STORM-6143)
- Resolved a transfer-of-ownership bug with a third-party billing vendor. (STORM-5994)

February 28, 2013

- Bugfixing. (STORM-6055, STORM-6021)
- Added monitoring for the situation that produces the end-of-stream error in the UI for some customers. (STORM-6134)

February 14, 2013

- More backend work for alerting (no, not yet). (STORM-5861)
- Wording changes on the new inputs overview page. (STORM-5952)
- Help page update. (STORM-4628)
- Backend bugfixes and infrastructure work.

January 28, 2013

- Backend work to support alerting (Beta coming soon!). (STORM-5921, STORM-5916, STORM-5652, STORM-5648, STORM-5637, STORM-5615, STORM-5623, STORM-5606)
- Release infrastructure work. (STORM-5955)
January 17, 2013

- Release process improvements. (STORM-5562, STORM-5795, STORM-4824)
- Infrastructure improvements. (STORM-5936, STORM-5891, STORM-5849)
- Backend work for transferring project ownership. (STORM-5880, STORM-5411, STORM-5258) Note: For now, project ownership can be transferred only by filing a Support ticket.
- The Beta API input endpoint now supports gzip encoding. (STORM-5848)
- Design changes to the "change plan" and "create project" pages. (STORM-5658, STORM-5657)
- Added an overview page to the inputs workflow. (STORM-5289)
- UI tweaks. (STORM-5580, STORM-4921, STORM-5858)

December 12, 2012

- Fixed UI bugs. (STORM-5758, STORM-3055)
- Added a more descriptive error page. (STORM-3916)
- Reworked the API input endpoint. (STORM-5074, STORM-5075, STORM-5076, STORM-5077, STORM-5491)
- SYNTAX CHANGE: If you were previously using the Storm API input endpoint, note that the auth token has moved into the password field. The documentation has been updated to reflect the new syntax. (STORM-5472)

November 27, 2012

- Owners of paid projects can now invite 10 other members. Free projects still get 5 invites. (STORM-5538)
- Fixed a problem in which some projects were missing buckets and thus not able to store as much data as they should. (STORM-5595)
- Improved sparklines on the Projects page. (STORM-5300)
- Improved the login and save-project-settings screen flows. (STORM-5494, STORM-5354, STORM-5288)
- Assorted UI tweaks on marketing and signup pages. (STORM-5503, STORM-5482, STORM-5376)

October 24, 2012

- Fixed an issue with failure to change the deletion policy on a project. (STORM-5223)
- Added an informational message to the "Create Ticket" page. (STORM-5150)
- Users can triple-click to select the port number on the network input page. (STORM-5353)
- Implemented various orchestration and infrastructure improvements.

September 25, 2012

- Fixed an issue with an intermittent HTTP 500 Internal Server Error when uploading a file. (STORM-5037)
- Fixed an issue causing a traceback with "DatabaseError: deadlock detected" when creating a new Free project. (STORM-5006)
- Fixed an issue preventing a change of deletion policy. (STORM-4867)
- Restructured the packaging of Storm so that search heads don't restart during a release unless changes apply specifically to them. (STORM-4809)

September 6, 2012

- Fixed an issue with changing the deletion policy not having access to timezone information. (STORM-4947)
- Made improvements to internal monitoring. (STORM-4622)
- Made improvements to signup page form handling. (STORM-5010, STORM-5114, STORM-5058, STORM-5085)
- Made improvements to the information on the data deletion policy page. (STORM-5024)

August 23, 2012

- Resolved an input source type menu issue in IE8. (STORM-4983)
- Fixed an issue with 503 errors when viewing dashboards. (STORM-4866)
- Resolved an issue with renaming projects. (STORM-5016)
- Migrated some production instances to new hardware. (STORM-4963)
- Improved internal logging for user plan change actions. (STORM-5034)
- Minor fix to the login dialog box. (STORM-4978)

August 15, 2012

- Resolved various display issues in IE9. (STORM-4980, STORM-4979)
- Fixed the data storage plan slider. (STORM-4976)
- A new Terms of Service agreement is displayed. (STORM-4704, STORM-4851)
- Changed the number of free projects to 1. (STORM-4696)
- Removed REST API access pending further development. (STORM-4936)
- Removed "Beta" from the help link URLs. (STORM-4737)
- Changed the minimum retention policy to 30 days. (STORM-4967)
- Deleted project members are now correctly removed from the list of members. (STORM-4924)

August 3, 2012

- Added links to relevant documentation on the API input page. (STORM-4795, STORM-4852)
- Resolved an issue with the project creation UI breaking in Chrome. (STORM-4701)
- Resolved a "DatabaseError: could not obtain lock on row" error when clicking Explore. (STORM-4802, STORM-4500)

July 12, 2012

- New supported source types: mail_nodate, syslog_nohost, mysql_slow. (STORM-4393, STORM-4395, STORM-4404, STORM-4416)
- Visual UI fixes. (STORM-4402, STORM-4322, STORM-2714, STORM-3676, STORM-4185)
- Fixed a bug producing 404 errors after long periods of user inactivity. (STORM-4398)

June 27, 2012

- Fixed a bug with changing project plans. (STORM-4164)
- Fixed visual UI bugs. (STORM-3203, STORM-4092, STORM-4312, STORM-2714, STORM-3101)
- Fixed Explore data button bugs. (STORM-4002, STORM-4100, STORM-4144)

June 21, 2012

- Storm now correctly indexes Windows event logs sent with a Storm universal forwarder. (STORM-4139)

June 19, 2012

- Fixed some bugs with the "Explore data" button. (STORM-4300, STORM-4147, STORM-4002)
- If an account develops a payment issue, a warning message now appears on all the input pages. (STORM-4210)
- Improved no-JS error messaging and site options. (STORM-4043)
- Visual UI bugfixes. (STORM-4134, STORM-4154, STORM-3513, STORM-4163, STORM-4075, STORM-4154)
June 11, 2012

- Save search, then share, now generates a correct link. (SPL-52088)

June 5, 2012

- Visual UI updates. (STORM-4208, STORM-3935, STORM-4061, STORM-3279)
- With the Storm predefined syslog source type, the number of characters Storm looks into the event for a timestamp has been increased to 100 characters. (STORM-3552)
- Improved help links from the UI into the docs. (STORM-4119)
- Improved DNS error wording in the UI. (STORM-3871)
- Fixed a Python error with sending data through the API input endpoint. (STORM-4172)
- Changes to the project timezone now propagate more quickly to the default timezone for any events with no explicit timezone of their own. (STORM-4014)

May 23, 2012

- Fixed a bug with project creation. (STORM-4109)
- Many updates to the public web site. (STORM-3724, STORM-2723, STORM-3698, STORM-3697)
- Fixed a bug with help links to documentation. (STORM-3878)

May 17, 2012

- Visual improvements to the UI. (STORM-3868, STORM-4013)
- Fixed a bug with an incorrect Twitter source type. (STORM-4016)
- Fixed a bug with editing the profile email address. (STORM-3397)
- Improvements to data storage graphs. (STORM-3167, STORM-3791)

May 10, 2012

- Performance improvements to data inputs. (STORM-3702, STORM-3564)
- Fixed a bug showing an incorrect time zone for scheduled plan downgrades. (STORM-3255)

May 9, 2012

- Introduced new predefined source types: Facebook, Foursquare, Google+, Twitter, and three for JSON data. (STORM-3454)
- Trying to file a Support ticket when you're not logged in now takes you to a login screen (then to the Support page), instead of a 404. (STORM-3845)
- Fixed a bug causing some scheduled search jobs to run twice. (STORM-3826)
- Improved the display and accuracy of storage management pages. (STORM-3795, STORM-3791, STORM-3171, STORM-3745)
- Updated wording around search result sharing in Splunk Web. (STORM-3538)
- Fixed display issues in the Splunk Web Jobs menu. (STORM-3469)
- Fixed bugs in billing integration. (STORM-3797, STORM-3802)

April 23, 2012

- Fixed a bug causing data from a file upload to intermittently not appear. (STORM-3751)
- Fixed some intermittent bugs in project upgrading. (STORM-3752, STORM-3565, STORM-3284)
- The UI now shows a progress message when a user is deleting data. (STORM-3645)
- Fixed several visual UI bugs. (STORM-3680, STORM-2609, STORM-3754, STORM-3582, STORM-3400, STORM-3578)
- Fixed two bugs with the "save and share a search" popup window. (STORM-3679, STORM-3678)
- Fixed a bug that removed the auto authorization link when auto authorization time ran out. (STORM-3671)
- Fixed a bug preventing edits to _tzhint in a universal forwarder's inputs.conf file from taking effect. (STORM-3674)

April 12, 2012

- Storm now supports forwarding, with the Splunk universal forwarder. (STORM-1677, STORM-2069, STORM-2784, STORM-3404, STORM-3584)
- Fixed a bug preventing a user from regenerating an API user token. (STORM-3037)
- Updated the data storage graph. (STORM-3190, STORM-3401)
- Fixed a bug preventing a non-admin user from leaving a project. (STORM-3492)
- Changed wording in the adding network data workflow. (STORM-3496, STORM-3497)
- Fixed visual alignment on the login page. (STORM-3562)
- Fixed a bug that incorrectly showed the "store data indefinitely" button on the deletion policy page. (STORM-3541)
- File upload is no longer delayed when multiple users are uploading files simultaneously. (STORM-3396)
- Fixed the timing of project downgrades. (STORM-3405, STORM-3463)

March 29, 2012

- Fixed a bug causing uploaded files to intermittently take a very long time to appear in Storm. (STORM-3477)
- Fixed a usability bug with a button on the new-user signup page. (STORM-3249)
- Cosmetic changes to the UI. (STORM-3476, STORM-3473, STORM-3495, STORM-3467, STORM-3461, STORM-3438, STORM-3416, STORM-3414, STORM-3410, STORM-3407, STORM-3403, STORM-3395, STORM-3357, STORM-3298, STORM-3297, STORM-3277, STORM-3409)
- Cosmetic changes to billing invoices. (STORM-3441, STORM-3398, STORM-3297)
- Fixed a bug hiding Storm banner messages. (STORM-3490)
- Corrected the data storage graph y-axis and caption. (STORM-3422, STORM-3294)
- Fixed a bug breaking the "create project" workflow when an error occurs. (STORM-3372)
- Removed the "Input methods" page from the "add data" workflow. (STORM-3412)

March 20, 2012

- All invited projects are now listed under "Others" on the Projects page (as opposed to being listed individually by project owner). (STORM-3375)
- The Projects page no longer shows an error message when a project hasn't received data in the past 24 hours. (STORM-3376)
- The project creation/upgrade page has more information about local taxes and the timing of a charge. (STORM-3380)
- Projects can now be as large as 1 TB. (STORM-3322)
- A user's Account page now shows dollar amounts to two decimal places, and does not show invoices for plans costing $0. (STORM-3252, STORM-3253)
- Fixed assorted typos and made cosmetic improvements in the UI. (STORM-3141, STORM-3328, STORM-3352, STORM-3042)
- Simplified the send activation email page. (STORM-3367, STORM-2720)
- Updated billing emails. (STORM-3305, STORM-3021)
- Fixed a bug preventing a user from creating more than two projects. (STORM-3300)
- Fixed the REST API input endpoint's SSL certification. (STORM-3309)
- The REST API input endpoint was returning an extra error code (403) when Storm returned a 500; it now returns only a 503. (STORM-3265)

March 6, 2012

- Billing is now integrated into Storm. While Storm is still in Beta, any invoice you receive should be for $0.
- A user can now input data to Storm using a public REST API endpoint, using basic authentication.
- Storm now sends alerts when a user is about to max out their plan.
- Storm now has two graphs to help users track the amount of data being indexed and how much space they have left in their plan.
- Storm has a new UI across all features, and many new or redesigned pages throughout the product.
- Users can file a case with Support.

February 8, 2012

- A user can now have up to four projects instead of two. Note that the number of allowed free projects (and the size of free projects) will change post-Beta.

January 24, 2012

- All users now have deletion policies, which by default store data indefinitely. Check your data plan and usage, and upgrade your plan (still free for Beta users) if you need to. Read about deleting data in "Delete data from a project" in this manual.
- A user can now upgrade (STORM-2616) or downgrade (STORM-2643, STORM-2617) a data plan. (STORM-2550, STORM-2548, STORM-1948)
- A user can now choose the amount of time to store data (deletion policy; STORM-2527, STORM-1948, STORM-2677). Data is deleted if it's older than the time you set in your retention policy. Read about deletion policies in "Choose a data storage plan" in this manual.
- Users can now see how much data they have sent in the past hour, day, or week. (STORM-2528, STORM-2002, STORM-2001, STORM-1973)
- Storm has changed the way it counts data. It now counts the uncompressed data, before it's indexed.
- The workflow for creating a dashboard from a saved search has been improved. (STORM-2597)
December 23, 2011

- Fixed a link to the support forum. (STORM-2610)
- In Splunk Web, fixed inconsistencies on the "Define report content > Format report" page. (STORM-2452)
- Fixed the "Open" link on the "Dashboards" page. (STORM-2449)

December 22, 2011

- Fixed a bug in search that prevented keywords without wildcards from being found. (STORM-2535)

December 20, 2011

- Now when a user enters the Storm UI, they are told which project they're in. (STORM-1552)
- Fixed a bug showing default event types in Manager. (STORM-2433)

December 13, 2011

- Users can now send invites for Storm Beta. See "Share your project" in this manual for more information.

December 12, 2011

- Fixed a bug concatenating multiple syslog events into one event, present only with data sent over UDP. (STORM-2518)

December 5, 2011

- Storm has been upgraded to Splunk 4.3. (STORM-2276) Restart your browser and/or do a shift + reload in your browser for the UI changes to take effect.
- The Storm backup process can now handle Amazon SimpleDB going down. (STORM-1278)
- Fixed a bug preventing Storm users from saving searches. (STORM-2288)
- Fixed an Apache log event concatenation bug. (STORM-2285)
- If external network conditions create a long transmission delay (for example, due to packet loss), users might have seen a log line being split into two events. This has been made far less likely. (STORM-2247)
- Fixed a bug creating unnecessary user status errors in splunkd.log. (STORM-2377)
- The upload file progress bar now displays correctly in Chrome. (STORM-2244)
- A user invited to a Storm project now has correct permissions. (STORM-1392)
- Improved input proxy logging and connectivity. (STORM-2318)

September 22, 2011

- "Leave project" now works correctly. (STORM-1815, STORM-1814, STORM-1803)
- There is now filename validation when uploading a file. (STORM-1752)

September 14, 2011

- Fixed display issues on the Data Inputs page. (STORM-1248, STORM-1246)
- Fixed display issues on IE7 with the custom source type box and form elements. (STORM-1236, STORM-1237, STORM-1238)
- The file upload form now defaults to the project's timezone. (STORM-1253)
- The correct error is now displayed when deleting a project. (STORM-1501, STORM-1474)
Splunk concepts
Inputs and projects
The first step in using Splunk Storm is to send it data. Once Storm receives data, it indexes it and makes it immediately available for searching. Storm turns your data into a series of individual events, each consisting of searchable fields.

There's a lot you can do to massage the data before and after Splunk indexes it, but you usually don't need to. In most cases, Storm can determine what type of data you're feeding it and handle it appropriately. Basically, you send Storm data and it does the rest. Within moments, you can start searching the data and using it to create charts, reports, alerts, and other interesting outputs.
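For example, sending a log line to a TCP network input amounts to opening a socket and writing newline-terminated text. The sketch below is hedged: the real hostname and port come from your project's input settings, so to stay self-contained it sends to a local stand-in listener rather than an actual Storm endpoint.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class SendEvent {
    // Send one newline-terminated log line to a TCP input.
    public static void send(String host, int port, String line) throws Exception {
        try (Socket s = new Socket(host, port);
             PrintWriter out = new PrintWriter(
                 new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
            out.println(line);
        }
    }

    public static void main(String[] args) throws Exception {
        // Local listener standing in for a TCP input endpoint.
        try (ServerSocket listener = new ServerSocket(0)) {
            Thread sender = new Thread(() -> {
                try {
                    send("127.0.0.1", listener.getLocalPort(),
                         "Aug 8 13:07:12 webdev app[1]: user=reba action=login");
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            sender.start();
            try (Socket conn = listener.accept();
                 BufferedReader in = new BufferedReader(
                     new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                System.out.println("received: " + in.readLine());
            }
            sender.join();
        }
    }
}
```

With a real project, you would replace the local listener with the host and port shown for your network input, and everything after the socket write (parsing, timestamping, indexing) is Storm's job.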
source - The source of an event is the name of the file, stream, or other input from which the event originates.

source type - The source type of an event is the format of the data input from which it originates, such as ruby_on_rails or log4j. Use a source type to categorize your data, and refer to those categories when creating searches, field extractions, tags, or event types.

To change the source type for data you've already sent to Storm, assign the corrected source type in your data input method and then resend the data. That is, you cannot edit the source type (or any other default field) of your data once Storm has indexed it.
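The source type matters chiefly because it tells Storm how to break the stream into events and where to look for the timestamp and host. As a rough, hypothetical illustration (not Storm's actual implementation), a syslog-style source type expects a %b %d %H:%M:%S timestamp at the start of each event, with the host following it:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SyslogFields {
    // Rough stand-in for what a syslog-style source type implies:
    // a "%b %d %H:%M:%S" timestamp at the start of the event,
    // followed by the host name.
    private static final Pattern TS =
        Pattern.compile("^(\\w{3}) +(\\d{1,2}) +(\\d{2}:\\d{2}:\\d{2}) +(\\S+)");

    // Returns {timestamp, host}, or null if the line doesn't look like syslog.
    public static String[] parse(String line) {
        Matcher m = TS.matcher(line);
        if (!m.find()) {
            return null;
        }
        String timestamp = m.group(1) + " " + m.group(2) + " " + m.group(3);
        String host = m.group(4);
        return new String[] { timestamp, host };
    }

    public static void main(String[] args) {
        String event = "Aug  8 13:07:12 webdev sshd[1234]: Accepted publickey for reba";
        String[] parsed = parse(event);
        System.out.println("timestamp=" + parsed[0] + " host=" + parsed[1]);
    }
}
```

A generic single-line source type applied to the same data would index each line but miss the host field, which is why picking the right source type up front saves you from resending data later.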
Storm recognizes the following predefined source types (internal names in parentheses):

- Apache error log (apache_error): Standard Apache web server error log.
- NCSA combined access log: Standard NCSA combined format, generated by Apache and other web servers.
- Catalina (catalina): For Java Catalina logs. Checks the event for a timestamp of format %d %b, %Y %H:%M:%S %p. Merges lines.
- CSV (csv): Comma-separated values. Does not merge lines.
- Facebook: Looks for a timestamp prefixed with created_time":".
- Foursquare (foursquare): Looks for a timestamp prefixed with created":".
- Generic single line (generic_single_line): Expects each event to be a single line that begins with an ISO 8601 timestamp of the format %Y-%m-%dT%H:%M:%S.%3N (for example, 2011-08-10T04:20:55.432). Does not merge lines.
- Google+ (googleplus): Looks for a timestamp prefixed with published":".
- IIS (iis): For IIS web server logs. Does not merge lines.
- JSON, pre-defined timestamp (json_predefined_timestamp): Looks for a timestamp prefixed with timestamp":" of format %Y-%m-%dT%H:%M:%S.%3N. Breaks lines only before an opening brace. Example: { "timestamp":"2013-10-24...", "otherfield":"value" }
- JSON, auto timestamp (json_auto_timestamp): For JSON data with an automatically detected timestamp.
- JSON, no timestamp (json_no_timestamp): Does not look for a timestamp in the event; uses the time that Storm receives the event, in the project time zone. Breaks lines on an opening brace, {.
- Log4j (log4j): Works for Log4j standard output. Example: 2005-03-07 1... [PoolThread-0] INFO [STD... property...
- Log4php (log4php): Expects multiline events that each begin with an ISO 8601 timestamp of the format %Y-%m-%dT%H:%M:%S.%3N (for example, 2011-08-10T04:20:55.432). Breaks lines before a new timestamp.
- MySQL slow query (mysql_slow): For MySQL slow query logs. Looks for a timestamp of format Time: %y%m%d %k:%M:%S. Breaks lines only before a timestamp. Merges lines (MAX_EVENTS = 512).
- Ruby on Rails (ruby_on_rails): Expects events that each start with the string "Processing" and contain a timestamp of the format %Y-%m-%d %H:%M:%S.
- Syslog (syslog): Works for all data coming from syslog. Looks for a timestamp of format %b %d %H:%M:%S, checking the first 32 characters of the event. Expects the host to be the word following the timestamp. Does not merge lines.
- Syslog, no host (syslog_nohost): Classic syslog format, except that the host is not in the event; instead the host comes from the server IP, the forwarder, or inputs.conf (depending on how the data is sent). Looks for a timestamp of format %b %d %H:%M:%S.
- Twitter: Looks for a timestamp of the format %Y-%m-%dT%H:%M:%S.%3NZ, prefixed with created_at":".
- Custom source type: Choose your own source type name. Breaks lines before a timestamp; if it can't find one, treats the data as multiline data.
Learn more
To learn more about how source types work, read "Why source types matter" in the core Splunk product documentation. Note that the core documentation refers to some features that are not available in Splunk Storm.
System operators can use information from debugging logs to identify trends in system stability and performance.

Semantic logging

Semantic logging entails purposefully logging specific data that exposes the state of business processes (web clicks, financial trades, cell phone connections and dropped calls, audit trails, and so on). Strategic planners within your organization can leverage the power of "semantic" level events to create dashboards and presentations that reflect how the business is performing (as opposed to how the application is performing).
public boolean submitPurchase(int purchaseId) {
    // Initialize "gold mine" logging fields to defaults
    String login = "unknown";
    int cid = 0;
    float total = 0.0f;
    try {
        // Debug logging entry point
        log.debug("action=submitPurchaseStart, purchaseId=%d", purchaseId);

        // Retrieve "gold mine" information for semantic logging later
        PurchaseInfo purchase = getPurchaseInfo(purchaseId);
        total = purchase.getTotal();
        Customer customer = purchase.getCustomer();
        cid = customer.getId();
        login = customer.getLogin();

        submitToCreditCard(purchaseId);
        generateInvoice(purchaseId);
        generateFulfillmentOrder(purchaseId);

        // Semantic logging for revenue dashboards
        log.info("action=PurchaseCompleted, purchaseId=%d, customerId=%d, login=%s, total=%.2f",
                 purchaseId, cid, login, total);

        // Debug logging exit point
        log.debug("action=submitPurchaseCompleted, purchaseId=%d", purchaseId);
        return true;
    } catch ( Exception ex ) {
        // Exception logging at the public interface level
        log.exception("action=submitPurchaseFailed, purchaseId=%d, error=%s",
                      purchaseId, ex.getMessage());

        // Semantic logging for failures dashboard
        log.info("action=PurchaseFailed, purchaseId=%d, customerId=%d, login=%s, total=%.2f, error=%s",
                 purchaseId, cid, login, total, ex.getMessage());
        return false;
    }
}

private void submitToCreditCard(int purchaseId) throws PurchaseException {
    try {
        // Debug logging entry point
        log.debug("action=submitToCreditCardStart, purchaseId=%d", purchaseId);

        // Submit transaction to credit card gateway
        // ...

        // Debug logging exit point
        log.debug("action=submitToCreditCardComplete, purchaseId=%d", purchaseId);
    } catch ( GatewayException ex ) {
        // Error logging at the private method level
        log.error("action=submitToCreditCardFailed, purchaseId=%d, error=%s",
                  purchaseId, ex.getMessage());
        throw new PurchaseException(ex.getMessage(), ex);
    }
}
Here's a quick breakdown of the reasoning behind the patterns presented above:
For debugging

1. All methods log methodNameStart before doing anything. This gives a live stack trace in the logs as all sub-operations are executed.
2. All methods log methodNameComplete before exiting in a successful state.
3. All methods log methodNameFailed before exiting from an expected failure state.
4. All private methods handle only the specific types of exceptions they expect to occur and log at error level (no catch-alls, no stack traces are logged).
5. The top-level (public) entry point catches ALL types of exceptions and logs the stack trace by using the log.exception() method.

The debug logs contain live entry and exit logs for each level of the stack, which can be used for live profiling of the application. Execution times at every level of the stack can be graphed and used to find bottlenecks. (All your log lines contain timestamps, don't they?)

ONLY the public interface method logs at exception level, so only one stack trace is logged per error. Have you ever seen a log file with a stack trace logged twice? Logging the stack more than once makes it ambiguous how many errors actually occurred.

System operators can use this data to:

Find out how long purchases take during different times of the day and days of the week.
Find out how long purchases are taking now compared to a previous period.
Find out how many purchases are failing, and graph these failures over time.
Summarize error messages and group them according to type to identify which systems are the most unstable.

For business analysis

1. The public interface method (business transaction) logs a single log line after the purchase completes successfully or fails.
2. Semantic logging tries to include extra "gold mine" information about the specific purchase and the customer. These fields are initially set to default values and the real values are filled in as they are obtained. That way, in the case of a failure, the values are logged if they were retrieved, but the defaults are logged if the error occurred while retrieving them. Revenue total is logged for graphing. Customer login details are added for customer support troubleshooting or proactive follow-up.

Business operators can use this data to:

Graph purchase volume over time (day, week, month, quarter, year).
Group purchases by customer to find out who the most valuable customers are and reward them for their loyalty.
Group purchases by customer to find out who hasn't come back for repeat purchases, and contact them to find out how you could serve them better.
Graph purchase revenue over time.
Graph purchase failures over time.
Find specific purchase failures, to help a customer who calls in about a failure or to proactively contact customers by other means to complete purchases that would otherwise have been lost to a competitor.

Each of these items is a simple search in Storm that you can save, schedule, and share by email. And there is no need to architect a complex solution: the more data you log, the more you get out of it. It's really that simple. Use Storm instead of complex and inflexible architectures like RDBMS/SQL.
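The entry/exit pattern described above can be factored into a small wrapper that logs methodNameStart and methodNameComplete (or methodNameFailed) around a unit of work, with the elapsed time included as a field for profiling. This is a minimal sketch, not part of any Splunk library: the Timed class is a hypothetical name, and System.out stands in for a real logger.

```java
import java.util.function.Supplier;

public class Timed {
    // Run an operation, logging Start/Complete/Failed events around it in
    // key=value form so elapsed times can be graphed per method in Storm.
    public static <T> T call(String method, int purchaseId, Supplier<T> op) {
        long start = System.nanoTime();
        System.out.printf("action=%sStart, purchaseId=%d%n", method, purchaseId);
        try {
            T result = op.get();
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("action=%sComplete, purchaseId=%d, elapsed_ms=%d%n",
                    method, purchaseId, ms);
            return result;
        } catch (RuntimeException ex) {
            System.out.printf("action=%sFailed, purchaseId=%d, error=%s%n",
                    method, purchaseId, ex.getMessage());
            throw ex;
        }
    }

    public static void main(String[] args) {
        boolean ok = call("submitPurchase", 1001, () -> true);
        System.out.println("result=" + ok);
    }
}
```

Because every Start/Complete pair carries the same purchaseId, a single search can compute per-method durations without any extra instrumentation.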
Conclusion
It is important to plan your logging so that you get the maximum benefit from your application. Start by thinking about how other people will view your logs. Events might get separated from the logs, and certain events might get copied and passed along to a variety of people inside and outside of an IT organization. Without robust logging techniques, you can lose meaningful data, making it harder to derive insight from the logs. The logging best practices that follow can help make your debugging logs more relevant and powerful for you and others.
About logging
Splunk Storm can index any text data. There are things you can do, however, to help Splunk extract more information, and provide more value, from your data. Getting developers to change the way they do things is like herding cats, but if you want to save a huge amount of time and get the most out of your logging, heed these tips.

If you scour the web, you can find many "logging best practices" documents. Most of them are language specific, discuss performance implications, and explain event levels like DEBUG, INFO, and so on. The purpose of this document is not to duplicate those efforts, but to show you how to log effectively so that you can use robust analytics in Splunk Storm. Storm can reveal a tremendous amount of actionable information from within your logs, especially if you follow these guidelines.
One of the most powerful features of Storm is its ability to extract fields from events at search time. This is how Storm creates structure out of unstructured event data. You can create events in a way that guarantees the field extractions work as intended. Simply use the following string syntax (spaces or commas are fine):
key=value,key2=value2,key3=value...
If your values contain spaces, wrap them in quotes (for example, username="bob smith"). This may be a bit more verbose than you are used to, but the automatic field extraction will be well worth the size difference. Take the two following events as an example:

Log.debug("error user %d", userId)
Log.debug("orderstatus=error user=%d", userId)

Searching for the word "error" will probably bring up many types of events, but searching for "orderstatus=error" will retrieve the precise events of interest. Additionally, you can then query Splunk Storm for reports that use orderstatus, such as asking for the distribution of orderstatus (for example, completed=78%, aborted=21%, error=1%), something you couldn't do if you only had the keyword "error" in your log event.

Things like transaction IDs and user IDs are tremendously helpful when debugging, and even more helpful when gathering analytics. Unique IDs can point you to the exact transaction. Without them, many times all you have to go on is a time range. Don't change the format of these IDs between modules if you can help it. If you keep them the same, then a transaction can be tracked through the system (or multiple transactions, if that's the case). You can track transactions with multiple IDs as well, as long as there remains a transitive connection between them. For instance:

Event A contains ID 12345
Event B contains ID 12345 and UID ABCDE
Event C contains UID ABCDE

You can associate A with B with C because there is closure between the two IDs in event B.
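The key=value convention above can be wrapped in a small helper so that values containing spaces are quoted automatically. This is an illustrative sketch only; the KvFormat class and its kv() method are hypothetical names, not part of any Splunk API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class KvFormat {
    // Build a "key=value, key2=value2" string, quoting any value that
    // contains a space so Storm's automatic field extraction still works.
    public static String kv(Map<String, Object> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            if (sb.length() > 0) sb.append(", ");
            String value = String.valueOf(e.getValue());
            if (value.contains(" ")) value = "\"" + value + "\"";
            sb.append(e.getKey()).append('=').append(value);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("orderstatus", "error");
        fields.put("user", 42);
        fields.put("username", "bob smith");
        System.out.println(kv(fields));
        // orderstatus=error, user=42, username="bob smith"
    }
}
```

Centralizing the formatting in one place keeps the quoting rule consistent across every module that writes log lines.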
Learn more
For more information about writing log files that can easily produce intelligence with Storm, read the three topics about conscious logging in this manual.
Best practices
You can help Storm get more out of your logs by following these best practices.
If your values contain spaces, wrap them in quotes (for example, username="bob smith"). This might be a bit more verbose than you are used to, but the automatic field extraction is worth the size difference.

Create human-readable events
Avoid using complex encoding that would require lookups to make event information intelligible.

Use human-readable timestamps for every event
The correct time is critical to understanding the proper sequence of events. Timestamps are critical for debugging, analytics, and deriving transactions. Storm automatically timestamps events that don't include one, using a number of techniques, but it's best that you do it yourself. Use the most verbose time granularity possible.
Put the time stamp at the beginning of the line: the farther you place a time stamp from the beginning, the more difficult it is to tell it's a time stamp and not other data. Include a time zone, preferably a GMT/UTC offset. Render time in microseconds in each event; the event could become detached from its original source file at some point, so having the most accurate data about an event is ideal.

Use unique identifiers (IDs)
Unique identifiers such as transaction IDs and user IDs are tremendously helpful when debugging, and even more helpful when you are gathering analytics. Unique IDs can point you to the exact transaction. Without them, you might only have a time range to use. When possible, carry these IDs through multiple touch points and avoid changing the format of these IDs between modules. That way, you can track transactions through the system and follow them across machines, networks, and services. You might also find it helpful to give this ID a consistent name (something more descriptive than "ID").

Log in text format
Avoid logging binary information because Storm cannot meaningfully search or analyze binary data. Binary logs might seem preferable because they are compressed, but this data requires decoding and won't segment. If you must log binary data, place textual metadata in the event so that you can still search through it. For example, don't log the binary data of a JPG file, but do log its image size, creation tool, username, camera, GPS location, and so on.

Avoid using XML and JSON
Avoid formats with multi-depth nesting because they aren't human-readable and require more work to parse. Occasional XML is fine for dumping the value of something that exists in your code, but don't make a habit of it.

Log more than just debugging events
Put semantic meaning in events to get more out of your data. Log audit trails, what users are doing, transactions, timing information, and so on. Log anything that can add value when aggregated, charted, or further analyzed. In other words, log anything that is interesting to the business.
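The timestamp advice above can be sketched with java.time: an ISO 8601 timestamp at the start of the line, microsecond precision, and an explicit UTC offset. This assumes Java 8 or later; the particular format pattern is one reasonable choice, not a Storm requirement.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class LogTimestamp {
    // ISO 8601 with microseconds and a UTC offset, for example:
    // 2011-08-10T04:20:55.432000Z
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSSSSXXX")
                             .withZone(ZoneOffset.UTC);

    public static String stamp(Instant when, String message) {
        // Timestamp first, then the event body.
        return FMT.format(when) + " " + message;
    }

    public static void main(String[] args) {
        System.out.println(stamp(Instant.now(),
                "action=PurchaseCompleted, purchaseId=7"));
    }
}
```

Keeping the formatter in one shared class means every event in the application carries an identical, sortable timestamp prefix.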
Use categories
For example, use DEBUG, INFO, WARN, ERROR, and EXCEPTION, and define a specific case for when each level should be used. For example:

DEBUG level for application debugging.
INFO level for semantic logging.
WARN level for recoverable errors or automatic retry situations.
ERROR level for errors that are reported but not handled.
EXCEPTION level for errors that are safely handled by the system.

Keep multi-line events to a minimum
Multiline events generate a lot of segments, which can affect indexing and search speed, as well as disk compression. Consider breaking multiline events into separate events.
Examples
In this improved version, the event is easier to parse because the key-value pairs are clearly provided. Searching on "orderstatus=error" will retrieve exactly the events you want. Also, you can query Splunk for reports that use orderstatus, such as requesting its distribution (e.g., completed=78%, aborted=21%, error=1%), which is something you couldn't do if you only had the keyword "error" in your log event.
GOOD: Log.debug("orderstatus=error, errorcode=454, user=%d, transactionid=%s", userId, transId)
This improved version breaks multivalue information into separate events, so the key-value pairs are clearer:
GOOD: <TS> phonenumber=333-444-4444, app=angrybirds, installdate=xx/xx/xx <TS> phonenumber=333-444-4444, app=facebook, installdate=yy/yy/yy
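Splitting multivalue information into one event per value, as shown above, amounts to a simple loop. This is an illustrative sketch; the PerValueEvents class is a hypothetical name, and the field names just mirror the example.

```java
import java.util.List;

public class PerValueEvents {
    // Emit one key=value event per installed app, instead of cramming a
    // list of apps into a single log line.
    public static String event(String phone, String app, String installDate) {
        return String.format("phonenumber=%s, app=%s, installdate=%s",
                phone, app, installDate);
    }

    public static void main(String[] args) {
        for (String app : List.of("angrybirds", "facebook")) {
            System.out.println(event("333-444-4444", app, "2013-10-24"));
        }
    }
}
```

Each emitted line is now a complete, independently searchable event, so a search like app=angrybirds matches exactly one record per installation.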
You can associate Event A with Events B and C, because of the connection between the two IDs in Event B.
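The transitive association described above (A linked to B through an ID, B linked to C through a UID) amounts to grouping events that share any identifier, directly or through a chain. This is a sketch of that idea as a breadth-first traversal; the LinkEvents class and the map-of-IDs event shape are hypothetical, not a Splunk feature.

```java
import java.util.*;

public class LinkEvents {
    // Two events belong to the same transaction if they share any ID,
    // directly or through a chain of intermediate events.
    public static List<Set<String>> group(Map<String, Set<String>> idsByEvent) {
        // Build an id -> events index.
        Map<String, List<String>> eventsById = new HashMap<>();
        for (var e : idsByEvent.entrySet())
            for (String id : e.getValue())
                eventsById.computeIfAbsent(id, k -> new ArrayList<>()).add(e.getKey());

        List<Set<String>> groups = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String start : idsByEvent.keySet()) {
            if (!seen.add(start)) continue;
            Set<String> group = new TreeSet<>();
            Deque<String> queue = new ArrayDeque<>(List.of(start));
            while (!queue.isEmpty()) {
                String ev = queue.poll();
                group.add(ev);
                // Follow every ID on this event to every other event
                // that carries the same ID.
                for (String id : idsByEvent.get(ev))
                    for (String other : eventsById.get(id))
                        if (seen.add(other)) queue.add(other);
            }
            groups.add(group);
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> events = Map.of(
                "A", Set.of("12345"),
                "B", Set.of("12345", "ABCDE"),
                "C", Set.of("ABCDE"));
        System.out.println(group(events));
    }
}
```

In Storm itself this grouping is what a search across the shared fields gives you for free; the sketch just makes the underlying closure explicit.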
Storm Tutorial
Welcome to the Storm tutorial
This tutorial walks you through some of the main functionality of the Splunk Storm interface, and gets you started with searching and creating reports. This tutorial uses the same sample dataset and general use cases as the Splunk user tutorial in the Splunk core product documentation, but has been adapted for the Storm interface and feature set.
3. Specify the time zone for your project. The time zone you set here is applied to all the data that you send to this project.
4. Click Continue. The "Choose a plan" panel is displayed.
5. For the purposes of the tutorial, choose the Free plan, which is preselected. Refer to "Choose a data storage plan" for more information about data plans and how to choose them.
6. Click Continue. The Confirmation panel is displayed.
7. Ensure the details are correct and click Confirm. After a moment, the project data page for your project is displayed, with the Inputs tab selected.

Next, we'll download the tutorial data set and upload it to your project.
8. Click Upload.
9. Repeat steps 3-8 for each of the files. There's one in each of the three Apache folders, and one in the MySQL folder.

Next, we'll check out what the Storm user interface looks like once you've added some data.
How many people visited the site? How many bought something today? What is the most popular item that is purchased each day? Storm already has the data in it--let's take a look at it.
Click Explore data (it might take a few minutes for the button to appear). The Storm Home dashboard is displayed.
Storm Home
The Storm Home dashboard displays information about the data that you just uploaded to Storm and gives you the means to start searching this data.
What's in this dashboard?

Storm includes many different dashboards and views. For now, you really only need to know about two of them:

Home, where you are now.
Search, where you will do most of your searching.

Use the navigation menus to locate and access the different views in Storm. When you click the links, Storm takes you to the respective dashboards, or refreshes the page if you're already there.

Other things in the Storm UI:

Searches & Reports: lists all of your saved searches and reports.
Search bar and time range picker: enable you to type in your search and select different time ranges over which to retrieve events.
Sources panel: displays the top sources from the data in your Storm project.
Sourcetypes panel: displays the top source types in your Storm project.
Hosts panel: displays the top hosts in your Storm project.

If you're using a new, empty project for this tutorial, you'll only see the sample data files that you just uploaded. Because it's a one-time upload of a file, this data will not change. When you add more data, there will be more information on this dashboard. If you add data inputs that point to sources that are not static (such as log files that are being written to by applications), the numbers on the Home page will change as more data comes in from your source(s).
In the Sources panel, you should see the Apache Web server logs for the online Flower & Gift shop data that you just uploaded. If you're familiar with Apache Web server logs, you might recognize the access_combined source type as one of the log formats associated with Web access logs. All the data for this source type should give you information about people who access the Flower & Gift shop website. Searching in Storm is very interactive. Although you have a search bar in the Home dashboard, you don't need to type anything into it just yet. Each of the sources, source types, and hosts listed in the Home dashboard is a link that kicks off a search when you click it.

2. In the Sourcetypes panel, click access_combined. Splunk takes you to the Search dashboard, where it runs the search and shows you the results:
There are a lot of components to this view, so let's take a look at them before continuing to search.
Storm paused my search?
If you are searching in a Storm project that has more data in it than just this tutorial's sample data, your search might take a bit longer. If your search takes longer than 30 seconds, Storm automatically pauses it. If autopause pops up, click Resume search. You can read more about autopause in the core Splunk platform Knowledge Manager Manual.

What's in this Search dashboard?

The search bar and time range picker should be familiar to you; they were also in the Home dashboard. But now you also see a count of events, the timeline, the fields menu, and the list of retrieved events or search results.

Search actions: Use these buttons to save a search, create a report or dashboard, export results, print, and more.

Count of matching and scanned events: As the search runs, Storm displays two running counts of the events as it retrieves them: one is a matching event count and the other is the count of events scanned. When the search completes, the count that appears above the timeline displays the total number of matching events. The count that appears below the timeline and above the events list tells you the number of events during the time range that you selected. As we'll see later, this number changes when you drill down into your investigations.

Timeline of events: The timeline is a visual representation of the number of events that occur at each point in time. As the timeline updates with your search results, you might notice clusters or patterns of bars. The height of each bar indicates the count of events. Peaks or valleys in the timeline can indicate spikes in activity or server downtime. Thus, the timeline is useful for highlighting patterns of events or investigating peaks and lows in event activity. The timeline options are located above the timeline. You can zoom in, zoom out, and change the scale of the chart.

Fields sidebar: We mentioned before that when you index data, Storm by default automatically recognizes and extracts information from your data that is formatted as name and value pairs, which we call fields. When you run a search, Storm lists all of the fields it recognizes in the Fields menu next to your search results. You can select other fields to show in your events. Selected fields are fields that are set to be visible in your search results. By default, host, source, and source type are shown. Interesting fields are other fields that Storm has extracted from your search results. Field discovery is an on/off switch at the top of the Fields menu. Storm's default setting is Field discovery on. If you want to speed up your search, you can turn Field discovery off, and Storm will extract only the fields required to complete your search.

Results area: The results area displays the events that Storm retrieves to match your search. It's located below the timeline. By default the events are displayed as a list, but you can also choose to view them as a table. When you select the event table view, you will only see the selected fields in your table.

When you're ready, proceed to the next topic to start searching and find out what's up at the flower shop.
Start searching
This topic walks you through simple searches using the Search interface. If you're not familiar with the search interface, go back to the introduction to the Storm UI before proceeding. It's your first day of work with the Customer Support team for the online Flower & Gift shop. You're just starting to dig into the Web access logs for the shop, when you receive a call from a customer who complains about trouble buying a gift for his girlfriend--he keeps hitting a server error when he tries to complete a purchase. He gives you his IP address, 10.2.1.44.
As you type into the search bar, Splunk's search assistant opens. Search assistant shows you typeahead, or contextual matches and completions, for each keyword as you type it into the search bar. These contextual matches are based on what's in your data. The entries under matching terms update as you continue to type, because the possible completions for your term change as well. Search assistant also displays the number of matches for the search term. This number gives you an idea of how many search results Splunk will return. If a term or phrase doesn't exist in your data, you won't see it listed in search assistant.

What else do you see in search assistant? For now, ignore everything on the right panel next to the contextual help. Search assistant has more uses once you start learning the search language, as you'll see later. And if you don't want search assistant to open, click "turn off auto-open" and close the window using the double up-arrow below the search bar.
Each time you run a search, Storm highlights in the search results what you typed into the search bar. 3. Skim through the search results. You should recognize words and phrases in the events that relate to the online shop (flower, product, purchase, etc.).
The customer mentioned that he was in the middle of purchasing a gift, so let's see what we find by searching for "purchase". 4. Type purchase into the search bar and run the search:
sourcetype=access_combined 10.2.1.44 purchase
When you search for keywords, your search is not case-sensitive and Storm retrieves the events that contain those keywords anywhere in the raw text of the event's data.
Among the results that Splunk retrieves are events that show each time the customer tried to buy something from the online store. Looks like he's been busy!
5. Use the Boolean NOT operator to quickly remove all of these successful page requests. Type in:
sourcetype=access_combined 10.2.1.44 purchase NOT 200
You notice that the customer is getting HTTP server (503) and client (404) errors.
But, he specifically mentioned a server error, so you want to quickly remove events that are irrelevant.
Storm supports the Boolean operators: AND, OR, and NOT. When you include Boolean expressions in your search, the operators have to be capitalized.
The AND operator is always implied between search terms. So the search in Step 5 is the same as:
sourcetype=access_combined AND 10.2.1.44 AND purchase NOT 200
Another way to add Boolean clauses quickly and interactively to your search is to use your search results. 6. Mouse over an instance of "404" in your search results and ALT-click it. This updates your search string with "NOT 404" and filters out all the events that contain the term.
From these results, you see each time that the customer attempted to complete a purchase and received the server error. Now that you have confirmed what the customer reported, you can continue to drill down to find the root cause.
Interactive searching
Storm lets you highlight and select any segment from within your search results to add, remove, and exclude terms quickly and interactively using your keyboard and mouse:

To add more search terms, highlight and click the word or phrase you want from your search results.
To remove a term from your search, click a highlighted instance of that word or phrase in your search results.
To exclude events from your search results, ALT-click the term you don't want Storm to match.

When you're ready to proceed, go to the next topic to learn how to investigate and troubleshoot interactively using the timeline in Storm.
In the last topic, you really just focused on the search results listed in the events viewer area of this dashboard. Now, let's take a look at the timeline.
The location of each bar on the timeline corresponds to an instance when the events that match your search occurred. If there are no bars at a time period, no events were found then. 2. Mouse over one of the bars. A tooltip pops up and displays the number of events that Splunk found during the time span of that bar (1 bar = 1 hr).
The taller the bar, the more events occurred at that time. Seeing spikes in the number of events, or no events at all, is often a good indication that something has happened. 3. Click one of the bars, for example the tallest bar. This updates your search results to show only the events during that time span. Splunk does not run the search when you click the bar. Instead, it gives you a preview of the results zoomed in at that time range. You can still select other bars at this point.
4. Double-click on the same bar. Splunk re-runs your search to retrieve only events during that one hour span you selected.
You should see the same search results in the Event viewer, but, notice that the search overrides the time range picker and it now shows "Custom time". (You'll see more of the time range picker later.) Also, each bar now represents one minute of time (1 bar = 1 min). One hour is still a wide time period to search, so let's narrow the search down more. 5. Double-click another bar. Once again, this updates your search to now retrieve events during that one minute span of time. Each bar represents the number of events for one second of time.
Now, you want to expand your search to see whether anything else happened during this minute. 6. Without changing the time range, replace your previous search in the search bar with:
*
Splunk supports using the asterisk (*) wildcard to search for "all" or to retrieve events based on parts of a keyword. Up to now, you've just searched for Web access logs. This search tells Splunk that you want to see everything that occurred at this time range:
This search returns events from all the logs on your server. You expect to see other users' Web activity, perhaps from different hosts. But instead you see a cluster of mySQL database errors. These errors were causing your customer's purchases to fail. Now you can report this issue to someone in the IT Operations team.
To show all the results for the timeline again, click deselect above the timeline. To lock-in the selected span of events to your search, click zoom in.
To expand the timeline view to show more events, click zoom out. When you're ready, proceed to the next topic to learn about searching over different time ranges.
Also, this search uses the wildcarded shortcut, "access_*", to match the Web access logs. If you have different source types for your Apache server logs, such as access_common and access_combined, this will match them all. This searches for general errors in your event data over the course of the last week. Instead of matching just one type of log, this searches across all the logs in your index. It matches any occurrence of the words "error", "failed", or "severe" in your event data. Additionally, if the log is a Web access log, it looks for HTTP error codes, "404", "500", or "503".
This search returns a significant number of errors. You're not interested in knowing what happened over All time, even if it's just the course of a week. You just got into work, so you want to know about more recent activity, such as overnight or the last hour. But because of the limitations of this dataset, let's look at yesterday's errors. 2. Drop down the time range picker and change the time range to Other > Yesterday.
By default, Storm searches across all of your data; that is, the default time range for a search is across "All time". If you have a lot of data, searching on this time range when you're investigating an event that occurred 15 minutes ago, last night, or the previous week just means that Storm will take a long time to retrieve the results that you want to see.
3. Selecting a time range from this list automatically runs the search for you. If it doesn't, just hit Enter.
This search returns events for general errors across all your logs, not just Web access logs. (If your sample data file is more than a day old, you can still get these results by selecting Custom time and entering the last date for which you have data.) Scroll through the search results. There are more mySQL database errors and some 404 errors. You ask the intern to get you a cup of coffee while you contact the Web team about the 404 errors and the IT Operations team about the recurring server errors.
Storm also provides options for users to select to search a continuous stream of incoming events:
Real-time search enables searching forward in time against a continuous stream of live incoming event data. Because the sample data is a one-time upload, running a real-time search will not give us any results right now. We will explore this option later. Read more about real-time searches and how to run them in "Search and report in real-time" in the core Splunk product documentation. Note that any references in the core Splunk product documentation to the CLI and configuration files are not relevant for Splunk Storm. For more information about your time range options, see "Change the time range of your search" in the core Splunk product documentation.

Up to now, you've run simple searches that matched the raw text in your events. You've only scratched the surface of what you can do in Storm. When you're ready to proceed, go on to the next topic to learn about fields and how to search
with fields.
What are fields? Fields are searchable name/value pairings in event data. All fields have names and can be searched with those names. Some examples of fields are clientip for IP addresses accessing your web server, _time for the timestamp of an event, and host for the domain name of a server. Fields distinguish one event from another, because not all events have the same fields and field values. Fields enable you to write more tailored searches to retrieve the specific events that you want. Fields also enable you to take advantage of the search language, create charts, and build reports. Most fields in your data exist as name and value pairs where there is one single value to each field name. But you'll also see fields that appear more than once
in an event and have a different value for each appearance. One of the more common examples of multivalue fields is email address fields. While the "From" field will contain only a single email address, the "To" and "Cc" fields may have one or more email addresses associated with them. For more information, read About fields in the Knowledge Manager manual in the core Splunk product documentation. Note that any references in the core Splunk product documentation to the CLI and configuration files are not relevant for Splunk Storm.
2. Scroll through the search results. If you're familiar with the access_combined format of Apache logs, you will recognize some of the information in each event, such as: IP addresses for the users accessing the website. URIs and URLs for the page request and referring page. HTTP status codes for each page request. Page request methods.
As Storm retrieves these events, the Fields menu updates with selected fields and interesting fields. These are the fields that Storm extracted from your data.
Storm extracts fields from event data twice. It extracts default and other indexed fields during event processing, when that data is indexed. And it extracts a different set of fields at search time, when you run a search. Read more about "Index time versus search time" in the Managing Indexers and Clusters manual in the core Splunk product documentation. Note that any references in the core Splunk product documentation to the CLI and configuration files are not relevant for Splunk Storm. At index time, Storm automatically finds and extracts default fields for each event it processes. These fields include host, source, and sourcetype (which you should already be familiar with). For a complete list of the default fields, see "Use default fields" in the User manual in the core Splunk product documentation. Storm also extracts certain fields at search time--when you run a search. You'll see some examples of these searches later. For more information, read the "Overview of search-time field extractions" in the Knowledge Manager manual in the core Splunk product documentation. Note that any references in the core Splunk product documentation to the CLI and configuration files are not relevant for Splunk Storm. Notice that the default fields host, source, and sourcetype are selected fields and are included in your search results:
3. Scroll through interesting fields to see what else Storm extracted. You should recognize the field names that apply to the web access logs. For example, there's clientip, method, and status. These are not default fields; they have (most likely) been extracted at search time. 4. Click Edit in the fields menu.
The Fields dialog opens and displays all the fields that Storm extracted. Available fields are the fields that Storm identified from the events in your current search (some of these fields were listed under Other interesting fields). Selected fields are the fields you picked (from the available fields) to show in your search results (by default, host, source, and sourcetype are selected).
5. Scroll through the list of Available fields. You're already familiar with the fields that Storm extracted from the web access logs based on your search. You should also see other default fields that Storm defined--some of these fields are based on each event's timestamp (everything beginning with date_*), punctuation (punct), and location (index). But, you should also notice other extracted fields that are related to the online store. For example, there are action, category_id, and product_id. From conversations with your coworker, you may know that these fields are:
6. From the Available Fields list, select action, category_id, and product_id.
7. Click Save. When you return to the Search view, the fields you selected will be included in your search results if they exist in that particular event. Different events will have different fields.
The fields menu doesn't just show you what fields Storm has captured from your data. It also displays how many values exist for each of these fields. For the fields you just selected, there are 2 for action, 5 for category_id, and 9 for product_id. This doesn't mean that these are all the values that exist for each of the fields--these are just the values that Storm knows about from the results of your search. What are some of these values? 8. Under selected fields, click action for the action field.
This window tells you that, in this set of search results, Storm found two values for action and they are purchase and update. Also, it tells you that the action field appears in 71% of your search results. This means that nearly three-quarters of the web access events are related to the purchase of an item or an update (of the item quantity in the cart, perhaps). 9. Close this window and look at the other two fields you selected, category_id (what types of products the shop sells) and product_id (specific catalog names for products). Now you know a little bit more about the information in your data relating to the online Flower and Gift shop. Let's use these fields to see what people are buying. For example, the online shop sells a selection of flowers, gifts, plants, candy, and balloons.
Run this search again, but this time, use fields in your search.
To search for a particular field, just type the field name and value into the search bar: fieldname=fieldvalue
The HTTP error codes are values of the status field. Now your search looks like this:
error OR failed OR severe OR (sourcetype=access_* (status=404 OR status=500 OR status=503))
Notice the difference in the count of events between the two searches--because it's a more targeted search, the second search returns fewer events. When you run simple searches based on arbitrary keywords, Storm matches the raw text of your data. When you add fields to your search, Storm looks for events that have those specific field/value pairs.
Also, you were actually using fields all along! Each time you searched for sourcetype=access_*, you told Storm to only retrieve events from your web access logs and nothing else.
Example 2: Before you learned about the fields in your data, you might have run this search to see how many times flowers were purchased from the online shop:
sourcetype=access_* purchase flower*
As you type in "flower", search assistant shows you both "flower" and "flowers" in the typeahead. Since you don't know which is the one you want, you use the wildcard to match both.
If you scroll through the (many) search results, you'll see that some of the events have action=update and category_id that have a value other than flowers. These are not events that you wanted! Run this search instead. Select Other > Yesterday from the time range picker:
sourcetype=access_* action=purchase category_id=flower*
For the second search, even though you still used the wildcarded word "flower*", there is only one value of category_id that it matches (FLOWERS). Notice the difference in the number of events that Storm retrieved for each search; the second search returns significantly fewer events. Searches with fields
are more targeted and retrieve more exact matches against your data. As you run more searches, you want to be able to save them and reuse them or share them with your teammates. When you're ready, proceed to the next topic to learn how to save your search and share it with others.
Save a search
This topic assumes you're comfortable running searches with fields. If you're not, go back to the previous topic and review how to Use fields to search. This topic walks you through the basics of saving a search and how you can use that search again later. Back at the Flower & Gift shop, you just ran a search to see if there were any errors yesterday. This is a search you will run every morning. Rather than type it in manually every day, you decide to save this search. Example 1. Run the search for all errors seen yesterday:
error OR failed OR severe OR (sourcetype=access_* (status=404 OR status=500 OR status=503))
2. Select Save search... from the drop-down list. The Save search dialog box opens.
At a minimum, a saved search includes the search string and the time range associated with the search, as well as the name of the search.
3. Name the search Errors (Yesterday). 4. Leave the Search string as it is. 5. Leave the Time range as it is. 6. Leave Share set as it is. 7. Click Finish. Storm confirms that your search was saved:
Because the saved search's name contained the word "Errors," Storm lists it in the saved search submenu for Errors. Right now you are the only one who is authorized to access this saved search. Since this is a search that others on your team may want to run, you can set it as a shared saved search that they can access. To do this, read more about saving searches and sharing search results in the core Splunk product documentation. Note that any references in the core Splunk product documentation to the CLI and configuration files are not relevant for Splunk Storm.
Manage searches and reports

If you want to modify a search that you saved, use the Searches & Reports menu to select Manage Searches & Reports. This takes you to the Storm Manager page for all the searches and reports you're allowed to access. From there you can select your search from the list, which takes you to the search's edit window, where you can change or update the search string, description, time range, and schedule options.

Save and share search results

Saving the results of a search is different from saving the search itself. You do this when you want to be able to review the outcome of a particular run of a search at a later time. Read more about this in saving searches and sharing search results in the core Splunk product documentation. Note that any references in the core Splunk product documentation to the CLI and configuration files are not relevant for Splunk Storm. Now, you can save your searches after you run them. When you're ready, proceed to the next topic to learn more ways to search.
In a previous topic, you ran this search for purchases of flowers:
sourcetype=access_* action=purchase category_id=flowers
The search results told you approximately how many flowers were bought. But, this doesn't help you answer questions, such as: What items were purchased most at the online shop? How many customers bought flowers? How many flowers did each customer buy? To answer these questions, you need to use Splunk Storm's search language, which includes an extensive library of commands, arguments, and functions that enables you to filter, modify, reorder, and group your search results. For this tutorial you'll only use a few of them.
As you type in the search bar, search assistant opens with syntax and usage information for the search command (on the right side). If search assistant doesn't open, click the blue double arrow under the left side of the search bar.
You've seen before that search assistant displays typeahead for keywords that you type into the search bar. Search assistant shows you information about the search command because every search you've run until now has used the search command--it's implied. It
also helps you construct your search string by suggesting other search commands you may want to use next (common next commands). 2. Type a pipe character into the search bar. This causes the search assistant to show you some common next commands.
The pipe indicates to Splunk that you want to take the results of the search to the left of the pipe and use that as the input to the command after the pipe. You can pass the results of one command into another command in a series, or pipeline, of search commands.
You want Storm to give you the most popular items bought at the online store--the top command looks promising.
3. Add | top category_id to the end of your search and press Enter. This gives you a table of the top or most common values of category_id. By default, the top command returns ten values, but you only have five different types of items. So, you should see all five, sorted in descending order by the count of each type:
The top command also returns two new fields: count is the number of times each value of the field occurs, and percent is how large that count is compared to the total count. Read more about the top command in the Search reference manual in the core Splunk product documentation. Note that any references in the core Splunk product documentation to the CLI and configuration files are not relevant for Splunk Storm.
Splunk's drilldown actions enable you to delve deeper into the details of the information presented to you in the tables and charts that result from your search. Read more about drilldown actions in the User manual in the core Splunk product documentation. Note that any references in the core Splunk product documentation to the CLI and configuration files are not relevant for Splunk Storm.
The number of events returned tells you how many times flowers were purchased, but you want to know how many different customers bought the flowers.
Example 3: How many different customers purchased the flowers? 1. You're looking specifically for the purchase of flowers, so continue with the search from the previous example:
sourcetype=access_* action=purchase category_id=flowers
The customers who access the Flower & Gift shop are distinguished by their IP addresses, which are values of the clientip field. 2. Use the stats command and the distinct_count() or dc() function:
sourcetype=access_* action=purchase category_id=flowers | stats dc(clientip)

You piped the search results into the stats command and used the distinct_count() function to count the number of unique clientip values that it found. This tells you that there were approximately 300 different people who bought flowers from the online shop. Example 4: In the last example, you calculated how many different customers bought flowers. How do you find the number of flowers that each customer bought? 1. Use the stats command:
sourcetype=access_* action=purchase category_id=flowers | stats count
The count() function returns a single value, the count of your events. (This should match your result from Example 2.) Now, break this count down to see how many flowers each customer bought. 2. Add a by clause to the stats command:
sourcetype=access_* action=purchase category_id=flowers | stats count BY clientip
This search gives you a table of the different customers (clientip) and the number of flowers purchased (count).
3. Rename the count field using an "AS" clause. If your new field name is a phrase, use double quotes. The syntax for the stats command doesn't allow field renaming in the "by" clause. 4. Use the rename command to change the clientip name:
sourcetype=access_* action=purchase category_id=flowers | stats count AS "# Flowers Purchased" by clientip | rename clientip AS Customer
This formats the table, replacing the headers clientip and count with Customer and # Flowers Purchased:
For more information about the stats command and its usage, arguments, and functions, see the stats command in the Search reference manual and the list of stats functions. For more information about the rename command, see the rename command in the Search reference manual in the core Splunk product documentation. Note that any references in the core Splunk product documentation to the CLI and configuration files are not relevant for Splunk Storm.
In this last search, you found how many flowers each customer of the online shop bought. But what if you were looking for the one customer who buys the most items on any given day? When you're ready, continue on to the next topic to learn another way to search, this time using subsearches.
Use a subsearch
The last topic introduced search commands, the search pipeline, and drilldown actions. If you're not familiar with them, review more ways to search. This topic walks you through another search example and shows you two approaches to getting the results that you want. Back at the Flower & Gift shop, your boss asks you to put together a report that shows the customer who bought the most items yesterday and what he or she bought.
Part 1: Break the search down. Let's see which customer accessed the online shop the most yesterday.
If you wanted to see more than one "top purchasing customer", you would change this limit value.
Now, use the clientip value to complete your search. 2. Use the stats command to count the VIP customer's purchases:
sourcetype=access_* action=purchase clientip=10.192.1.39 | stats count by clientip
This only returns the count of purchases for the clientip. You also want to know what he bought. 3. One way to do this is to use the values() function:
sourcetype=access_* action=purchase clientip=10.192.1.39 | stats count, values(product_id) by clientip
This adds a column to the table that lists what he bought by product ID.
The drawback to this approach is that you have to run two searches each time you want to build this table. The top purchaser is not likely to be the same person at any given time range.
A subsearch is a search with a search pipeline as an argument. Subsearches are contained in square brackets and evaluated first. The result of the subsearch is then used as an argument to the primary search. Read more about "How subsearches work" in the User manual in the core Splunk product documentation. Note that any references in the core Splunk product documentation to the CLI and configuration files are not relevant for Splunk Storm.
1. Use a subsearch to run the searches from Part 1 inline. Type or copy/paste in:
sourcetype=access_* action=purchase [search sourcetype=access_* action=purchase | top limit=1 clientip | table clientip] | stats count, values(product_id) by clientip

Because the top command returns count and percent fields as well, you use the table command to keep only the clientip value.
These results should match the previous result, if you run it on the same time range. But, if you change the time range, you might see different results because the top purchasing customer will be different!
For more information about the usage and syntax for the sort command, see the sort command in the Search Reference manual in the core Splunk product documentation. Note that any references in the core Splunk product documentation to the CLI and configuration files are not relevant for Splunk Storm. When you're ready, continue on to the next topic to review more search examples.
How many page views were there compared to the number of purchases made? What was purchased, and how much money was made? How many purchase attempts failed?
Example 1
How many times did someone view a page on the website, yesterday? 1. Start with a search for all page views. Select the time range, Other > Yesterday:
sourcetype=access_* method=GET
Next you want to count the number of page views (characterized by the method field). 2. Use the stats command:
sourcetype=access_* method=GET | stats count AS Views

Here, you use the stats command's count() function to count the number of "GET" events in your web access logs. This is the total number of events returned by the search, so it should match the count of retrieved events. This search essentially captures that count and saves it into a field that you can use.
Here, renaming the count field as Views isn't necessary, but you're going to use it again later and this helps to avoid confusion. 3. Save this search as Pageviews (Yesterday).
Example 2
From Example 1, you have the total number of views. How many visitors who viewed the site purchased an item? What is the percentage difference between views and purchases? 1. Start with the search from Example 1. Select Other > Yesterday from the time range picker:
sourcetype=access_* method=GET | stats count AS Views
2. Use stats to count the number of purchases (characterized by the action field):
sourcetype=access_* method=GET | stats count AS Views, count(eval(action="purchase")) AS Purchases

You use the count() function again, this time with an eval() function, to count the number of purchase actions and rename the field as Purchases.
Here, the renaming is required--the syntax for using an eval() function with the stats command requires that you rename the field.
Now you just need to calculate the percentage, using the total views and the purchases. 3. Use the eval command and pipe the results to rename:
sourcetype=access_* method=GET | stats count AS Views, count(eval(action="purchase")) as Purchases | eval percentage=round(100-(Purchases/Views*100)) | rename percentage AS "%
The eval command enables you to evaluate an expression and save the result into a field. Here, you use the round() function to round the calculated percentage of Purchases to Views to the nearest integer.
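As a quick sanity check of that formula outside of Storm, the same calculation can be sketched in shell with awk. The counts below (2000 views, 477 purchases) are invented for illustration; they are not from the tutorial data:

```shell
# Hypothetical counts for illustration only.
views=2000
purchases=477
# Same formula as the search: round(100 - (Purchases / Views * 100))
percentage=$(awk -v v="$views" -v p="$purchases" \
  'BEGIN { printf "%.0f", 100 - (p / v * 100) }')
echo "${percentage}%"
```

With these numbers, 477/2000 of views led to a purchase, so the rounded difference comes out to 76%.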
Example 3
In the previous examples you searched for successful purchases, but you also want to know the count of purchase attempts that failed! 1. Run the search for failed purchase attempts, selecting Yesterday from the time range picker:
sourcetype=access_* action=purchase status=503
(You should recognize this search from the Start searching topic, earlier in this tutorial.) This search returns the events list, so let's count the number of results. 2. Use the stats command:
sourcetype=access_* action=purchase status=503 | stats count
This means that there were no failed purchases yesterday! 3. Save this search as Failed purchases (Yesterday). Now you should be comfortable using the search language and search commands. When you're ready, proceed to the next topic to learn how to create reports.
Create reports
This topic builds on the searches that you ran and saved in the previous search examples and walks you through creating charts and building reports.
Storm can dynamically update generated charts as it gathers search results. When you initiate a search, you can start building your report before the search completes. You can use the fields menu to quickly build simple pre-defined reports or use the Report Builder, which lets
you define, generate, and fine-tune the format of your report, from the type of chart you want to create to the contents you want to display on this chart. To learn more about using the Report Builder to define basic report parameters, format charts, and export or print finished reports, see "Define reports with the Report Builder" in the core Splunk product documentation. Note that any references in the core Splunk product documentation to the CLI and configuration files are not relevant for Splunk Storm. Back at the Flower & Gift shop, you're still building your reports. The previous searches you ran returned either a single value (for example, a count of errors) or a table of results (a table of products that were purchased). Now, you also want to add some pretty charts to your reports for yesterday's activities:

The count of products purchased over time
The count of purchases and views for each product category
Let's modify it a little. 1. Run this search over the time range, Yesterday:
sourcetype=access_* method=GET | chart count AS views, count(eval(action="purchase")) AS purchases by category_id | rename views AS "Views", purchases AS "Purchases", category_id AS "Category"

Here, you use the chart command instead of the stats command. The chart command enables you to create charts and specify the x-axis with the by clause.
2. Click Create > Report. Because you use the chart command and have already defined your report, this opens the Format report page of the Report Builder.
If you see something different in this window, for example a different chart type, it's probably because you're not looking at the default settings. You don't need to worry about this though.
If your search string includes reporting commands, you access the Report Builder by clicking Show report. Storm jumps you directly to the formatting stage of the report-building process, since your reporting commands have already defined the report. Read more in the core Splunk product documentation. Note that any references in the core Splunk product documentation to the CLI and configuration files are not relevant for Splunk Storm.
You don't need a strong understanding of reporting commands to use the Report Builder, but if you do have this knowledge, you can do more with it. 3. Under Formatting options: Leave the chart type set to column. Name the chart Purchases and Views by Product Type.
Because you're using the chart command, you have to define the axes of the chart. 4. Under General, leave the settings as they are.
5. Under Format, click X-axis: Type in "Product type" for the X-axis title.
6. Under Format, click Y-axis: Type in "Count of events" for the Y-axis title.
7. Click Apply.
Now you should see your chart of purchases and views formatted as a column chart with the types of products on the X-axis.
8. Click Save and select Save report... The Save report dialog window opens:
Name your report Purchases & Views (Yesterday). Click Finish >>.
Click Build report in the Actions dropdown menu after you initiate a new search or run a saved search. Click a field in the search results sidebar to bring up the interactive menu for that field. Depending on the type of field you've clicked, you'll see links to reports in the interactive menu such as average over time, maximum value over time, and minimum value over time (if you've selected a numerical field) or top values over time and top values overall (if you've selected a non-numerical field). Click on one of these links, and Splunk opens the Format report page of the Report Builder, where it generates the chart described by the link.
1. To start, run the search for purchases and views over the time range Yesterday:
sourcetype=access_* method=GET | chart count AS views, count(eval(action="purchase")) AS purchases by category_id | rename views AS "Views", purchases AS "Purchases", category_id AS "Category"
Click Next.
Click Next.
Click Finish. When your dashboard appears, it has one panel in it.
6. To start defining panels for it, toggle the "Edit" button to On.
From here you can add a new panel to your dashboard, edit your dashboard's XML, or edit permissions to share your dashboard. You can also edit properties of your panel: edit the search, edit the visualization, or delete the panel.

Share the dashboard

Select Edit permissions to expand or restrict the permissions for the dashboard. The dashboard can be:

A private view available only to yourself.
Read-only for all your project's members.

Congratulations! You've completed the Storm tutorial. Now you can get started Storming your own castle:

Create a new project for your data.
Add your own data.
Get started
Create and activate an account
This topic discusses how to create and activate your Splunk Storm account. When you use Splunk Storm, you log in with a splunk.com user account. This account is the owner of any projects you create in Storm.
Create a project
This topic discusses how to create a project in Splunk Storm.
Once you've activated your account in Splunk Storm, you can create your first project.
What's a project?
Splunk Storm allows you to create projects to organize your data. A project can contain just one input or several. One main reason to organize your data into projects is that you can choose to share a project with other users. If you've got some data you want to share and some you want to keep private, create two projects: one that's just for you, and one to share with other users.
From here, you can read "About adding data" to define the data inputs for your project. Or you might want to change your project's deletion policy from its default of storing data indefinitely. Read about deletion policies in "Choose a data storage plan."
If you send 2 GB of data per day to Storm, and set the data deletion policy to 30 days (the shortest allowed deletion policy), you will store 60 GB of data in Storm. Your corresponding storage level (the smallest one that accommodates your storage needs) would be 100 GB. Storm provides two graphs to help you determine how much data storage you'll need. Find these graphs under your project's Storage tab and read about them in "How much data am I sending to Storm?"
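The sizing arithmetic above can be sketched as a quick calculation (the 2 GB/day and 30-day figures come from the example in the text):

```shell
# Storage needed = daily volume of raw, uncompressed data x retention days.
daily_gb=2
retention_days=30
stored_gb=$((daily_gb * retention_days))
echo "stored: ${stored_gb} GB"
```

Since 60 GB exceeds the next-smaller level, the 100 GB storage level is the smallest one that accommodates this volume.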
In this example, we've uploaded files on two different days. The second graph shows total data stored. This graph uses the time stamp on the data, which is also what your deletion policy uses. This graph shows how storage is being consumed. Again, Storm counts the raw, uncompressed data you send it. It also shows the average data stored per day, week, or month (in two places: as an overlay and in one of the three panels below the graph). Change the time range over which the average storage usage is shown by clicking day, week, or month at the top right.
In this example, the project has some recent data and some from July, and the average weekly data storage is 2.5 MB.
About billing
What happens when I modify my plan?
When you modify your storage plan, the changes go into effect right away. If your new plan costs more than your old plan, we send you an invoice and charge your credit card for the upgrade the next morning (Pacific time) during our daily bill runs. If your new plan is less expensive, we issue you an account credit. Your account credit should appear on an invoice the next morning (Pacific time). We will use all available account credit on your next payment(s) before charging your credit card. You might want to change your deletion policy to make more room once you downgrade your storage plan. Read about deletion policies in "Choose a data storage plan" in this manual.
If your credit card expires

Storm will notify you when the expiration date is approaching for the credit card you have on file. Make sure that your email address is correct and that you can receive notification emails from Storm. You might want to add splunk-storm@splunk.com to any of your email whitelists. Keep your payment information current to prevent a disruption in service (as described below).

If your credit card is declined

If Storm is unable to charge your credit card, we email you a notification. Storm retries your card three times over the next several days until the payment is successful. Storm also retries your card when you update your payment information. After 10 days of unsuccessful payment attempts, Storm emails you that all the projects you own are now "delinquent" and Storm is no longer receiving any new data you send to your projects. On the 15th day of unsuccessful payment attempts (that is, five days after Storm stops receiving new data), Storm deletes all your delinquent projects. Any data in the deleted projects will be gone. Make sure that you can receive emails from splunk-storm@splunk.com so that you can be notified if there's a problem with your credit card.
Add data
About adding data
You have four basic ways to get data into Splunk Storm, which you can read about on these individual pages:

Add data using the Storm REST API (Beta). This is one of two Storm data input methods that can use SSL.
About forwarding data to Storm. Forwarding data is the other SSL-enabled data input method. This method involves installing a small instance of Splunk (a forwarder) on your server. Using a forwarder is a more robust solution than sending network data.
Send network data over a TCP or UDP port. This includes sending syslog, syslog-ng, rsyslog, Snare, netcat, or Heroku data.
Upload a file. This is a one-time file upload.
When you're ready to add data, be sure to read "About source types" in this manual.
Authorize an IP address manually

You can specify an IP address to authorize if you know the IP of your data source and want to ensure that your Storm project receives data from only that IP. To add a remote data source manually:

1. Navigate to your project's Inputs > Network data page.
2. Click Authorize your IP address.
3. Select Manually.
4. Enter the IP address of your remote data source. If you're accessing Storm from that host, you can click What is my IP? and Storm will look it up for you.
5. Specify a source type. If you don't specify a source type, Storm automatically assigns your data a source type of syslog. For more information about source types, refer to "Sources and source types" in this manual.

Note that the "Data last received" column shows NA; this is a known issue.
Set up syslog
This topic talks about how to configure syslog so that you can get your syslog data into Splunk.
What is syslog?
Syslog (syslogd) is a standard for forwarding log messages for a system, often over an IP network. The term "syslog" refers to the syslog protocol, which sends a small message to a syslog daemon (syslogd) using either TCP or UDP in clear text. Syslog can be used for managing computer systems and security auditing for servers and applications. It is supported by a wide variety of devices and platforms.
Important note
Because plain syslog supports only the UDP transport protocol, and not the more reliable TCP protocol, we recommend that customers use rsyslog or syslog-ng instead of plain syslog. If syslogd is your only option (as is the case with some router or network devices), first ensure that your version of syslog supports sending data to a custom port number (other than UDP port 514). If it doesn't, you'll need to use another method for getting data to Splunk. If it does support custom port numbers, read on...
Be sure you use the correct hostname and port from the input you created! After you've saved the configuration file, you'll need to restart syslog. A simple cross-platform way to do this is by getting a process list, then sending a HUP signal to the process ID:
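The configuration snippet itself is missing here; as a sketch, assuming a classic BSD-style syslogd whose /etc/syslog.conf accepts a host:port target, and a placeholder logsX.splunkstorm.com:20000 for your project's assigned host and port, the forwarding rule and restart might look like:

```
# /etc/syslog.conf entry -- forward all messages to your Storm input over UDP.
# Replace logsX.splunkstorm.com and 20000 with the host and port shown
# on your project's Inputs > Network data page.
*.*    @logsX.splunkstorm.com:20000

# Then reload syslogd: find its PID from the process list and send it HUP.
#   $ ps -e | grep syslogd
#   $ kill -HUP <pid>
```

The host, port, and paths above are illustrative, not values from your project.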
Then log into the Splunk search interface (by going to your project then clicking Explore data) and search for your events, starting with "my little pony".
Set up syslog-ng
This topic talks about how to set up syslog-ng so that you can get syslog-ng data into Splunk.
What is syslog-ng?
Syslog-ng is an open-source *nix implementation of the syslog logging standard. The original syslog protocol allows messages to be sorted based only on priority/facility pairs; syslog-ng adds the ability to filter based on message content using regular expressions. Most importantly, syslog-ng supports transport over TCP. Syslog-ng is available as a free download at Balabit's website and is included in many *nix distributions by default.
Next configure the protocol and port, and put them in a destination entry, being sure to specify TCP.
Be sure to replace the logsX.splunkstorm.com and port(20000) with the address and port shown under Data Sources on your Inputs page. Next, tell syslog-ng to forward the s_all source to the d_splunk destination:
Next, specify which files you want syslog-ng to monitor: Let's say you want to monitor the error.log and web.log files in the /var/log/myapp/ directory. Specify this in the source directive:
source s_all {
    internal();
    unix-stream("/dev/log");
    file("/proc/kmsg" program_override("kernel: "));
    file("/var/log/myapp/error.log" follow_freq(1) flags(no-parse));
    file("/var/log/myapp/web.log" follow_freq(1) flags(no-parse));
};

destination d_splunk {
    tcp("logsX.splunkstorm.com" port(20000));
};

log {
    source(s_all);
    destination(d_splunk);
};
Log into the Splunk search interface (by going to your project then clicking Explore data). Search for your events, starting with "my little pony".
Set up rsyslog
This topic talks about how to set up rsyslog so that you can get rsyslog data into Splunk.
What is rsyslog?
Rsyslog is an open-source *nix implementation of the syslog protocol. It supports reliable syslog transport over TCP, local buffering, SSL/TLS/RELP, logging to databases, and email alerting.
$ModLoad imfile
$InputFileName /var/log/nginx/error.log
$InputFileTag nginx:
$InputFileStateFile stat-nginx-error
$InputFileSeverity error
$InputRunFileMonitor
Be sure to replace the logsX.splunkstorm.com and the port (20000) with the address and port shown under your project's Inputs > Network data page. This configuration will make rsyslog send all of your logs to Splunk. If you do not want this behavior, add this as the first line:
& ~
If you want to send data over UDP instead of TCP (although we do recommend TCP), the last line of your rsyslog.conf edit should be:
*.* @logsX.splunkstorm.com:[PORT #]
The InputFileTag line tells rsyslog what to add as the tag in the log records. The InputFileStateFile is the file that will keep track of how much of that file you have already sent in. Make this unique for each file that you are using.
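Putting the pieces above together, a hedged sketch of the relevant rsyslog.conf lines follows. The logsX.splunkstorm.com address, the port 20000, and the nginx log path are placeholders for your project's assigned values; in rsyslog's forwarding syntax, @@ selects TCP and a single @ selects UDP:

```
# Monitor a file with the imfile module
$ModLoad imfile
$InputFileName /var/log/nginx/error.log
$InputFileTag nginx:
$InputFileStateFile stat-nginx-error
$InputFileSeverity error
$InputRunFileMonitor

# Forward everything to Storm over TCP (@@ = TCP, @ = UDP)
*.* @@logsX.splunkstorm.com:20000
```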
Then log into the Splunk search interface (by going to your project then clicking Explore data) and search for your events, starting with "my little pony".
What is Snare?
Snare for Windows is a service that interacts with the underlying Windows Event Log subsystem to facilitate remote, real-time transfer of event log information.
Snare for Windows is compatible with the following operating systems:

- Windows NT
- Windows XP
- Windows Server 2000
- Windows Server 2003

If you want to capture Windows events, like those in your event logs, currently the Snare EventLog Agent is the easiest way to do this. You can download the free agent from Intersect Alliance. If you're running Windows Vista, Windows 8, or Windows Server 2008, be sure to download the "Snare for Windows Vista" package. (From: http://www.intersectalliance.com/projects/SnareWindows)
Then restart the Snare service in your Service Control manager to make sure the configuration is enabled.
You should now see Windows event log events in Splunk. You can increase or decrease the logging levels by editing the "Objective Configuration" in Snare.
Note: Avoid tailing compressed files. This tailing example does not recurse to subdirectories. Be sure to replace "[youraddress]" and "[yourport]" with your own assigned values.
You can easily send data from a Heroku application via Heroku's syslog drain feature. For general information on Heroku's logging functionality visit their documentation. 1. From your Heroku application directory, turn logging to DEBUG level to increase the amount of data that is logged by Heroku:
heroku config:add LOG_LEVEL=DEBUG
2. Go to your Project in your splunkstorm.com account, and under "Inputs", click Network data. Click Authorize then Automatically. Take note of (or copy) the IP address and port assigned to your project. 3. From the Heroku application directory, create a syslog drain to your Splunk Storm project:
heroku drains:add syslog://logs2.splunkstorm.com:99999
Note: Be sure to replace logs2.splunkstorm.com:99999 with your assigned IP address and port.

4. Generate an action on your application that will generate logs. Be sure to do this during the 15-minute auto authentication period, or you will have to repeat step 2. Refresh the Inputs tab page on your Splunk Storm project and you should see Heroku's IP addresses added to your "Authenticated IP Addresses" list (there may be 5 IP addresses or more).

5. Click Explore Data from your project to see your data. Your data will be indexed as the "syslog" source type, which is how Heroku formats its logs.

Note: If your logs stop flowing or you believe not everything is reaching Storm, try turning Storm's auto authentication on again to ensure that all IP addresses are authenticated. Heroku might occasionally change the servers your application is hosted on.
You can re-authenticate the new IP(s) as described in step 2 above. In this situation, you risk losing some data from your application until you notice it is no longer coming in to your Storm project. You can configure your application to log to syslog on an intermediate server and then configure a Splunk universal forwarder as described in the forwarding chapter of this manual. This way, syslog will always receive your data, and the Splunk forwarder will handle getting it into Storm.
Upload a file
You can add data to your Storm deployment by uploading text files of up to 100 MB in size each. Binary, compressed (such as zip, tar, or gz), executable, audio/video, and image files are not supported. To upload a file to Storm:

1. From your project, click Inputs.
2. Click File upload.
3. Click +Upload. The Upload a file panel is displayed.
4. Browse to your file and choose it.
5. Give your file a source type. Specifying a source type tells Splunk how to parse your data, and allows you to group all the data of a certain type together when searching. You can choose a predefined source type from the list, or select custom source type. If you choose to specify a custom source type, give it a name. If you don't specify a source type, the data will be assigned a type of "syslog". Read about Storm source types in this manual. If you choose to specify a custom source type, Storm will linebreak multi-line events, find the timestamps in your data, and extract some default fields as described in the following topics from the core Splunk product documentation: How Splunk extracts timestamps; Overview of default fields.
6. Choose the time zone for the data. If this file was generated in a different time zone from your project's default time zone, specify it here.
7. Click Upload.
at a command prompt. You can also double-click the forwarder MSI file in the Explorer window. This installation directory will from now on be denoted %SPLUNK_HOME%. Note: If you are using the MSI installer wizard, do not specify a destination server. If you did, remove the outputs.conf file at
%SPLUNK_HOME%\etc\system\local\outputs.conf
7. Install the forwarder credentials. From the directory containing the credentials package (using the default forwarder password), type:
%SPLUNK_HOME%\bin\splunk install app stormforwarder_<project_id>.spl -auth admin:changeme
This command copies the file to %SPLUNK_HOME%\etc\apps and uncompresses it. 8. Log into the forwarder using the default credentials, admin/changeme:
%SPLUNK_HOME%\bin\splunk login -auth admin:changeme
10. Add files or a directory for the forwarder to monitor, using either the CLI or configuration files. 11. Restart the forwarder so the changes take effect:
%SPLUNK_HOME%\bin\splunk restart
Change the value of the key _storm_project_id to the ID of the project you would like to send to. Find the project ID by logging into your Storm account, navigating to the project you want to send data to, and then to Inputs > API. Note: Your access token must have Admin permission for the specified project. See "Share your project" for information about project permission. Change the default time zone of the data Edit the file
%SPLUNK_HOME%\etc\apps\stormforwarder_<project_id>\local\inputs.conf.
Change the value of the key _tzhint to the time zone you want applied to this forwarder's data (and uncomment it). Note: By default, this time zone is set to the value of your project's time zone at the time the credentials package was downloaded.
Remove the default network throughput limit

Forwarders have a default limit of 256KBps. If you plan to forward a large volume of data, you can increase the limit or remove it. To do this:

1. Create or edit the file limits.conf in an app's directory structure, for example %SPLUNK_HOME%/etc/apps/search/local/limits.conf.
2. Copy the new settings into it:

[thruput]
maxKBps = 0  # zero means unlimited; default was 256
2. Download forwarder credentials for this project by clicking credentials package. This package contains the authentication credentials and configuration that allow sending data to this project only. Do not skip this step. Note: Do not share this information with anyone else, as it contains your access token.
3. Follow the link to the universal forwarder downloads page and download the package of your choice.

Install the forwarder

4. Copy both files to the server that will send data to Storm.
5. Install the universal forwarder package by either running the installer or decompressing the file. For example, if installing the universal forwarder onto Red Hat Linux, use this command to install the package (by default into /opt/splunkforwarder):
rpm -i splunkforwarder_<package_name>.rpm
This installation directory will from now on be denoted $SPLUNK_HOME. 6. Start the universal forwarder.
$SPLUNK_HOME/bin/splunk start
This command copies the file to $SPLUNK_HOME/etc/apps and uncompresses it.

Configure the forwarder inputs

8. Log into the forwarder using the default credentials "admin/changeme".
$SPLUNK_HOME/bin/splunk login -auth admin:changeme
10. Add files or a directory for the forwarder to monitor, using either the CLI or configuration files as discussed below. 11. Restart the forwarder so the changes take effect:
$SPLUNK_HOME/bin/splunk restart
Within seconds of running this command, an entry will appear on Storm's Inputs > Forwarders page.

2. Run the following command to see a list of all monitored files. The files or directories that you've added should appear in the results of this command.
$SPLUNK_HOME/bin/splunk list monitor
For information on the CLI commands, read "CLI commands for input".

Examples: Add data using the CLI

Example 1: Monitor a directory; data is sent to Storm as it is added to the files (real time).
$SPLUNK_HOME/bin/splunk add monitor /var/log -sourcetype syslog
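The parameters described later in this topic can be combined with add monitor; a hypothetical second example (the path, source type, and host name here are illustrative, not values from your deployment):

```
$SPLUNK_HOME/bin/splunk add monitor /var/log/myapp -sourcetype access_combined -hostname web01
```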
Change the value of the key _storm_project_id to the ID of the project you would like to send to. Find the project ID by logging into your Storm account, navigating to the project you want to send data to, and then to Inputs > API. Note: Your access token must have Admin permission for the specified project. See "Share your project" for information about project permission. Change the default time zone of the data Edit the file
$SPLUNK_HOME/etc/apps/stormforwarder_<project_id>/local/inputs.conf.
Change the value of the key _tzhint to the time zone you want applied to this forwarder's data (and uncomment it). Note: By default, this time zone is set to the value of your project's time zone at the time the credentials package was downloaded.

Remove the default network throughput limit

Forwarders have a default limit of 256KBps. To determine if you are hitting this limit, look for events in splunkd.log like
8-21-2012 10:10:40.563 -0400 INFO BatchReader - Could not send data to output queue (parsingQueue), retrying..
If you plan to forward a large volume of data, you can increase the limit or remove it. To do this:
1. Create or edit the file limits.conf in an app's directory structure, for example $SPLUNK_HOME/etc/apps/search/local/limits.conf.
2. Copy the new settings into the limits.conf:

[thruput]
maxKBps = 0  # zero means unlimited; default was 256
Edit inputs.conf
This topic tells you about editing inputs.conf on a universal forwarder sending data to a Splunk Storm project. For core Splunk's forwarder documentation, see the Distributed Deployment Manual in core Splunk documentation. To add an input, add a stanza to inputs.conf in $SPLUNK_HOME/etc/apps/. If you have not worked with Splunk's configuration files before, read "About configuration files" in the core Splunk documentation before you begin. You can set multiple attributes in an input stanza. If you do not specify a value for an attribute, the forwarder uses the default that is predefined in $SPLUNK_HOME/etc/system/default/. Note: To ensure that new events are indexed when you copy over an existing file with new contents, set CHECK_METHOD = modtime in props.conf for the source. This checks the modtime of the file and re-indexes it when it changes. Be aware that the entire file will be re-indexed, which can result in duplicate events.
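For example, the note above about re-copied files might translate into a props.conf stanza like this (the source path is hypothetical):

```
# props.conf -- re-index /var/log/myapp/web.log whenever its modtime changes.
# Note: the whole file is re-indexed on change, which can duplicate events.
[source::/var/log/myapp/web.log]
CHECK_METHOD = modtime
```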
Configuration settings
The following are attributes that you can use in the monitor input stanzas. See the sections that follow for attributes that are specific to each type of input.
host = <string>
Sets the host key/field to a static value for this stanza. Sets the host key's initial value. The key is used during parsing/indexing, in particular to set the host field. It is also the host field used at search time.
The <string> is prepended with "host::". If not set explicitly, this defaults to the IP address or fully qualified domain name of the host where the data originated.
sourcetype = <string>
Sets the sourcetype key/field for events from this input. Explicitly declares the source type for this data, as opposed to allowing it to be determined automatically. This is important both for searchability and for applying the relevant formatting for this type of data during parsing and indexing. Sets the sourcetype key's initial value. The key is used during parsing/indexing, in particular to set the source type field during indexing. It is also the source type field used at search time. The <string> is prepended with "sourcetype::". If not set explicitly, Splunk picks a source type based on various aspects of the data. There is no hard-coded default. For more information about source types, see "source type".

Specifying a time zone

Forwarding data using a universal forwarder allows you to specify a time zone value that will be applied to all data coming from that forwarder. To specify a time zone setting for this forwarder, add a value (such as "US/Pacific") for the _tzhint attribute under the [default] stanza. This setting will be applied to all the inputs defined on this forwarder and will override any setting defined for the project to which the data is sent. For more information about setting a time zone for other types of data inputs, refer to "How does the time zone affect things?" in this manual.
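As a sketch, the time zone override described above would look like this in the forwarder's inputs.conf (US/Pacific is just an example value):

```
# inputs.conf -- apply one time zone to all data sent by this forwarder.
# This overrides the time zone configured for the destination project.
[default]
_tzhint = US/Pacific
```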
...
The following are additional attributes you can use when defining monitor input stanzas:
source = <string>
Sets the source key/field for events from this input. Note: Overriding the source key is generally not recommended. Typically, the input layer will provide a more accurate string to aid in problem analysis and investigation, accurately recording the file from which the data was retrieved. Consider use of source types, tagging, and search wildcards before overriding this value. The <string> is prepended with "source::". Defaults to the input file path.
crcSalt = <string>
Use this setting to force Splunk to consume files that have matching CRCs (cyclic redundancy checks). (Splunk only performs CRC checks against the first few lines of a file. This behavior prevents Splunk from indexing the same file twice, even though you may have renamed it -- as, for example, with rolling log files. However, because the CRC is based on only the first few lines of the file, it is possible for legitimately different files to have matching CRCs, particularly if they have identical headers.) If set, string is added to the CRC. If set to <SOURCE>, the full source path is added to the CRC. This ensures that each file being monitored has a unique CRC. Be cautious about using this attribute with rolling log files; it could lead to the log file being re-indexed after it has rolled. Note: This setting is case sensitive.
ignoreOlderThan = <time window>
Causes the monitored input to stop checking files for updates if their modtime has passed the <time window> threshold. This improves the speed of file tracking operations when monitoring directory hierarchies with large numbers of historical files (for example, when active log files are co-located with old files that are no longer being written to). Note: A file whose modtime falls outside <time window> when monitored for the first time will not get indexed. Value must be: <number><unit>. For example, "7d" indicates one week. Valid units are "d" (days), "m" (minutes), and "s" (seconds). Defaults to 0 (disabled).
followTail = 0|1
If set to 1, monitoring begins at the end of the file (like tail -f). This only applies to files the first time they are picked up. After that, Splunk's internal file position records keep track of the file. Defaults to 0.
whitelist = <regular expression>
If set, files from this path are monitored only if they match the specified regex.
blacklist = <regular expression>
If set, files from this path are NOT monitored if they match the specified regex.
alwaysOpenFile = 0 | 1
If set to 1, Splunk opens a file to check if it has already been indexed. Only useful for files that don't update modtime. Should only be used for monitoring files on Windows, and mostly for IIS logs. Note: This flag should only be used as a last resort, as it increases load and slows down indexing.
time_before_close = <integer>
Modtime delta required before Splunk can close a file on EOF. Tells the system not to close files that have been updated in past <integer> seconds. Defaults to 3.
recursive = true|false
If set to false, Splunk will not go into subdirectories found within a monitored directory. Defaults to true.
followSymlink = true|false
If false, Splunk will ignore symbolic links found within a monitored directory. Defaults to true.
[monitor:///apache/.../logs]

This stanza monitors files named "logs" in any subdirectory under /apache; the "..." wildcard recurses through subdirectories.

[monitor:///apache/*.log]

This stanza monitors files ending in .log directly under /apache; the "*" wildcard matches within a single path segment.
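Combining several of the attributes described above, a hypothetical monitor stanza might look like the following (paths and values are illustrative only):

```
# Monitor files under /var/log/myapp, recursing into subdirectories,
# indexing only .log files and skipping compressed .gz files.
# Files untouched for more than a week are no longer checked for updates.
[monitor:///var/log/myapp]
sourcetype = syslog
whitelist = \.log$
blacklist = \.gz$
recursive = true
ignoreOlderThan = 7d
```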
add monitor [-parameter value] ... <source>
    Monitor inputs from <source>.

edit monitor [-parameter value] ... <source>
    Edit a previously added monitor input for <source>.

remove monitor <source>
    Remove a previously added monitor input for <source>.

list monitor
    List the currently configured monitor inputs.

spool <source>
    Copy the file <source> directly into Splunk. This uploads the file once, but Splunk does not continue to monitor it.

Change the configuration of each data input type by setting additional parameters. Parameters are set via the syntax: -parameter value. Note: You can only set one -hostname, -hostregex or -hostsegmentnum per command.
<source> (required): Path to the file or directory to monitor/upload for new input. Note: Unlike the other parameters, the syntax for this parameter is just the value itself and is not preceded by a parameter flag: "<source>", not "-source <source>".

sourcetype (optional): Specify a sourcetype field value for events from the input source.

hostname or host (optional): Specify a host name to set as the host field value for events from the input source.
You can also upload a file via the sinkhole directory with the spool command:
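The command itself was lost to a page break here; based on the spool action described in the CLI table above, it presumably resembles the following (the file path is hypothetical):

```
$SPLUNK_HOME/bin/splunk spool /var/log/myapp/archive.log
```

This uploads the file once; Splunk does not continue to monitor it.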
Authentication

Storm uses an API token authentication scheme over HTTP Basic Authentication with TLS data encryption (HTTPS). For each user, Storm creates an access token that you use as the "password" for access. Any value for username can be used with this token. Storm ignores the username field. You can use the same token for all Storm projects for which you are an administrator.
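Because Storm ignores the username, any placeholder (here "x") paired with your access token works as HTTP Basic credentials. A minimal sketch of the Authorization header this produces (the token value is fake, for illustration only):

```python
import base64

# Storm ignores the username, so "x" is an arbitrary placeholder.
token = "EXAMPLE_ACCESS_TOKEN"  # fake token for illustration
credentials = base64.b64encode(("x:" + token).encode("ascii")).decode("ascii")
auth_header = "Basic " + credentials
print(auth_header)
```

This is the same header that curl -u x:$ACCESS_TOKEN sends in the examples later in this chapter.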
Splunk default fields

These are parameters to the endpoint that must be embedded as query parameters in a URL.

Required

The following parameters are required. Storm responds with an HTTP 400 error if either of the required fields is missing.

- index: Specifies the project ID.
- sourcetype: Identifies the incoming data. See About source types for information on using source types with Splunk Storm.

Optional

You can optionally specify the following parameters. The data input API allows you to override the values normally associated with host and source if you need to use these for data classification.

- tz: Time zone to use when indexing events.
- host: The host emitting the data (defaults to the reverse DNS for the source IP).
- source: The data source (defaults to the source IP of the data).

Request body

The raw event text to input. You can send the raw event text as plain text (text/plain) or as URL-encoded text (application/x-www-form-urlencoded). In either case, Storm correctly interprets the raw input event text.
as the complete body of the request. You can keep the connection open to send additional logging events. This trivial use case is typically inefficient: you often send and receive more data in the headers of the request than in the body.

Send multiple events over a single call

In this use case, you buffer multiple events that you send in a single call. Again, consider a source type that handles log data. You may want to buffer many events locally until you reach a threshold, such as the size of the data, a time period to send data, a specific log level (for example, ERROR logs), or some other factor. If you have collections of events that need different Splunk default fields, such as source, send them in different HTTP requests. You specify these fields in the URL query string, which necessitates the separate requests. Once you reach the threshold, send the buffered events as the body of a single call.

You can also pipeline requests: keep the TCP connection open, and send multiple HTTP requests. The advantage of this use case is that you limit the SSL/TLS and HTTP overhead in sending and receiving data. However, you have to consider factors such as:

- Local memory available for buffering data
- Risk of losing data should a server go down
- Timeliness of making the new events available in your project

Send compressed data

You may send data more efficiently by uploading a gzip compressed stream. You must supply a "Content-Encoding: gzip" header if sending gzipped data for it to be correctly decompressed on receipt. Example:
echo 'Sun Apr 11 15:35:15 UTC 2011 action=download_packages status=OK pkg_dl=751 elapsed=37.543' \
| gzip \
| curl -u x:$ACCESS_TOKEN \
  -H "Content-Encoding: gzip" \
  --data-binary @- \
  "https://api.splunkstorm.com/1/inputs/http?index=$PROJECT_ID&sourcetype=generic_single_line"
export ACCESS_TOKEN=<access_token>
export PROJECT_ID=<project_id>
export SOURCETYPE=generic_single_line
export EVENT="Sun Apr 11 15:35:15 UTC 2011 action=download_packages status=OK pkg_dl=751 elapsed=37.543"
3. Use curl to run the HTTP request using the above exported environment variables:
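The curl invocation itself was lost from this page; given the environment variables exported above, it presumably resembled:

```
curl -u "x:$ACCESS_TOKEN" \
  "https://api.splunkstorm.com/1/inputs/http?index=$PROJECT_ID&sourcetype=$SOURCETYPE" \
  --data-urlencode "$EVENT"
```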
Response: {"bytes":99,"index":"fdf492f01ee111e28f18123139332c16","host":"example.com","source":"8.
Sending a file

You can also use cURL to send an entire file instead of a single event. However, to monitor a file (or multiple files) and send updates to Storm as they happen, you should instead use a Splunk forwarder for reliable transmission.
To send a single one-off file: 1. As with sending a single event in the previous example, export your access token and project id into your *nix terminal with the source type and name of the file you want to send:
2. Use curl to run the HTTP request using the above exported environment variables:
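The command was lost from this page; a sketch of such a request, assuming a $FILENAME variable holding the file's path was exported in step 1 along with the other variables (the variable name is hypothetical):

```
curl -u "x:$ACCESS_TOKEN" \
  "https://api.splunkstorm.com/1/inputs/http?index=$PROJECT_ID&sourcetype=$SOURCETYPE" \
  --data-binary "@$FILENAME"
```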
Depending on the size of the file, curl may take several minutes to transfer the data to Storm. Note: Although you can transfer large files this way, for files of more than a few hundred megabytes it is recommended to use a Splunk forwarder instead. A forwarder automatically resumes in the case of a transmission error midway through the upload. For more details, see the documentation about forwarding data to Storm.
More examples
For examples using Python and Ruby, read "Examples: input data with Python or Ruby." We'll be adding more (and longer) code examples in our GitHub repository.
Storm supports HTTP/1.1 connections in Keep-Alive mode. If your client sets the Connection: Keep-Alive HTTP request header, Storm keeps the TCP connection open after the first request, waiting for additional data. This allows you to send data with different metadata on the same TCP connection. Caution: Make sure that you send some complete events in each request. This applies whether you make only one HTTP request in the TCP connection or many.

POST /1/inputs/http/

Stream events from the contents contained in the HTTP body to a project. When sending data, indicate the source type to apply to the events. See About source types for information on using source types with Splunk Storm.
Request
The body of the request contains the raw event text you are streaming to your Storm project.

<request_body> (String, required): Raw event text. This is the entirety of the HTTP request body.

Specify parameters to the request as query parameters in the URL:

- index (String, required): The Project ID to which events from this input are sent. Defaults to the Project ID of your first project.
- sourcetype (String, required): The source type to apply to events from this input. See About source types for information on using source types with Splunk Storm.
- host (String, optional): The host to apply to events from this input. Defaults to the DNS PTR name for the source IP address.
- source (String, optional): The source to apply to events from this input. Defaults to the IP address of the sending host.
- tz (String, optional): The time zone to apply to events from this input. Storm recognizes zoneinfo TZ IDs. Refer to the zoneinfo (TZ) database for all permissible tz values.
Response status

- Request error. See response body for details.
- Not authorized to write to the project, or project does not exist.

For a successful response, the fields are:

- bytes (Number): Number of bytes received, if any.
- index: The project ID.
- host: The host field the data was indexed with.
- source: The source field the data was indexed with.
- sourcetype: The source type field the data was indexed with.
- tz: The time zone field the data was indexed with.

For an error response, the fields are:

- status (Number): HTTP status code (for example, 200).
- type (String): The string "ERROR."
- text (String): The reason for the error.
Send a single event to the project f75b3a9abc with a sourcetype of syslog and the host set to my.example.com. Specify URL-encoding for the POST data. This example assumes that ACCESS_TOKEN is an environment variable specifying the Storm access token. x represents the required username string, which Storm ignores.
Request
curl -k -u x:$ACCESS_TOKEN \
  "https://api.splunkstorm.com/1/inputs/http?index=f75b3a9abc&sourcetype=syslog&host=my.example.com" \
  --data-urlencode "Sun Apr 11 15:35:15 UTC 2011 action=download_packages status=OK pkg_dl=751 elapsed=37.543"
Response
import urllib
import urllib2

class StormLog(object):
    """A simple example class to send logs to a Splunk Storm project.

    Your ``access_token`` and ``project_id`` are available from the Storm UI.
    """
    def __init__(self, access_token, project_id, input_url=None):
        self.url = input_url or 'https://api.splunkstorm.com/1/inputs/http'
        self.project_id = project_id
        self.access_token = access_token
        self.pass_manager = urllib2.HTTPPasswordMgrWithDefaultRealm()
        self.pass_manager.add_password(None, self.url, 'x', access_token)
        self.auth_handler = urllib2.HTTPBasicAuthHandler(self.pass_manager)
        self.opener = urllib2.build_opener(self.auth_handler)
        urllib2.install_opener(self.opener)

    def send(self, event_text, sourcetype='syslog', host=None, source=None):
        params = {'project': self.project_id, 'sourcetype': sourcetype}
        if host:
            params['host'] = host
        if source:
            params['source'] = source
        url = '%s?%s' % (self.url, urllib.urlencode(params))
        try:
            req = urllib2.Request(url, event_text)
            response = urllib2.urlopen(req)
            return response.read()
        except (IOError, OSError), ex:
            # An error occurred during URL opening or reading
            raise

# Example
# Set up the example logger.
# Arguments are your access token and the project ID.
log = StormLog('abcdefghi...', '198ahb3280...')

# Send a log; will pick up the default value for ``source``.
log.send('Apr 1 2012 18:47:23 UTC host57 action=supply_win amount=5710.3',
         sourcetype='syslog', host='host57')

# Will pick up the default value for ``host``.
log.send('Apr 1 2012 18:47:26 UTC host44 action=deliver from=foo@bar.com to=narwin@splunkstorm.com',
         sourcetype='syslog')
# > API*.
# 3. Open this script in a text editor.
# 4. In the _User Options_ section, set *PROJECT_ID* and *ACCESS_TOKEN*.
# 5. Also in _User Options_, set +event_params+'s *sourcetype* and *host*.
# 6. Pipe your data into this script. For example:
#      $ ruby storm-rest-ruby.rb < system.log
#
# = Definitions
# [PROJECT_ID] Splunk Storm Project ID.
# [ACCESS_TOKEN] Splunk Storm REST API Access Token.
# [sourcetype] Format of the event data, e.g. syslog, log4j.
# [host] Hostname, IP or FQDN from which the event data originated.
#
# = Requires
# * {rest-client gem}[https://github.com/archiloque/rest-client]
require 'rest-client'

# User Options
PROJECT_ID = 'xxx'
ACCESS_TOKEN = 'yyy'
event_params = {:sourcetype => 'syslog', :host => 'gba.example.com'}

# Nothing to change below
API_HOST = 'api.splunkstorm.com'
API_VERSION = 1
API_ENDPOINT = 'inputs/http'
URL_SCHEME = 'https'

# Actual code
event_params[:project] = PROJECT_ID
api_url = "#{URL_SCHEME}://#{API_HOST}"
api_params = URI.escape(event_params.collect{|k,v| "#{k}=#{v}"}.join('&'))
endpoint_path = "#{API_VERSION}/#{API_ENDPOINT}?#{api_params}"

request = RestClient::Resource.new(
  api_url, :user => 'x', :password => ACCESS_TOKEN)
response = request[endpoint_path].post(ARGF.read)
puts response
Answers
Have questions about search commands? Check out Splunk Answers to see what questions and answers other Splunk users had about the search language. Now, on to the cheat sheet!
fields
add
Save the running total of "count" in a field called "total_count".
    ... | accum count AS total_count
Add information about the search to each event.
    ... | addinfo
Search for "404" events and append the fields in each event to the previous search results.
    ... | appendcols [search 404]
For each event where "count" exists, compute the difference between count and its previous value and store the result in "countdiff".
    ... | delta count AS countdiff
Set velocity to distance / time.
    ... | eval velocity=distance/time
Extract field/value pairs and reload field extraction settings from disk.
    ... | extract reload=true
Extract field/value pairs that are delimited by "|;", and values of fields that are delimited by "=:".
    ... | extract pairdelim="|;", kvdelim="=:", auto=f
Add location information (based on IP address).
    ... | iplocation
Extract values from "eventtype.form" if the file exists.
    ... | kvform field=eventtype
Extract the "COMMAND" field when it occurs in rows that contain "splunkd".
    ... | multikv fields COMMAND filter splunkd
Set range to "green" if the date_second is between 1-30; "blue", if between 31-39; "red", if between 40-59; and "gray", if no range matches (for example, if date_second=0).
    ... | rangemap field=date_second green=1-30 blue=31-39 red=40-59 default=gray
Calculate the relevancy of the search and sort the results in descending order.
    disk error | relevancy | sort -relevancy
Extract "from" and "to" fields using regular expressions. If a raw event contains "From: Susan To: Bob", then from=Susan and to=Bob.
    ... | rex field=_raw "From: (?<from>.*) To: (?<to>.*)"
Extract the "author" field from XML or JSON formatted data about books.
    ... | spath output=author path=book{@author}
Add the field "comboIP", where comboIP = sourceIP + "/" + destIP.
    ... | strcat sourceIP "/" destIP comboIP
Extract field/value pairs from XML formatted data. "xmlkv" automatically extracts values between XML tags.
    ... | xmlkv
Extract the name
convert
Convert every field value to a number value except for values in the field "foo" (use the "none" argument to specify fields to ignore).
    ... | convert auto(*) none(foo)
Change all memory values in the "virt" field to Kilobytes.
    ... | convert memk(virt)
Change the sendmail syslog duration format (D+HH:MM:SS) to seconds. For example, if delay="00:10:15", the resulting value will be delay="615".
    ... | convert dur2sec(delay)
Convert values of the "duration" field into a number value by removing string values in the field value. For example, if duration="212 sec", the resulting value will be duration="212".
    ... | convert rmunit(duration)
Separate the value of "foo" into multiple values.
    ... | makemv delim=":" allowempty=t foo
For sendmail events, combine the values of the senders field into a single value; then, display the top 10 values.
    eventtype="sendmail" | nomv senders | top senders
filter
Keep the "host" and "ip" fields, and display them in the order: "host", "ip".
    ... | fields + host, ip
Remove the "host" and "ip" fields.
    ... | fields - host, ip
modify
Build a time series chart of web events by host and fill all empty fields with NULL.
    sourcetype="web" | timechart count by host | fillnull value=NULL
Rename the "_ip" field as "IPAddress".
    ... | rename _ip as IPAddress
Change any host value that ends with "localhost" to "localhost".
    ... | replace *localhost with localhost in host
read
There is a lookup table specified in a stanza named 'usertogroup' in transforms.conf. This lookup table contains (at least) two fields, 'user' and 'group'. For each event, we look up the value of the field 'local_user' in the table and, for any entries that match, the value of the 'group' field in the lookup table is written to the field 'user_group' in the event.
    ... | lookup usertogroup user as local_user OUTPUT group as user_group
formatting
Show a summary of up to 5 lines for each search result.
    ... | abstract maxlines=5
Map a single numerical value against a range of colors that may have particular business meaning or business logic.
    ... | stats count as myCount | gauge myCount 5000 8000 12000 15000
Highlight the terms "login" and "logout".
    ... | highlight login,logout
Output the "_raw" field of your current search into "_xml".
    ... | outputtext
reporting
Calculate the sums of the numeric fields of each result, and put the sums in the field "sum".
    ... | addtotals fieldname=sum
Analyze the numerical fields to predict the value of "is_activated".
    ... | af classfield=is_activated
Return events with uncommon values.
    ... | anomalousvalue action=filter pthresh=0.02
Return results associated with each other (that have at least 3 references to each other).
    ... | associate supcnt=3
For each event, copy the 2nd, 3rd, 4th, and 5th previous values of the 'count' field into the respective fields 'count_p2', 'count_p3', 'count_p4', and 'count_p5'.
    ... | autoregress count p=2-5
Bucket search results into 10 bins, and return the count of raw events for each bucket.
    ... | bucket size bins=10 | stats count(_raw) by size
Return the average "thruput" of each "host" for each 5 minute time span.
    ... | bucket _time span=5m | stats avg(thruput) by _time host
Return the average (mean) "size" for each distinct "host".
    ... | chart avg(size) by host
Return the maximum "delay" by "size", where "size" is broken down into a maximum of 10 equal sized buckets.
    ... | chart max(delay) by size bins=10
Return the ratio of the average (mean) "size" to the maximum "delay" for each distinct "host" and "user" pair.
    ... | chart eval(avg(size)/max(delay)) by host user
Return max(delay) for each value of foo split by the value of bar.
    ... | chart max(delay) over foo by bar
Return max(delay) for each value of foo.
    ... | chart max(delay) over foo
Build a contingency table of "datafield1" and "datafield2", with a maximum of 5 rows and 5 columns and no totals.
    ... | contingency datafield1 datafield2 maxrows=5 maxcols=5 usetotal=F
Calculate the co-occurrence correlation between all fields.
    ... | correlate type=cocur
Return the number of events in the project.
    | eventcount
Compute the overall average duration and add 'avgdur' as a new field to each event where the 'duration' field exists.
    ... | eventstats avg(duration) as avgdur
Make "_time" continuous with a span of 10 minutes.
    ... | makecontinuous _time span=10m
Remove all outlying numerical values.
    ... | outlier
Return the least common values of the "url" field.
    ... | rare url
Remove duplicates of results with the same "host" value and return the total count of the remaining results.
    ... | stats dc(host)
Return the average for each hour, of any unique field that ends with the string "lay" (for example, delay, xdelay, relay, etc).
    ... | stats avg(*lay) BY date_hour
For each event, add a count field that represents the number of events seen so far (including that event): 1 for the first event, 2 for the second, and so on.
    ... | streamstats count
Graph the average "thruput" of hosts over time.
    ... | timechart span=5m avg(thruput) by host
Create a timechart of average "cpu_seconds" by "host", and remove data (outlying values) that may distort the timechart's axis.
    ... | timechart avg(cpu_seconds) by host | outlier action=tf
Calculate the average value of "CPU" each minute for each "host".
    ... | timechart span=1m avg(CPU) by host
Create a timechart of the count of events from "web" sources by "host".
    ... | timechart count by host
Compute the product of the average "CPU" and average "MEM" each minute for each "host".
    ... | timechart span=1m eval(avg(CPU) * avg(MEM)) by host
Reformat the search results.
    ... | timechart avg(delay) by host | untable _time host avg_delay
Return the 20 most common values of the "url" field.
    ... | top limit=20 url
Search the access logs, and return the number of hits from the top 100 values of "referer_domain".
    sourcetype=access_combined | top limit=100 referer_domain | stats sum(count)
Reformat the search results.
    ... | xyseries delay host_type host
results
append
Append the current results with the tabular results of "fubar".
    ... | chart count by bar | append [search fubar | chart count by baz]
Join the previous result set with the results from "search foo", on the "id" field.
    ... | join id [search foo]
filter
Return only anomalous events.
    ... | anomalies
Remove duplicates of results with the same host value.
    ... | dedup host
Combine the values of "foo" with a ":" delimiter.
    ... | mvcombine delim=":" foo
Keep only search results whose "_raw" field contains IP addresses in the non-routable class A (10.0.0.0/8).
    ... | regex _raw="(?<!\d)10.\d{1,3}\.\d{1,3}\.\d{1,3}(?!\d)"
Join results with itself on the 'id' field.
    ... | selfjoin id
Return "physicsobjs" events with a speed greater than 100.
    sourcetype=physicsobjs | where distance/time > 100
generate
All daily time ranges from Oct 25 till today.
    | gentimes start=10/25/12
Load the events that were generated by the search job with id=1233886270.2.
    | loadjob 1233886270.2 events=t
Create new events for each value of the multi-value field "foo".
    ... | mvexpand foo
Run the "mysecurityquery" saved search.
    | savedsearch mysecurityquery
group
Cluster events together, sort them by their "cluster_count" values, and then return the 20 largest clusters (in data size).
    ... | cluster t=0.9 showcount=true | sort -cluster_count | head 20
Group search results into 4 clusters based on the values of the "date_hour" and "date_minute" fields.
    ... | kmeans k=4 date_hour date_minute
Group search results that have the same "host" and "cookie", occur within 30 seconds of each other, and do not have a pause greater than 5 seconds between each event into a transaction.
    ... | transaction host cookie maxspan=30s maxpause=5s
Have Splunk automatically discover and apply event types to search results.
    ... | typelearner
Force Splunk to apply event types that you have configured (Splunk Web automatically does this when you view the "eventtype" field).
    ... | typer
order
Return the first 20 results.
    ... | head 20
Reverse the order of a result set.
    ... | reverse
Sort results by "ip" value in ascending order and then by "url" value in descending order.
    ... | sort ip, -url
Return the last 20 results (in reverse order).
    ... | tail 20
search
search
Keep only search results that have the specified "src" or "dst" values.
    src="10.9.165.*" OR dst="10.9.165.8"
subsearch
Get the top 2 results and create a search from their host, source, and sourcetype, resulting in a single search result with a _query field: _query=( ( "host::mylaptop" AND "source::syslog.log" AND "sourcetype::syslog" ) OR ( "host::bobslaptop" AND "source::bob-syslog.log" AND "sourcetype::syslog" ) )
    ... | head 2 | fields source, sourcetype, host | format
Search the time range of each previous result for "failure".
    ... | localize maxpause=5m | map search="search failure starttimeu=$starttime$ endtimeu=$endtime$"
Return values of "URL" that contain the string "404" or "303" but not both.
    | set diff [search 404 | fields url] [search 303 | fields url]
Print a PDF
You can:
- Generate PDFs of dashboards, views, searches, or reports with a click of a button.
- COMING SOON: Arrange to have PDFs of searches, reports, and dashboards sent to a set of recipients that you define, on a regular schedule.
- COMING SOON: Arrange to have PDFs of searches and reports sent to a set of recipients that you define when specific alert conditions are met.
There are exceptions involving forms, dashboards that are built with advanced XML, and simple XML dashboards that have panels rendered in Flash rather than JavaScript. See the "Exceptions" section, below, for more information.
Forms won't be printed

Integrated PDF generation cannot print forms at this point, whether they have been constructed with simple or advanced XML.

Flash-only chart customizations will be ignored

As we detail in the "About JSChart" topic of the Data Visualizations manual in core Splunk product documentation, Splunk uses the JSChart charting library to render dashboard panels for viewing through your browser, except in cases where chart customizations that aren't supported by JSChart are implemented in the underlying simple XML. Panels that have JSChart-unsupported customizations are instead rendered for browser display by the Flash charting library.

Integrated PDF generation relies solely on the JSChart charting library to render images of dashboard panels in PDFs. Because of this, when Splunk uses integrated PDF generation to generate a PDF of a dashboard, it renders all dashboard panels with JSChart. When dashboard panels include simple XML chart customizations that are unsupported by JSChart, Splunk ignores those customizations. This means that panels with JSChart-unsupported customizations may appear different in PDF format than they do in your browser. In the browser they are rendered with Flash and have the customizations, while in the PDF they are rendered with JSChart and do not include the customizations.

To see which charting library parameters are supported by JSChart, go to the custom charting configuration reference in the Data Visualizations manual in core Splunk product documentation. Each topic in the reference contains tables of chart customization parameters, and each table has a "Supported by JSChart?" column.

Note: Over time we will be increasing the number of chart customization parameters that are compatible with JSChart, which means that you'll be able to make a wider range of customizations to your dashboard panels and still have them print correctly through integrated PDF generation.
Project members
Invite users to your project, or view and manage current members of a project, by clicking the Members tab for that project.
To invite a user, click +Add member. Select a project role (User or Admin) for your invited member. Click Send invite.

To change a member's role, find the member's name in the list and click Edit.

Note: You can invite up to 5 users to join a free project. You can invite up to 10 users to join a paid project.
Project roles
Storm users can belong to one of three roles: Owner, Admin, or User.

Users can:
- view the project data
- run searches on the project data

Admins can do everything a user can, plus:
- add data to that project
- invite other users to view the project
- delete data from the project

Owners can do everything an admin can, plus:
- create a project
- delete the project
- update and change billing information
- request that a project be transferred to another project member

Project owner

The user who creates a given project is given the role of owner for that project. A project can have only one owner (whereas a project can have multiple admins and users). The project owner is responsible for project payment.

Project owner gets billed for the project

The user who is listed as the project owner receives all bills for that project. Read about Storm billing in this manual.
About billing
If the project is a paid project, the user to whom you transfer the project must have a credit card on file in Splunk Storm before the project can be transferred. If the project is a paid project, any credit remaining in the Storm account balance of the original project owner will be transferred with the project to the new project owner. The new project owner's credit card will be charged immediately for any outstanding balance due for the current month.
For API inputs

Update the API access token in your code. You can find your API access token at <Project_name> > Inputs > API.

For Splunk forwarders

Update the credentials package (the stormforwarder_xxxx app) using the CLI. Download the correct package, stormforwarder_xxxxx.spl, then use
./splunk remove app stormforwarder_xxxxx
./splunk install app /path/to/my/new/creds/stormforwarder_xxxxx.spl
./splunk restart
Because the old and new credentials packages have the same folder and file names, it's not easy to tell them apart. One method is to rename the package stormforwarder_xxxxxxx.spl to .tar.gz, untar it, and check the contents of inputs.conf (the API token is in there, along with the project ID). You can then compare the forwarder's inputs.conf to the default file in $SPLUNK_HOME/etc/apps/stormforwarder_xxxxx/default/inputs.conf.
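The rename-and-untar step above can also be scripted: a .spl package is an ordinary gzipped tar archive, so you can read inputs.conf straight out of it without renaming the file. A minimal sketch, assuming the default/inputs.conf layout described in this topic:

```python
import tarfile

def read_inputs_conf(spl_path, member_suffix='default/inputs.conf'):
    """Return the text of the first inputs.conf member in a .spl package,
    or None if the archive contains no such member."""
    # A .spl file is just a gzipped tarball, so tarfile can open it
    # directly without renaming it to .tar.gz first.
    with tarfile.open(spl_path, 'r:gz') as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.endswith(member_suffix):
                return archive.extractfile(member).read().decode('utf-8')
    return None
```

Print the result and compare it with the forwarder's installed copy under $SPLUNK_HOME/etc/apps/ to see which project ID and API token a given package carries.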
Troubleshoot a project
This topic suggests various troubleshooting options for you to use if you're having difficulty with Storm.
On the Delete data page, set the number of days you wish to retain data. Confirm by clicking Delete. It sometimes takes a while to delete data. Note: You cannot delete data that is less than 30 days old.
To keep your project, you can do any of the following:
- log into splunkstorm.com, click "explore data" in your project, and run a search;
- send new data to your project; or
- upgrade your project to a paid plan.

You can also delete your inactive project yourself. Note that we won't delete any paid projects. And we won't delete your Splunk Storm user account, even if you have no projects at all.
Select Alert... to open the Create alert dialog on the Schedule step. Give the alert a Name and then select the alert Schedule.