Accelerating Multipattern Matching On Compressed HTTP Traffic
Abdul Mannan Virani, M.Tech CSE Dept, DIMAT, Raipur (C.G.)
Abstract: Security is now a central concern in nearly every aspect of computing, and many technologies have been developed to provide it. One of them, signature-based detection, struggles under heavy traffic loads, whose commercial importance keeps growing. This paper addresses pattern matching over compressed HTTP traffic. HTTP uses GZIP compression, so a decompression phase is needed before string matching can be performed. We use the Aho-Corasick-based algorithm for Compressed HTTP (ACCH), which improves on the commonly used Aho-Corasick pattern-matching algorithm: it takes advantage of information gathered during the decompression phase in order to accelerate the scan. We show that, surprisingly, it is faster to perform pattern matching on the compressed data, even with the overhead of decompression, than on regular traffic. To our knowledge, this is the first work that analyzes and solves the problem of on-the-fly multipattern matching on compressed HTTP traffic.
I. INTRODUCTION
Security technologies such as Network Intrusion Detection Systems (NIDS) and Web Application Firewalls (WAF) rely on signature-based detection techniques to identify attacks. Nowadays, a security tool is judged largely by the speed of the underlying string-matching algorithms that detect these signatures. HTTP compression (content encoding) is a widely available method of compressing textual content transferred from Web servers to browsers. Many websites and social sites use it: surveys indicate that over 25% of sites employ HTTP compression, and the fraction is growing. Compressed content encoding is built into HTTP/1.1 and is supported by most browsers. Most current security tools either ignore compressed
Somesh Kumar Dewangan, Associate Professor (CSE), DIMAT, Raipur (C.G.)
traffic, which leaves security holes, or disable compression altogether by rewriting the client-to-server HTTP header to indicate that compression is not supported by the client's browser, which degrades overall performance and bandwidth. A few security tools handle compressed HTTP traffic by decompressing the entire page on the proxy and running a signature scan on the decompressed page before passing it to the client. This option is not applicable for security tools that must operate at high speed or when additional delay is unacceptable. In this paper, we explore a novel algorithm, the Aho-Corasick-based algorithm for Compressed HTTP (ACCH). ACCH decompresses the traffic and then uses data from the decompression phase to accelerate the pattern matching. Specifically, the GZIP compression algorithm avoids repetitions of strings by using back-references (pointers) to the repeated strings. Our key insight is to store information produced by the pattern-matching algorithm for the already-scanned decompressed traffic; upon encountering a pointer, we use this information either to determine that it contains a match or to safely skip scanning the bytes it covers. ACCH can skip up to 84% of the data and boost the performance of the multipattern-matching algorithm by up to 74%.
II. PROBLEM STATEMENT
An IDS finds intrusions using known attack patterns called signatures. A typical IDS holds a large number of signatures (often more than 5000). If the pattern-matching algorithm is slow, the IDS attack-response time will be very high. The existing efficient algorithms, such as Boyer-Moore (BM) and Aho-Corasick (AC), do not by themselves improve the throughput of an IDS.
International Journal of Computer Trends and Technology (IJCTT), volume 4, Issue 10, Oct 2013, ISSN: 2231-2803, http://www.ijcttjournal.org, Page 3718
The proposed system is an implementation of a scalable look-ahead regular-expression detection system. It works on the basis of a look-ahead finite automaton and improves the detection speed (attack-response time). The proposed system should be capable of processing a large number of signatures, with many complex regular expressions, on every packet payload. The attack-response time should be lower than that of deterministic finite automaton (DFA) pattern-matching procedures (Aho-Corasick). It should provide pattern matching with assertions (back-references, look-ahead, look-back, and conditional sub-patterns) and should use less memory (low space complexity).
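For illustration, the assertion types listed above can be demonstrated with Python's re module. Note that re is a backtracking engine rather than the automaton-based scheme the proposed system targets, and the patterns here are invented examples, not signatures from the paper:

```python
import re

# Look-ahead: match "admin" only when it is immediately followed by "123"
assert re.search(r"admin(?=123)", "admin123") is not None

# Look-behind: match digits only when preceded by "id="
assert re.search(r"(?<=id=)\d+", "GET /?id=42").group() == "42"

# Back-reference: \1 matches whatever group 1 matched (a repeated word)
assert re.search(r"\b(\w+) \1\b", "the the end").group(1) == "the"
```

Supporting such assertions in a DFA is the hard part: a plain DFA cannot express back-references at all, which is why the proposed system uses a look-ahead automaton instead.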
III. SYSTEM DEVELOPMENT
1. Packet Capturing
2. Application Payload Extraction
3. HTTP Encoding Header Identification
4. Decompression of Payload
5. Alert Verification in SNORT

Packet Capturing: This module opens the network interface card, reads every packet received by the Network Interface Card (NIC), and queues all the packets in a buffer.

Application Payload Extraction: The buffered packets are in raw packet format. This module identifies the headers and payloads at each layer of TCP/IP, decodes the payload according to the header formats, and buffers the application payload for the next stage.

HTTP Encoding Header Identification: If the packet payload is in HTTP protocol format, the module checks for encoding-related HTTP headers such as Accept-Encoding: gzip, Accept-Encoding: deflate, or (for the transfer framing) Transfer-Encoding: chunked.
The presence of an Accept-Encoding request header indicates that the client supports the listed encodings; the server's Content-Encoding: gzip response header confirms that the payload is actually in compressed format.
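A minimal sketch of this header check in Python (header names are case-insensitive in real HTTP traffic; this sketch assumes they have already been normalized into a dict):

```python
def client_supports_gzip(request_headers):
    """True if the request's Accept-Encoding header lists gzip."""
    return "gzip" in request_headers.get("Accept-Encoding", "").lower()

def response_is_gzip(response_headers):
    """True if the response body is actually gzip-compressed,
    as signalled by the Content-Encoding header."""
    return "gzip" in response_headers.get("Content-Encoding", "").lower()
```

For example, client_supports_gzip({"Accept-Encoding": "gzip, deflate"}) is True, while a response without Content-Encoding: gzip is treated as uncompressed.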
Decompression of Payload: If the payload is gzip-compressed, the payloads of all incoming packets of the session have to be buffered. The buffered payload is then processed for gzip decompression, and the decompressed data is passed on to the pattern-matching stage for attack identification.
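A sketch of this buffer-and-decompress step using Python's zlib. The 32 + MAX_WBITS flag makes zlib auto-detect the gzip header; the chunking is arbitrary and simply models a body reassembled from captured packets:

```python
import zlib

def gunzip_stream(chunks):
    """Incrementally decompress a gzip-encoded HTTP body that has been
    reassembled from captured packets into a sequence of byte chunks."""
    d = zlib.decompressobj(zlib.MAX_WBITS | 32)  # auto-detect gzip/zlib header
    out = b"".join(d.decompress(chunk) for chunk in chunks)
    return out + d.flush()  # flush any bytes held in the internal buffer
```

The decompressed bytes returned here are what gets handed to the pattern-matching stage.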
Alert Verification in SNORT: The SNORT IDS consists of several signatures. These signatures cannot be applied to the compressed data directly, but the decompressed payload can be fed to SNORT. The module confirms that the signatures do not hit on the compressed data but do hit on the decompressed data.
Multipattern Matching: Pattern matching has been a topic of intensive research, resulting in several approaches; the two fundamental ones are based on the Aho-Corasick (AC) and the Boyer-Moore algorithms. In this paper, we illustrate our technique using the AC algorithm. The basic AC algorithm constructs a deterministic finite automaton (DFA) for detecting all occurrences of the given patterns by processing the input in a single pass. The input is inspected symbol by symbol (usually each
symbol is a byte), such that each symbol results in a state transition. Thus, the AC algorithm has deterministic performance, which does not depend on the input, and is therefore not vulnerable to various attacks, making it very attractive for NIDS systems. In the DFA, each arrow indicates a transition made by a single byte scan; the label of the destination state indicates the scanned byte. If there is no adequate destination state for the scanned byte, the next state is set to the root. For readability, transitions to the root are omitted. Note that this common encoding requires a large transition matrix of size |Σ| × |S|, where Σ is the set of 256 ASCII symbols and S is the set of states, with one entry per DFA edge; in the typical case, the number of edges, and thus the number of entries, is 256 · |S|.
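A compact Python sketch of the Aho-Corasick construction and single-pass scan. For brevity it uses per-state dictionaries (goto plus failure links) rather than the dense 256 x |S| transition matrix discussed above; the failure links are resolved lazily during the scan, which gives the same matches as the fully pre-computed DFA:

```python
from collections import deque

def build_ac(patterns):
    """Build an Aho-Corasick automaton: goto, fail, and output tables."""
    goto = [{}]          # goto[state][symbol] -> next state
    output = [set()]     # patterns recognized at each state
    for pat in patterns:
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({})
                output.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        output[s].add(pat)
    fail = [0] * len(goto)
    q = deque(goto[0].values())          # depth-1 states fail to the root
    while q:                             # BFS: fail links by increasing depth
        s = q.popleft()
        for ch, t in goto[s].items():
            q.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            output[t] |= output[fail[t]] # inherit matches of the fail state
    return goto, fail, output

def scan(text, goto, fail, output):
    """Scan text one symbol at a time; return (end_index, pattern) matches."""
    s, matches = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:   # follow failure links
            s = fail[s]
        s = goto[s].get(ch, 0)
        for pat in output[s]:
            matches.append((i, pat))
    return matches
```

For example, scanning "ushers" against {"he", "she", "his", "hers"} reports "she" and "he" ending at index 3 and "hers" ending at index 5, all in a single left-to-right pass.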
Fig. 1: Aho-Corasick DFA for the example patterns (figure not reproduced).
For example, the Snort patterns used in Section VII require 16.2 MB for 1202 patterns that translate into 16,649 states. There are many compression schemes for the DFA, but most of them rely on hardware solutions. At the bottom line, DFAs require a significant amount of memory; therefore they are usually maintained in main memory and characterized by random rather than consecutive memory accesses.

Pattern matching over compressed HTTP traffic involves the following steps:
1) Remove the HTTP header and store the Huffman dictionary of the specific session in memory. Note that different HTTP sessions have different Huffman dictionaries.
2) Decode the Huffman mapping of each symbol to the original byte or pointer representation, using the session's Huffman dictionary table.
3) Decode the LZ77 part.
4) Perform multipattern matching on the decompressed traffic.

Space: One of the problems of decompression is its memory requirement: the straightforward approach requires a 32 kB sliding window for each HTTP session. Note that this requirement is difficult to avoid, since a back-reference pointer can refer to any point within the sliding window and pointers may be recursive without limit (i.e., a pointer may
Algorithm 1: Naive decompression with Aho-Corasick pattern matching

Trf — the input compressed traffic (after Huffman decompression)
SWin[1..32 kB] — the sliding window of LZ77, where SWin[j] holds the uncompressed byte located j bytes before the current byte
FSM(state, byte) — the AC FSM: receives a state and a byte and returns the next state; startStateFSM is the initial FSM state
Match(state) — if state is a match state, it stores information about the matched pattern; otherwise NULL

1:  function scanAC(state, byte)
2:      state = FSM(state, byte)
3:      if Match(state) ≠ NULL then
4:          act according to Match(state)
5:      end if
6:      return state
7:  procedure GZIPDecompressPlusAC(Trf_1, ..., Trf_n)
8:      state = startStateFSM
9:      for i = 1 to n do
10:         if Trf_i is a pointer (dist, len) then
11:             for j = 0 to len − 1 do
12:                 state = scanAC(state, SWin[dist − j])
13:             end for
14:             update SWin with the bytes SWin[dist], ..., SWin[dist − len + 1]
15:         else
16:             state = scanAC(state, Trf_i)
17:             update SWin with the byte Trf_i
18:         end if
19:     end for
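Algorithm 1 can be translated into runnable Python. To keep the sketch self-contained, a naive suffix check stands in for the AC FSM, the window grows without the 32 kB cap, and the token format is a simplified one (a literal byte, or a (dist, len) back-reference) rather than the real DEFLATE bit stream:

```python
def decompress_and_scan(tokens, patterns):
    """Naive decompression with per-byte scanning (the shape of Algorithm 1):
    every decompressed byte, including every byte produced by a pointer,
    is fed to the matcher."""
    window = bytearray()   # stands in for SWin (uncapped here for brevity)
    matches = []

    def scan_byte(b):      # stands in for scanAC: one byte, one transition
        window.append(b)
        for p in patterns:
            if window.endswith(p):
                matches.append((len(window) - 1, p))

    for tok in tokens:
        if isinstance(tok, tuple):        # back-reference pointer (dist, len)
            dist, length = tok
            for _ in range(length):
                scan_byte(window[-dist])  # copy the byte located dist back
        else:
            scan_byte(tok)                # literal byte
    return bytes(window), matches
```

Note that the matcher is re-invoked for every byte a pointer produces, even though those bytes were already scanned when they first appeared; ACCH's contribution is precisely to skip most of that redundant work.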
point to an area with a pointer). Indeed, the distribution of pointers in the real-life data set (see Section VII for details) is spread across the entire window. On the other hand, pattern matching of non-compressed traffic requires storing only one or two packets (to handle cross-packet data), where the maximum size of a TCP packet is 1.5 kB. Hence, dealing with compressed traffic poses a memory requirement higher by a factor of about 10. Thus, a mid-range firewall that handles 30 K concurrent sessions requires 1 GB of memory, while a high-end firewall with 300 K concurrent sessions requires 10 GB. This memory requirement has implications not only for the price and feasibility of the architecture, but also for the capability to perform caching. The space requirement is not the focus of this paper. Still, recent work by Afek et al. has shown techniques that circumvent this problem and drastically reduce the space requirement by over 80%, with only a slight increase in time. It has also shown a method to combine that technique with ACCH, achieving improvements of almost 80% in space and above 40% in time for the overall DPI processing of compressed Web traffic.

Time: Recall that pattern matching is a dominant factor in the performance of security tools, and performing decompression further increases the overall time penalty. Therefore, security tools tend to ignore compressed traffic. This paper focuses on reducing the time requirement by using the information gathered during the decompression phase. We note that pattern matching with the AC algorithm requires significantly more time than decompression, since decompression is based on consecutive memory reads from the sliding window and hence has a low per-byte read cost. The AC algorithm, on the other hand, employs a very large DFA that is accessed with random memory reads; it typically does not fit in cache and thus requires main-memory accesses.
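The space figures above follow from simple arithmetic (session counts as given in the text):

```python
WINDOW = 32 * 1024        # 32 kB LZ77 sliding window per compressed session
PACKETS = 2 * 1536        # ~two 1.5 kB TCP packets per uncompressed session

print(WINDOW / PACKETS)          # ~10.7: the factor-of-10 overhead
print(30_000 * WINDOW / 2**30)   # ~0.92 GiB: mid-range, 30 K sessions
print(300_000 * WINDOW / 2**30)  # ~9.2 GiB: high-end, 300 K sessions
```

This matches the "about 1 GB" and "about 10 GB" figures quoted in the paragraph above.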
Appendix A introduces a model that compares the time requirements of decompression and of the AC algorithm. Experiments on real data show that decompression takes only a negligible 3.5% of the time it takes to run the AC algorithm. For that reason, we focus on improving AC performance. We show that we can reduce the AC time by skipping more than 70% of the DFA scans and hence reduce the total time required for pattern matching in compressed traffic by more than 60%.

IV. OVERVIEW OF SYSTEM ARCHITECTURE
The packet-capturing module receives every packet. The payload-extraction module extracts the application-layer payload. Using the time-stamps module (TLM), each incoming character is cross-checked against non-repeating variable strings. The character look-up module (CLM) is responsible for identifying frequently accessed character strings. The repetition-detection module identifies repetitions that are not detected by the CLM. The frequently-appearing-repetition module (FRM) reduces resource usage by creating opportunities for sharing the effort across frequent bases.

VIII. RELATED WORK
As per M. Fisk and G. Varghese, "An analysis of fast string matching applied to content-based forwarding and intrusion detection," Tech. Rep. CS2001-0670 (updated version), 2002: pattern matching is one of the most performance-critical
components in network intrusion detection and prevention systems, and it needs to be accelerated by carefully designed architectures. That work presents a highly parameterized multilevel pattern-matching architecture (MPM), implemented on FPGA by exploiting redundant resources among patterns for less chip area. In practice, MPM can be partitioned into several pipelines for high frequency. It also presents a pattern-set compiler that can generate RTL code for MPM from a given pattern set and predefined parameters. One MPM architecture was generated by the compiler from Snort rules on a Xilinx FPGA. The results show that MPM can achieve 4.3 Gb/s throughput with only 0.22 slices per character, about half the chip area of the most area-efficient architecture in the literature, and can be parameterized for potentially more than 100 Gb/s throughput.

The problem of pattern matching on compressed data has received attention in the context of the Lempel-Ziv compression family. However, LZW/LZ78 are more attractive and simpler for pattern matching than LZ77. HTTP uses LZ77 compression, which has a simpler decompression algorithm, but performing pattern matching on it is a more complex task that requires some kind of decompression (see Section II). Hence, those works are not applicable to our case. Klein and Shapira suggest a modification to the LZ77 compression algorithm to make the matching task on files easier; however, the suggestion is not implemented in today's HTTP. M. Farach and M. Thorup, "String matching in Lempel-Ziv compressed strings," in Proc. 27th Annu. ACM Symp. Theory Comput., 1995, pp. 703-712, and L. Gasieniec, M. Karpinski, W. Plandowski, and W. Rytter, "Efficient algorithms for Lempel-Ziv encoding (extended abstract)," in Proc. 4th Scandinavian Workshop Algor. Theory, 1996, pp. 392-403, are the only papers we are aware of that deal with pattern matching over LZ77.
However, in those papers, the algorithms handle only a single pattern and require two passes over the compressed text (file), which is not applicable for network domains that require on-the-fly processing. One outcome of this paper is the surprising conclusion that pattern matching on compressed HTTP traffic, with the overhead of decompression, is faster than pattern matching on regular traffic. We note that other works on pattern matching in compressed data, such as U. Manber, "A text compression scheme that allows fast searching directly in the compressed file," Trans. Inf. Syst., vol. 15, no. 2, pp. 124-136, Apr. 1997, and N. Ziviani, E. de Moura, G. Navarro, and R. Baeza-Yates, "Compression: A key for next-generation text retrieval systems," Computer, vol. 33, no. 11, pp. 37-44, 2000, have reached a similar conclusion, stating that compressing a file once and then performing pattern matching on the compressed file accelerates the scanning process.
Algorithm 2: ACCH — Optimization I (storing internal matches)

absPosition — absolute position from the beginning of the data (after line 38: absPosition += len; after line 49: absPosition++)
MatchTable — a hash table in which each entry represents a match; the key is the match's absPosition, and the value is the list of patterns located at that position.
Function scanAC — a new line is added after line 4: add the patterns in Match(state) to MatchTable(absPosition)
Procedure ACCH — instead of the while loop (lines 41-47):
    handleInternalMatches(state, curPos, len − 1)
    scanSegment(state, curPos, len − 1)
Function scanSegment — should ignore matches found by scanAC, since all matches within a pointer are located by the functions scanLeft and handleInternalMatches.

1:  function handleInternalMatches(start, end)
2:      for curPos = start to end do
3:          if refPtrInfo[curPos].status = Match then
4:              if MatchTable(curPos) contains patterns of length at most curPos then
5:                  add those patterns to MatchTable(absPosition)
6:                  curPtrInfo[curPos].status = Match
7:              else curPtrInfo[curPos].status = Check
8:              end if
9:          else
10:             curPtrInfo[curPos].status = refPtrInfo[curPos].status
11:         end if
12:     end for

(Here refPtrInfo denotes the stored status information of the referred bytes.)

Algorithm 3: ACCH — Optimization II (two depth parameters)

CDepth1, CDepth2 — instead of one constant parameter CDepth, we maintain two, where CDepth1 < CDepth2.
Function scanAC — line 8 changes to:
    else if Depth(status) ≤ CDepth1 then status = Uncheck1
    else status = Uncheck2
Function scanSegment — line 21: instead of searching for the maximal Uncheck, it searches for the maximal Uncheck1 or Uncheck2.
Function scanSegment — lines 23-30: the CDepth parameter changes to CDepth1 or CDepth2, depending on whether the state found on line 21 is Uncheck1 or Uncheck2, respectively.
V. CONCLUSION
Nowadays, almost every modern security tool relies on a pattern-matching algorithm, and a large share of Web traffic uses HTTP compression. Security tools typically either ignore this traffic, leaving security holes, or renegotiate the connection parameters to disable compression, which hurts the performance and bandwidth of both client and server. Our algorithm eliminates up to 84% of the data scans based on information stored in the compressed data. Surprisingly, it is faster to perform pattern matching on compressed data, even with the overhead of decompression, than on uncompressed traffic. We observe that ACCH is not intrusive to the AC algorithm, so all methods that improve the AC DFA are orthogonal to ACCH and remain applicable. This is the first paper that analyzes the problem of on-the-fly multipattern matching on compressed HTTP traffic and suggests a solution.
REFERENCES
[1] M. Fisk and G. Varghese, "An analysis of fast string matching applied to content-based forwarding and intrusion detection," Tech. Rep. CS2001-0670 (updated version), 2002.
[2] Port80 Software, San Diego, CA [Online]. Available: http://www.port80software.com/surveys/top1000compression
[3] Website Optimization, LLC, Ann Arbor, MI [Online]. Available: http://www.websiteoptimization.com
[4] P. Deutsch, "GZIP file format specification," RFC 1952, May 1996 [Online]. Available: http://www.ietf.org/rfc/rfc1952.txt
[5] P. Deutsch, "DEFLATE compressed data format specification," RFC 1951, May 1996 [Online]. Available: http://www.ietf.org/rfc/rfc1951.txt
[6] J. Ziv and A. Lempel, "A universal algorithm for sequential data compression," IEEE Trans. Inf. Theory, vol. IT-23, no. 3, pp. 337-343, May 1977.
[7] D. Huffman, "A method for the construction of minimum-redundancy codes," Proc. IRE, vol. 40, no. 9, pp. 1098-1101, Sep. 1952.
[8] Zlib [Online]. Available: http://www.zlib.net
[9] A. Aho and M. Corasick, "Efficient string matching: An aid to bibliographic search," Commun. ACM, vol. 18, pp. 333-340, Jun. 1975.
[10] R. Boyer and J. Moore, "A fast string searching algorithm," Commun. ACM, vol. 20, no. 10, pp. 762-772, Oct. 1977.
[11] N. Ziviani, E. de Moura, G. Navarro, and R. Baeza-Yates, "Compression: A key for next-generation text retrieval systems," Computer, vol. 33, no. 11, pp. 37-44, 2000.
[12] T. Song, W. Zhang, D. Wang, and Y. Xue, "A memory efficient multiple pattern matching architecture for network security," in Proc. IEEE INFOCOM, Apr. 2008, pp. 166-170.
[13] J. van Lunteren, "High-performance pattern-matching for intrusion detection," in Proc. IEEE INFOCOM, Apr. 2006, pp. 1-13.
[14] V. Dimopoulos, I. Papaefstathiou, and D. Pnevmatikatos, "A memory-efficient reconfigurable Aho-Corasick FSM implementation for intrusion detection systems," in Proc. IC-SAMOS, Jul. 2007, pp. 186-193.
[15] N. Tuck, T. Sherwood, B. Calder, and G. Varghese, "Deterministic memory-efficient string matching algorithms for intrusion detection," in Proc. IEEE INFOCOM, 2004, vol. 4, pp. 2628-2639.
[16] M. Alicherry, M. Muthuprasanna, and V. Kumar, "High speed pattern matching for network IDS/IPS," in Proc. IEEE ICNP, 2006, pp. 187-196.
First Author: Abdul Mannan Virani received his B.E. (CSE) degree from RCET Bhilai, Pandit Ravi Shankar Shukla University (Pt. RSU), Raipur, in 2006. From 2006 to 2010 he worked at various multinational companies in consultant and customer-support roles. He is currently an M.Tech student in Computer Science Engineering at DIMAT Raipur, Chhattisgarh Swami Vivekananda University, Bhilai. His research interests are in the areas of wireless and network security, with a current focus on secure data services in cloud computing and secure computation outsourcing.
Second Author: Somesh Kumar Dewangan received his M.Tech in Computer Science and Engineering from RCET Bhilai, Chhattisgarh Swami Vivekananda University, Bhilai, in 2009, and before that an MCA degree in Computer Applications from MPBO University, Bhopal, India, in 2005. He has served as lecturer, Assistant Professor, and Associate Professor at Disha Institute of Management and Technology, Chhattisgarh Swami Vivekananda Technical University, Bhilai, India. His research interests include digital signal and image processing, natural language processing, neural networks, artificial intelligence, information and network security, mobile networking, cryptography, and Android-based applications.