
About Picalo

Picalo is a data analysis application focused on fraud detection and on data retrieved from corporate databases. It is also the foundation for an automated fraud detection system (see below). It started in 2000 as a pet project of Conan C. Albrecht, a professor at BYU.

Picalo is currently focused on data analysis for fraud and corruption detection. However, it is an open framework that can be used for many other types of data analysis: network logs, scientific data, any type of database-oriented data, and data mining.

Picalo is built upon a three-level architecture that includes open source and potentially closed source parts. The following diagram describes the different parts of the Picalo platform.

Level 1 Routines: This open source level includes all the basic data structures in Picalo. It was finished in the 2001-2002 time frame, and while we continue to add some routines to it, it is quite stable and finished.

Level 2 Routines: This level allows non-technical people to run advanced analyses without having to script or write programs. Picalo's pluggable architecture allows new detectlets to be installed in the existing program. We hope that detectlet libraries will be made available by individuals and/or companies; the license and price are up to the developers.

Level 3 Routines: When developed, this level will apply expert system rules to intelligently run Level 2 routines (detectlets) to discover many different types of fraud on its own. The license for the expert system is not decided at this point.

Cross Platform Graphical User Interface: This open source layer provides a GUI on top of the Level 1 routines to allow non-technical users to access the heart of Picalo. It provides dialog access to most Picalo features.

Detectlet Wizard: This open source wizard provides access to all detectlets installed at your location. It guides users through the use and application of detectlets in fraud analysis.

Detectlets are one of the most exciting parts of the Picalo architecture. They allow nonprogrammers to run analysis routines created by others. See the detectlets page for more information. Picalo is built upon the shoulders of many great projects. Thousands of individuals have contributed time and energy to these projects, and the Picalo effort is grateful for their work. These are listed as follows:

Python
wxPython
mxDateTime
Python Windows Extensions
Statistics package pstat.py by Gary Strangman
Nuvola Icon Set by David Vignoni

What about quality control? Another way to phrase this question is, "Can I trust Picalo's results?" The short answer is that you can trust Picalo as much as any analysis application. We take quality control very seriously.

The long answer is that you should never fully trust any analysis application. You should always double check each step of the process, print control totals, and manually ensure that your routines are doing what you think they are. It can be very embarrassing (and dangerous) to make decisions based on faulty analysis routines.

Users unfamiliar with the open source world may implicitly trust corporate software and be wary of open software. We hope you will re-evaluate this common misconception as you use Picalo. Certainly, closed-source software applications are often more user friendly than community-built applications. But good-looking and easy-to-use programs are not necessarily trustworthy. As more users test Picalo and more developers help program it, we'll have a lot of eyes looking through the code and testing the routines. Open source projects often find and fix bugs much faster than closed-source projects because of the number of individuals looking at the code. Corporate software is often written by small development teams who are driven by marketing calendars and new features.

The open source world has many examples of incredibly well-written software, including Linux (widely known for crashing very rarely), PostgreSQL (a highly respected database), Apache (which serves a large share of the web), BIND (which resolves much of the Internet's domain names), KDE and GNOME (excellent desktop environments that look similar to Windows), wxWidgets (the GUI toolkit Picalo is built upon), Perl, Python, and GCC (languages and a compiler that many real programs are built with), LaTeX (a great document preparation system), and Firefox (an incredible web browser). This list could go on with thousands of successful open source products that are in production use today.

In summary, is Picalo perfect? Of course not. There may even be one or two bugs left in the Level 1 routines of Picalo. But we're working to make it a world-class analysis application that encodes information about thousands of fraud schemes used worldwide. Consider helping out and becoming part of the team that makes Picalo better and better.

Most of Picalo is released under the GNU General Public License (GPL). The GPL is a restrictive (copyleft) open source license. We've released this package under the GPL to protect it. The source code comes with Picalo, and you are encouraged to improve and add to its functionality. Detectlet libraries can be released under licenses other than the GPL. Since detectlets will be built by companies, organizations, and individuals, it is up to the developers to decide whether to sell their routines, open source them, or even place them in the public domain.

The restriction is that you cannot use any Picalo code in your own products unless those products are also released under the GPL. If you are using a closed-source license or even one of the many incompatible open source licenses, you cannot use Picalo code. The license protects the code from being appropriated by any individual or company. This short description is obviously an overgeneralization of the license; please see the LICENSE.TXT file for more information.

Picalo's Main Screen


Saturday, October 24th, 2009 Categorized Under: Slide Roll

Above is the Windows version of Picalo. The main screen is split into three areas:

The project browser (left side) shows the tables, database connections, saved queries, and scripts available in this project. These correspond to a directory on the user's machine.

The work area (top tabs) shows the tables or scripts that are currently open and in memory. It provides a spreadsheet-type view of the data and filter capabilities along the top.

The shell (bottom tabs) provides a direct interface to the Picalo engine. As the user interacts with tables, menu options, etc., Picalo shows the actual commands being run behind the scenes. The user can simply use the graphical menus or type commands directly into the shell. The Script Output and History tabs provide feedback from scripts as well as a full history of everything done in the project.

Picalo runs on Windows, Mac, and Linux. The codebase is identical across the operating systems, and the menus and interface work exactly the same. Picalo can also run in text-only mode with no graphical user interface at all. This allows it to be embedded into other systems. The following graphics show the Mac and Linux versions of Picalo:

Picalo can connect to most relational databases. It can also load data from text files (CSV, EBCDIC, fixed width, etc.) and from email applications. It provides a visual query editor (although we'd like to make it better) and can send any type of SQL to the engine of your choice. As of 2010, Picalo ships with the following drivers:

ODBC (pyodbc): Connects to the operating system's ODBC subsystem, which means it can talk with any database that has an ODBC driver (nearly all relational databases in existence).

SQLite (sqlite3): An embedded database in Picalo that you are free to use. SQLite is one of the most popular lightweight databases available today. It provides all the power of SQL without the need for a separate engine. Just point it at a directory and start querying!

MySQL (mysqldb): A direct driver to connect to MySQL databases without the need for ODBC.

PostgreSQL (pygresql & psycopg2): Two drivers are available to connect to PostgreSQL databases without the need for ODBC.

Oracle (cx_Oracle): A direct driver to connect to Oracle without the need for ODBC. This only works on the Windows version right now.
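As a minimal sketch of what an embedded engine looks like in practice, the snippet below uses Python's standard sqlite3 module directly rather than any Picalo wrapper. The table name, columns, and sample rows are made up for illustration.

```python
import sqlite3

# An in-memory database keeps the sketch self-contained; no server is involved.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (vendor TEXT, amount REAL)")  # hypothetical table
conn.executemany(
    "INSERT INTO payments (vendor, amount) VALUES (?, ?)",
    [("Acme", 120.0), ("Acme", 80.0), ("Globex", 45.5)],
)

# Any SQL the engine accepts is sent as a plain string.
rows = conn.execute(
    "SELECT vendor, SUM(amount) FROM payments GROUP BY vendor ORDER BY vendor"
).fetchall()
conn.close()

print(rows)
```

The other drivers listed above follow the same connect, execute, fetch pattern, although their connection arguments differ.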


Importing Data

Picalo can load data from a variety of text files. These data come into Picalo as tables. You can work with them immediately, or you can push them up to a relational database. As of 2010, Picalo imports the following types of files:

Delimited text files like Comma Separated Values (CSV), Tab Separated Values (TSV), and others.

Fixed width files from legacy systems. The import wizard provides a visual way to delimit where fields start and stop: just click with your mouse to denote a new field.

MS Excel spreadsheets in the traditional .xls format. The newer .xlsx format is not supported.

Email files in both Maildir and Mbox formats. If you need to read proprietary systems like Exchange, simply convert the .pst files (which contain user email) to Maildir format using one of the many free tools available online.
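To make the difference between delimited and fixed width files concrete, here is a small sketch in plain Python. It does not use Picalo's import wizard or API; the helper names, field names, and column offsets are hypothetical.

```python
import csv
import io


def read_delimited(text, delimiter=","):
    """Parse delimited text into a list of row dictionaries keyed by the header."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def read_fixed_width(text, fields):
    """Slice each non-blank line at the given (name, start, stop) offsets."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            rows.append({name: line[start:stop].strip() for name, start, stop in fields})
    return rows


# Delimited input: the header row names the fields.
csv_rows = read_delimited("vendor,amount\nAcme,120.00\nGlobex,45.50\n")

# Fixed width input: vendor occupies columns 0-9, amount occupies columns 10-16.
fixed_text = "Acme".ljust(10) + "0012000\n" + "Globex".ljust(10) + "0004550\n"
fixed_rows = read_fixed_width(fixed_text, [("vendor", 0, 10), ("amount", 10, 17)])

print(csv_rows)
print(fixed_rows)
```

In the fixed width case the field boundaries come from outside the file, which is why a visual tool for marking where fields start and stop is useful.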

Picalo works with many different file encodings, including ASCII, EBCDIC, and the many Unicode encodings.
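The following sketch shows why the encoding matters, using Python's built-in codecs rather than Picalo itself. The choice of cp037 is an assumption: it is only one of several EBCDIC code pages, and the code page of a real legacy file has to be confirmed with its source system.

```python
# cp037 is one EBCDIC code page shipped with Python; treat it as an assumption.
raw = "HELLO".encode("cp037")

# Decoding with the matching code page recovers the text.
text = raw.decode("cp037")

# Decoding the same bytes with the wrong encoding silently yields garbage.
wrong = raw.decode("latin-1")

print(raw, text, wrong)
```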

Language Support

Picalo supports a number of languages, including English, two types of Spanish, Portuguese, and Italian. We also included a few fun languages for testing purposes. If you have a few hours, please translate Picalo into your language!

Function Composer

Picalo contains a large library of functions that help users analyze their data. The Function Composer (pictured above) lists all the available functions with their documentation, parameters, and return values. The filter wizard helps users filter tables using exact values, simple patterns and wildcards, or full-power regular expressions.

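As a rough illustration of the three filter styles the filter wizard offers (exact value, wildcard, and regular expression), here is a sketch in plain Python. It is not Picalo's filter API; the rows and patterns are invented for the example.

```python
import fnmatch
import re

# Hypothetical table represented as a list of row dictionaries.
rows = [
    {"vendor": "Acme Supply", "invoice": "INV-1001"},
    {"vendor": "Acme Tools", "invoice": "INV-1002"},
    {"vendor": "Globex", "invoice": "PO-77"},
]

# Exact value: the field must equal the given string.
exact = [r for r in rows if r["vendor"] == "Globex"]

# Wildcard: shell-style pattern, case-sensitive.
wildcard = [r for r in rows if fnmatch.fnmatchcase(r["vendor"], "Acme*")]

# Regular expression: "INV-" followed by exactly four digits.
pattern = re.compile(r"^INV-\d{4}$")
regex = [r for r in rows if pattern.match(r["invoice"])]

print(len(exact), len(wildcard), len(regex))
```

Wildcards cover most everyday cases; regular expressions are worth the extra effort when the shape of a value (digit counts, prefixes, separators) is what matters.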

Detectlets

Scripting in Picalo is great, but the reality is that most users will go only as deep as the menus do. That's where detectlets come in: they allow you to easily add a wizard interface to your scripts. With detectlets, script-savvy users can share their work with others in the group, who see the scripts as regular menu items with a nice wizard interface.

Right now we're primarily using detectlets to encode fraud detection routines (hence the name). The goal is to create a worldwide repository of detectlets that all investigators can contribute to and download from. Right now all submitted detectlets are shipped with Picalo, but we'll develop an online community (with ratings, comments, etc.) as the number of detectlets grows.

Scripting in Picalo

Picalo is all about automation. The software developer believes strongly that fraud detection is best done through scripts. Scripts provide documentation, continual improvement and reuse of routines, and, most important, speed.

Picalo uses the excellent Python programming language, rated by TIOBE as one of the most popular languages in the world. This provides a full-featured language, in contrast to the simple macro environment most analysis applications provide. Python is object-oriented yet easy to learn and use. It provides hundreds of libraries within the Picalo environment (to scrape web sites, work with data, connect to email, etc.), and thousands more libraries can be downloaded online. If you've gotten into the scripting language of your current platform (such as VBA in MS Office), you'll love the power and flexibility Python gives you.
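To give a feel for why a script doubles as documentation and a reusable routine, here is a short sketch of one common check, flagging possible duplicate payments. It is written in plain Python and does not use Picalo's own functions; the function name, field names, and sample data are hypothetical.

```python
from collections import defaultdict


def find_duplicate_payments(payments):
    """Return groups of payments that share vendor, invoice number, and amount."""
    groups = defaultdict(list)
    for payment in payments:
        key = (payment["vendor"], payment["invoice"], payment["amount"])
        groups[key].append(payment)
    # Keep only the keys that occur more than once.
    return {key: rows for key, rows in groups.items() if len(rows) > 1}


# Made-up sample data; amounts are kept as strings to avoid rounding issues.
payments = [
    {"id": 1, "vendor": "Acme", "invoice": "INV-1001", "amount": "120.00"},
    {"id": 2, "vendor": "Globex", "invoice": "PO-77", "amount": "45.50"},
    {"id": 3, "vendor": "Acme", "invoice": "INV-1001", "amount": "120.00"},
]

duplicates = find_duplicate_payments(payments)
for key, rows in duplicates.items():
    print(key, [row["id"] for row in rows])
```

Once written, the same routine can be rerun unchanged on next month's data, which is the reuse and speed argument made above.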
