Suggestions for Writing Better SAS® User Group Papers

Thomas E. Billings, MUFG Union Bank, N.A., San Francisco, California

This work by Thomas E. Billings is licensed (2014) under a Creative Commons Attribution 4.0 International License.

Abstract

Where do the ideas for SAS®-related papers come from? How does a tentative topic idea evolve into a paper? What
are some common errors to avoid when writing a paper? Is redundancy of topic a concern? Are SAS User Group
conferences the only venues to publish papers on SAS-related topics? Is there a role for SAS in the fast-growing
reproducible research movement? Publishing is currently (2014) undergoing rapid change in our digital world; how
might that impact SAS-related papers? These topics and more are covered in this article. The focus of this article is
on writing papers; presentations are only briefly mentioned.

Introduction & Background

If you have downloaded and read some of the SAS® User Group (SUG) conference papers available via
lexjansen.com, sas.com, and/or the websites of the various SAS User Groups, you may have noticed a large
variation in the quality of the papers. The best SUG papers are clear, well-written, and informative. Some papers
include material that overlaps with previously published papers while still providing significant new information
and examples.

This article is intended for those who are considering writing a paper for a SAS User Group conference, and it may
also be useful for those who have written, or are writing, SAS-related papers for conferences and other venues. To
write a SAS-related paper you need an appropriate topic, a strategy for how to proceed, and a plan of action. These
are discussed in the sections that follow.

The information presented here reflects the author’s writing experience and should be viewed as opinion. Other
approaches are possible; there is no “one ideal way” to write a research paper. The goal of this paper is to describe
the research process behind a paper, in the hope that some readers will start writing papers for their SUGs.

Where do paper ideas come from?

The very first thing you need is a topic for your paper. Ideas for paper topics may come from a number of sources,
including but not limited to:

• Problems/challenges you directly encounter in your work.
• Problems/challenges you indirectly encounter in your work, i.e., challenges your colleagues bring to your
attention.
• Concerns/curiosity sparked by reading technical papers; these may be SAS-related or may come from any
scientific field, e.g., a question that arises while reading SAS documentation or a scientific/academic
journal.

Additionally, it is becoming more common for authors of peer-reviewed articles to post the raw data for their research
online, where others can access and use the data. If you have sufficient statistical skills and if the topic/data are of
interest to you, then you can download published data, import it into SAS (or any other statistical software) and
reanalyze the data. If you do this, your reanalysis should not be limited to merely recreating the original research but
should provide at least some additional, new information. The degree of effort involved here can vary, depending on
the structure, format, and cleanliness of the raw data, and on the complexity of the additional analyses to be
performed.

How does a topic idea evolve into a paper?

When an idea comes to you, make a note of it in a word processor or text editor file. At this stage the topic may be in
the form of a proposed paper title, a question, or a few sentences that provide a statement of the topic. The
description is usually brief and may be somewhat vague to start.

Preliminary research. The next step is to perform preliminary research on the proposed paper topic. The objective
of the preliminary research is to clarify the topic and determine whether it has already been addressed and, if so, to
what extent. To accomplish this, search for relevant keywords on the following websites:

• lexjansen.com: http://lexjansen.com/
• Google advanced search, restricted to the site sas.com: http://www.google.com/advanced_search?hl=en
• sascommunity.org: http://www.sascommunity.org/wiki/Main_Page
• SAS-L archives: http://listserv.uga.edu/archives/sas-l.html
• if appropriate, a general Google web search for references that are not SAS-specific.

Download relevant documents that you find in the preliminary search, or paste their URLs into the word processing
file along with the paper topic. For most topics this is an iterative process, similar to a web crawler: one reference
may lead you to other, related references. If the research includes computer programming, estimate the scope of the
programming work at this stage.

Once you have a few references, skim them and/or read the relevant sections. The end products from this stage
should be:

• a clear, specific statement of the paper topic,
• a tentative paper title (which you will probably change later),
• a brief outline of the planned paper,
• if the paper requires SAS and/or other computer language code, a high-level description of the code and
examples needed, plus an estimate of the development effort, and
• an initial list of references.

Any or all of the above can and probably will be modified later, during the in-depth research stage.

If your paper is code-centric and not too complex, there is an alternative approach that might be faster and simpler.
After some preliminary research, develop sample data sets and code that illustrate the topic of your paper. The
resulting code and sample output then become the core of your paper, and you start writing using them as a basis.
This approach works best for Quick Tips/Coder’s Corner section papers.

Like computer programs, research papers change and evolve over time as your work uncovers previously unknown
nuances and related issues. For those familiar with project management terminology, writing a research paper is
usually an iterative process rather than a waterfall (cascade) process.

Common errors:
Is my topic redundant?

In this context redundancy means that the topic you are interested in has been covered to at least some extent in
other published papers and/or in the SAS documentation. Submitted abstracts may be rejected by SUG Section
Chairs if the topic is already covered in multiple papers presented at previous SUGs.

Redundancy is normal, even expected, in certain paper types, e.g., tutorials, papers written for hands-on workshops,
review papers, and what might be called highlight papers. A highlight paper has a narrow focus on a few features of
the SAS System and serves to provide examples of those features and/or to clarify the SAS documentation for them.

Nearly every technical paper has some sections that repeat information found in other papers or in the SAS System
documentation, so some level of redundancy is unavoidable. The key issue is whether a paper with significant
redundancy also provides value added, e.g., new information and/or examples that clarify an aspect of the SAS
System. If the paper provides value added, then redundancy is less of a concern and the chances of acceptance are
higher.

Note that a paper may be presented at more than one SUG conference. This too is redundancy, but it is easily
handled: just use the most recent version of the paper.

Applying the suggestions above to this article: three earlier conference papers cover the same topic (writing papers
for SUGs), namely Carroll (2010), Rhodes (2004), and Loren (1994). Those papers discuss presenting at SUGs
extensively, so to avoid redundancy this article mentions presentations only briefly. This article also provides new
material not covered in the earlier papers.

Common errors:
Not discussing how SAS provides many ways to do things

It is well known that SAS often provides many different ways to accomplish the same task. The research for your
paper should recognize this and, when relevant, identify other ways to achieve the same result in SAS. Your paper
should explicitly explain why the method you discuss is optimal, or under what conditions it is optimal.

There is an obvious limitation on discussing alternative methods: you can’t test a method that is provided only in
a SAS System component that is not installed at your site. In that case you can mention the method in passing and
note that you could not test it because it is not available at your site.

This issue is a special concern for papers that focus on PROC REPORT, though it can also apply to the DATA step
and other PROCs. Some papers presented at SUGs have described elaborate macro-based applications of PROC REPORT
where the same result could easily be accomplished with user-defined formats and a simple invocation of PROC
TABULATE.

Next step:
In-depth research

The in-depth research is a continuation and extension of the preliminary research. Here you review all of the
references found previously and also any additional references found after the preliminary stage. If your paper
requires code and data examples, then:

• new code and data examples are developed and run in this stage, and/or
• if you already have existing code and data examples, they should be reviewed and modified if needed to
protect confidential data.

You should expect to go through multiple iterations of the program code/example data sets before settling on a final
version for the paper. The products of this stage include detailed notes for the points to be covered in the paper text,
plus sample code, log, and other program outputs.

Strategy:
How much work to complete on your paper before submitting an abstract?

A typical timeline for a SUG conference paper is (approximately):

• Abstract submission: open for a month or more,
• Topic acceptance or rejection: perhaps a month after abstract submission closes,
• Paper due: 1-2 months after acceptance,
• Conference and presentation: 1-2 months after the paper is due.
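The approximate offsets in this timeline can be turned into rough calendar windows for planning. The Python sketch below is illustrative only: the function names are hypothetical, each “1-2 months” offset is treated as a window, and adding a month keeps the day of the month (clamped to that month’s length). Actual dates come from each SUG’s call for papers.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping the day to the target month's length."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def estimate_milestones(abstracts_close: date) -> dict:
    """Rough milestone windows, assuming the approximate offsets in the timeline above."""
    # Acceptance or rejection: about a month after abstract submission closes.
    decision = add_months(abstracts_close, 1)
    # Paper due: 1-2 months after acceptance (earliest, latest).
    paper_due = (add_months(decision, 1), add_months(decision, 2))
    # Conference: 1-2 months after the paper is due.
    conference = (add_months(paper_due[0], 1), add_months(paper_due[1], 2))
    return {"decision": decision, "paper_due": paper_due, "conference": conference}
```

For example, if abstract submission closes on 31 January 2014, the sketch estimates a decision around 28 February, a paper due between 28 March and 28 April, and a conference between 28 April and 28 June 2014.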

The minimum information required for submission is a paper title and an abstract. At the time of writing (Feb.
2014), some submitted abstracts are rejected, and at SAS Global Forum (SGF) most are; acceptance rates at other
SUGs are often higher than at SGF. If your abstract is rejected by one SUG, you can submit it to another SUG or,
alternatively, publish the paper on sascommunity.org or your own website/blog.

The question then arises: how much work should you do on your paper before submitting an abstract? To answer
this question, consider these factors:

• whether the paper topic is a good fit for the target SUG, based on the papers accepted for its last conference;
• how interesting the topic is to you;
• how much you enjoy (or don’t enjoy) the research/writing process;
• when you will have time to work on the paper: now or later;
• if you have submitted and presented papers at SUGs in the past, your acceptance rate for those papers;
• if the abstract is not accepted, whether submitting it to other SUGs is a possibility, or whether the default
option of publishing the paper on sascommunity.org and/or your own website is an acceptable alternative.

Timing is also a factor: if you come up with a great paper idea two weeks before abstract submission closes, you
probably won’t have sufficient time to write the paper before submitting the abstract. Consider all of the above
factors and make a decision tailored to your situation.

Along these lines, the following suggestions reflect this writer’s personal opinion. If rejection is not a problem
because you will publish the paper elsewhere, then there is little or no risk in completing the entire paper before
abstract submission. If you are an experienced writer with a high acceptance rate for previous papers, then there is
also little risk in completing the entire paper before abstract submission.

If you don’t want to do all of the work required to complete a paper unless it is accepted, then you should at least
complete the work described above in the “Preliminary research” subsection before submitting an abstract. That work
will help determine the feasibility of the topic, i.e., whether the topic is likely to yield a reasonable paper, and provide
a sufficient basis to write a reasonable abstract.

Writing the paper:
An intermediate stage

Review the notes and program artifacts from the previous stages and revise the outline if needed. Then, using the
notes and program output, write the first draft of the text for the paper. The introduction can be the most
challenging section to write, so you may prefer to write it last (it can also be written early and improved
iteratively during the research stage). The next major stage is review, but first some details need to be
addressed.

Details

Copyright notice for articles. At the time of writing, authors whose papers are accepted have to submit a copyright
grant form that authorizes the SUG to publish the paper in the proceedings. Such a form is not actually necessary when
authors use Creative Commons attribution licenses for their papers (although the SUG may require it anyway); see
Billings (2013) for a discussion of these licenses, which are designed to promote sharing of information. (Note that
this article is released under a Creative Commons attribution license.)

Regardless of whether you use a Creative Commons license, the first page of your article should have a clear,
explicit copyright notice, for example:

Copyright YYYY (year) by NAME_OR_COMPANY_NAME; specify the license here: a Creative Commons link,
“all rights reserved”, or another license/link.

Besides providing an explicit copyright statement, this notice includes the year of publication, which is needed to
cite the paper in the common form: Author Name (YYYY).

Copyright notice for code. Creative Commons advises against using its licenses for computer code and suggests
instead a license developed specifically for software. A number of open source/free software licenses are available,
and the Reproducible Research Standard (Stodden 2009A, 2009B) recommends using a Berkeley Software Distribution
(BSD) license for releasing source code. The BSD-2-Clause license is the most recent version (as of Feb. 2014) of
the BSD license; the template for the license is at http://opensource.org/licenses/BSD-2-Clause. To use the
license, replace OWNER and YEAR in the template with the relevant values and include the license text as
comment(s) in your code. (The license text includes semicolons, and a semicolon ends a SAS comment statement that
begins with an asterisk, so use /* and */ to mark the text as a comment in your SAS code.) Additional information on
open source/free software licenses is available on the Open Source Initiative website, http://www.opensource.org.

References. There are standards for references in academic journals and books, including a list of approved
abbreviations for journal names. These standards are a relic of print media and reflect the need to use as little
space as possible for references in printed books and journals. That was then, this is now: print media are
obsolete, and space is not an issue on the web.

It is strongly recommended that you ignore some of these obsolete standards; specifically, do not abbreviate
journal names. Instead, use the full name and provide a link to the abstract (if available at a separate URL) or to
the table of contents (TOC) for the cited journal volume/issue. If the full text is freely available online, link to
it instead of, or in addition to, the abstract/TOC. Cited books should include a link to the relevant Google Books
record. If you cite a large number of books, consider including a link to WorldCat, a book search engine for
libraries worldwide: http://www.worldcat.org/

Another standard specifies that each cited URL include the phrase “(Accessed DATE.)” This is overkill; instead, a
single blanket statement covering all cited URLs can be made, e.g.:

URLs below were accessed in the period date1 – date2,

or a similar notice. If you are concerned that the URL for a key reference will vanish in the future, check whether
the Internet Archive’s Wayback Machine (https://archive.org/) has a mirror copy and, if so, include the URL for the
mirror copy in the citation.
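Both suggestions are easy to apply mechanically while assembling a reference list. The Python sketch below uses hypothetical function names: it collapses a record of when each URL was checked into one blanket statement, and it builds a Wayback Machine lookup link. The web.archive.org/web/TIMESTAMP/URL pattern, and its acceptance of a partial timestamp, is an assumption about how the Wayback Machine addresses captures; open the link and confirm it resolves to a usable copy before citing it.

```python
from datetime import date
from typing import Mapping


def access_statement(accessed: Mapping[str, date]) -> str:
    """Summarize per-URL access dates as one blanket statement for the reference list."""
    if not accessed:
        raise ValueError("no URLs recorded")
    first, last = min(accessed.values()), max(accessed.values())
    if first == last:
        return f"URLs below were accessed on {first.isoformat()}."
    return f"URLs below were accessed in the period {first.isoformat()} to {last.isoformat()}."


def wayback_url(url: str, timestamp: str) -> str:
    """Build a Wayback Machine link; a partial timestamp such as '201402' is assumed to work."""
    return f"https://web.archive.org/web/{timestamp}/{url}"


if __name__ == "__main__":
    # Hypothetical access log: URL -> date it was last checked.
    checked = {
        "http://www.worldcat.org/": date(2014, 2, 3),
        "https://archive.org/": date(2014, 2, 21),
    }
    print(access_statement(checked))
    print(wayback_url("http://www.worldcat.org/", "201402"))
```

With the hypothetical log above, the first line printed is "URLs below were accessed in the period 2014-02-03 to 2014-02-21."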

Semifinal:
An iterative review process

Once you have a draft, review it every day for two to four days or longer, checking for:

• Clarity: does it accurately convey the target information?
• Redundancy: in words and sentences;
• Reading flow: does it read with a smooth flow from one topic to another?

Then set it aside for a few days and come back and review it again. This review process may produce surprises:
despite working from detailed outlines, I have, as a result of such reviews, changed the order of sections, moved
paragraphs from one section to another, and made major changes to improve the reading flow. After a few more days
of reviewing, set the paper aside for at least a week before reviewing it again.

Once your draft is relatively clean, with reasonable reading flow, send it out for review by experienced SAS
programmers, preferably including some who have written SUG papers. Pay attention to all reviewer comments that you
receive. You don’t have to adopt every suggestion, but at least give each one serious consideration.

Final stage:
Finishing the paper

The final stage is to update the paper as needed to reflect review comments, add page numbers (if needed), check
that you have included the SAS trademark notice and the ® trademark symbols, and, for SUG papers, make a pdf
version of the paper and upload it to the conference website. Your paper is then done, and preparing the conference
presentation is the next step.

Future directions

At the time of writing (Feb. 2014), some SUG copyright and publication policies are outdated and in need of
updating. The following current trends in digital publishing may, at some point, affect future SUG publication
practices.

PDF is obsolete. The pdf format presents a paradox: it is a standard in digital publishing for scientific journals and
books, while at the same time tied to print format and decidedly obsolete. The pdf format performs poorly on small
screens (smartphones, tablets) on which a significant volume of internet access occurs nowadays. Copy and paste
from pdf is difficult at best (non-functional for some pdf files), making sharing more difficult. In short, pdf is clumsy and
hard to work with.

Other formats are better suited to small screens: the open, free epub (electronic publication) format, an e-book
standard from the International Digital Publishing Forum; html; and the proprietary Amazon Kindle formats
mobi6/kf8. SAS ODS, starting in release 9.4, supports epub output. Note that epub files can be converted to kf8 (and
vice versa) using free, open source programs, e.g., the Calibre e-book management system.

Unfortunately, the SUG practice of using pdf as the primary format for research papers actually makes the material
less accessible to potential readers. PDF represents the past; the future is epub, html, and other formats that work
well on small screens.

Presentations can be on video. You can record video of your paper presentation or, even better because it is easier
to edit, turn the presentation slideshow into a video with a voiceover by the authors. These videos can be uploaded
to YouTube and/or other video sharing sites for free public access.

Authors don’t need permission from a SUG to upload a video made outside of a SUG conference. However, if your
paper has been accepted for a SUG conference, common courtesy suggests that you wait until after the conference
to post your video slideshow. (Check your SUG’s rules regarding video if you are thinking of making videos at a
conference.)

Reproducible research. There is a large and growing reproducible research movement in academia, with special
emphasis on the fields of statistics and computer science. The goal of this movement is to make it easier for others to
reproduce the research described in a paper.

In the academic context, the most common form of reproducible research is a paper produced with R and LaTeX,
with the data, programs, paper, and supplementary files released in a package that is sometimes called a
compendium; see Gandrud (2014) for further information on this approach. When this is done correctly, the code can
easily be rerun to regenerate the tables, figures, and other results shown in the paper. This also allows others to
extend and build on your research, in the tradition of scientific inquiry.

The Reproducible Research Standard (Stodden 2009A, 2009B) was proposed to encourage reproducible research.
The standard recommends that:

• articles should be released under a Creative Commons attribution license,
• source code should be released under a BSD license to allow others to reuse and modify your code.

Tools are available to package SAS code with LaTeX into compendiums for reproducible research; see Arnold &
Kuhfeld (2012) and Perttola (2008) for related information. LaTeX is relevant in academia, but the corporate world
makes little use of it, and it seems reasonable to say that very few people outside academia are clamoring to learn
a new, complex, and specialized markup language like LaTeX. Reproducible research in non-academic settings may
therefore evolve to use different tools from those used in academia.

The relevant question here is how the reproducible research movement will affect the non-academic SUG community.
User-written Base SAS programs that read text files (including XML, JSON, or delimited/csv files) and produce mostly
html or text/csv output, or native SAS files, are relatively portable and easy to reproduce.

In contrast to user-written Base SAS code, programs developed in many modern SAS metadata-based products exist
as metadata defined in the context of a specific and complex environment. These programs are not very portable and
can be difficult to migrate into new environments. Because they are produced by lengthy sequences of point-and-click
and drag-and-drop operations, such programs are widely regarded as non-reproducible.

Using the metadata from a SAS metadata-based product requires that you have the specific tool in-house and can
import the metadata into a compatible environment. The fact that the user needs an instance of the SAS
metadata-based product also makes programs created in these systems non-reproducible.

Metadata-based SAS products can usually produce deployed SAS code, which for some products may be mostly, but
not completely, Base SAS and SAS/STAT code (the code mix varies by product). The first constraint on generated code
is that it can be voluminous and relatively complex. Manually making even minor changes to the code can be
difficult, laborious, and error-prone.

A second constraint on system-generated code is far more serious. Lines of exported/deployed code will usually fall
into one of three categories:
1. User-written, hence copyright owned by the user or his/her employer;
2. Code generated by the product that includes parameters manually chosen (via the GUI) by the user of the
product; the copyright status of this code is unclear to this writer;
3. Boilerplate/support code generated by the product (macros, setup code, etc.); default assumption: this
code is copyright by SAS Institute Inc., all rights reserved. (An example is the %rcset macro in SAS Data
Integration Studio.)
Because you lack clear copyright ownership of the entire generated code set, you cannot release/publish the code
under a BSD license. (If your code contains only brief snippets of code in categories 2 and 3 above, you may be able
to edit those out before release, or identify them in the release as not covered by the BSD license.) If you don’t
have clear copyright on the code set, you should not release or publish it (though you can release the user-written
portions). This means that system-generated code produced by a metadata-driven SAS product usually will not meet
the reproducible research standard.
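Locating category 3 snippets in a long deployed program can be partly automated. The Python sketch below is a hypothetical review aid, not a SAS tool: given deployed SAS code and a list of macro names you believe are product-supplied (you must build that list yourself; %rcset is the only example named here), it reports the line numbers that invoke those macros or open or close their definitions. It cannot detect category 2 code, and the body of a macro definition between %macro and %mend still needs manual review.

```python
import re
from typing import Iterable, List

# A macro reference such as %rcset(...) or %RCSET; SAS macro names are case-insensitive.
MACRO_CALL = re.compile(r"%([A-Za-z_]\w*)")
# The name following %macro or %mend, i.e. the start or end of a macro definition.
MACRO_DEF = re.compile(r"%(?:macro|mend)\s+([A-Za-z_]\w*)", re.IGNORECASE)


def flag_product_macro_lines(sas_source: str, product_macros: Iterable[str]) -> List[int]:
    """Return 1-based line numbers that reference any of the listed macro names."""
    wanted = {name.lower() for name in product_macros}
    flagged = []
    for lineno, line in enumerate(sas_source.splitlines(), start=1):
        names = {n.lower() for n in MACRO_CALL.findall(line)}
        names |= {n.lower() for n in MACRO_DEF.findall(line)}
        if names & wanted:
            flagged.append(lineno)
    return flagged
```

Applied to a short hypothetical program that defines %rcset on lines 1-3 and calls it as %RCSET(&syserr) on line 5, the function returns [1, 3, 5]; the flagged lines are candidates to remove or to mark as not covered by the BSD license.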

The bottom line here is that to increase reproducibility, SUG authors should, when feasible, publish full source
code and sample data sets so others can reproduce their results. If the code won’t fit within the page limits of the
SUG paper, it can be uploaded to sascommunity.org, GitHub, or another file sharing service. To maximize
reproducibility, research should be done via user-written Base SAS and SAS/STAT code to the maximum extent
possible. SUG papers dealing with complex applications of metadata-based SAS products will probably not meet
reproducibility standards.

Epilogue

It is my hope that the view provided here of how to find a research topic and develop it into a completed SAS-related
research paper will help those who are considering writing papers for SUGs. I also hope that other SUG authors will
enhance the sharing of SAS-related information by using Creative Commons copyright licenses on their SUG papers,
support a transition away from pdf to epub and html formats, and write papers that are, where relevant and to the
extent possible, more reproducible.

Related web resources:

1. Creative Commons: http://creativecommons.org/
2. Open Source Initiative: http://www.opensource.org
3. Internet Wayback Machine: https://archive.org/
4. WorldCat library search engine: http://www.worldcat.org/

References:
Note: All URLs cited or quoted here were accessed in Feb. 2014.
Arnold, Tim; Kuhfeld, Warren (2012). “Using SAS® and LaTeX to Create Documents with Reproducible Results.”
Proceedings of SAS Global Forum 2012. http://support.sas.com/resources/papers/proceedings12/324-2012.pdf
Billings, Thomas (2013). “Sharing SAS® User-Related Information via Creative Commons Copyright Licenses.”
Proceedings of Western Users of SAS Software 2013. http://www.lexjansen.com/wuss/2013/38_Paper.pdf

Carroll, Nikki (2010). “Everything You Need to Know About Writing a Paper and Presenting it at WUSS.” Proceedings
of Western Users of SAS Software 2010. http://www.lexjansen.com/wuss/2010/TUT/3040_5_TUT-Carroll.pdf

Gandrud, Christopher (2014). Reproducible Research with R and RStudio. CRC Press, Boca Raton, Florida, USA.
Google Books preview: http://books.google.com/books?id=u-nuzKGvoZwC

Loren, Judy (1994). “So You Want to Present a Paper at SUGI.” Proceedings of SAS Users Group International 19
(1994). http://www.sascommunity.org/sugi/SUGI94/Sugi-94-220%20Loren.pdf

Perttola, Juha-Pekka (2008). “SAS and LaTeX - A perfect match?” Proceedings of PhUSE 2008.
http://www.lexjansen.com/phuse/2008/ts/ts06.pdf

Rhodes, Dianne Louise (2004). “So You Want to Write a SUGI Paper? That Paper About Writing A Paper.”
Proceedings of SAS Users Group International 29 (2004). http://www2.sas.com/proceedings/sugi29/145-29.pdf

Stodden, Victoria (2009A). “Enabling reproducible research: open licensing for scientific innovation.” International
Journal of Communications Law and Policy. 13, 1-25.
• Abstract URL: http://ijclp.net/old_website/article.php?doc=1&issue=13_2009
• Full text on author’s website: http://www.stanford.edu/~vcs/papers/Licensing08292008.pdf

Stodden, Victoria (2009B). “The legal framework for reproducible scientific research: licensing and copyright.”
Computing in Science and Engineering. 11, 35-40.
• Abstract URL: http://www.computer.org/csdl/mags/cs/2009/01/mcs2009010035-abs.html
• Full text on author’s website: http://www.stanford.edu/~vcs/papers/LFRSR12012008.pdf

Acknowledgment & Notes:

Thanks to the following for valuable comments and suggestions. Any errors herein are solely the responsibility of the
author.

• Yue Chen, MUFG Union Bank, San Francisco
• Ethan Miller, University of California, Berkeley

Contact Information:
Thomas E. Billings
MUFG Union Bank, N.A.
Basel II - Retail Credit BTMU
350 California St.; 9th floor
MC H-925
San Francisco, CA 94104

Phone: 415-273-2522
Email: tebillings@yahoo.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are trademarks of their respective companies.
