XML Tools and Strategies for Data Exchange: Transcripts and More

AACRAO Technology Conference
October 9-11, 2005
Atlanta

SPEEDE Committee
Presenter
Bruce Marton
Assoc. Dir., Student Information Systems, University of Texas
AACRAO SPEEDE Committee
Steering Committee, Postsecondary Electronic Standards Council – Standards Forum

Presenter
George Hudachek
Senior Associate Director of Admissions, University of Minnesota – Twin Cities
AACRAO SPEEDE Committee

Introduction
- Workshop structure
  - 3 one-hour segments
  - 2 breaks
- Executive summary
- Techie stuff

Topics Overview
- XML Schema Language
- PESC XML standards framework
- Transport technology and security
- Manipulating XML
- Case studies
- Issues and strategies

Section I
- Significance of XML standards
- How did we get here?
- XML standards framework
- Overview of the College Transcript standard
- Developer tools

XML Basics
eXtensible Markup Language
- Internet document language
- Structured and unambiguous
- Highly flexible

<Name>
  <FirstName>William</FirstName>
  <MiddleName>Reed</MiddleName>
  <LastName>Lang</LastName>
</Name>
<Contacts>
  <Address>
    <AddressLine>231 Hampton Forrest</AddressLine>
    <City>Spring</City>
    <StateProvinceCode>TX</StateProvinceCode>
    <PostalCode>77289</PostalCode>
  </Address>
</Contacts>

XML Technology Milestones
- Extensible Markup Language (XML) 1.0: February 1998
  - Structured information
- XHTML 1.0: January 2000
- XML Schema 1.0: May 2001
  - Language to enforce standards

EDI Format

[Figure: SPEEDE Transcript (EDI)]

XML Terms
- DTD (Document Type Definition): master listing of all the elements, including where and how they must be placed in the document
- Schema: an XML application that describes the allowed content of documents
- XSL, XSLT: convert an XML file into another specified format
- Parser: tool that reads the document and divides it into individual elements, attributes, and other pieces
- Validation: process of checking the structural validity of a document

[Figure: the sender application generates XML; an XML parser checks the document; the receiver application processes the file]
Sample XML shown in the figure (a fragment with several top-level elements):
<STUDENTID type="SSN">123456789</STUDENTID>
<DEMOGRAPHIC>
  <BIRTH_DATE type="DATE">19740823</BIRTH_DATE>
  <GENDER>M</GENDER>
</DEMOGRAPHIC>
<GRADE_REPORT>
  <SESSION Code="!199901">
    <LABEL>SPRING SESSION</LABEL>
    <YEAR type="CCYY">1999</YEAR>
    <COURSE index="1">
      <CREDIT type="hours">4</CREDIT>
      <GRADE>A</GRADE>
      <CODE>SPN 406</CODE>
      <COURSE_TITLE>SPANISH I</COURSE_TITLE>
    </COURSE>
    <COURSE index="2">
      <CREDIT type="hours">3</CREDIT>
      <GRADE>B</GRADE>
      <CODE>HIS 302</CODE>
      <COURSE_TITLE>TX HISTORY</COURSE_TITLE>
    </COURSE>
  </SESSION>
</GRADE_REPORT>
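
The parser step in this figure can be sketched with the JAXP DOM parser that ships with the JDK. This is a minimal sketch, not part of any PESC toolkit: the class and method names are illustrative, and it checks only well-formedness (a single root element, matched tags, legal element names), not validity against a schema. A fragment with several top-level elements, like the sample, fails this check until it is wrapped in one root element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class WellFormedCheck {
    /** Returns true if the text parses as one well-formed XML document; no schema is used. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) { // SAXException, IOException or ParserConfigurationException
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<GENDER>M</GENDER>"));                               // prints true
        System.out.println(isWellFormed("<BIRTH DATE type=\"DATE\">19740823</BIRTH DATE>")); // prints false
    }
}
```

An element name cannot contain a space, so the parser reads `<BIRTH DATE ...>` as element BIRTH with a malformed attribute DATE and rejects the document.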

XML Forum (now the Standards Forum)
- Formed by the Postsecondary Electronic Standards Council in May 2000
- XML standards for education
- Use existing specifications whenever possible
- Use what works in EDI; learn from what didn't
- Establish work groups on topics determined by Forum participants
  - Core Components Workgroup
  - Technology Workgroup
  - Architecture Team

PESC Standards Forum Community Participation
- AACRAO SPEEDE Committee
- NCHELP Electronic Standards Committee
- Department of Education
- SIS vendors
- Colleges

XML Schema Features
- Minimum/maximum length
- Required/optional
- Repeatable
- Format: number, date, etc.
- Code lists
- Mandatory choice
- User-defined extensions
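
Each feature in this list corresponds to a standard XML Schema construct: the minLength/maxLength facets, minOccurs/maxOccurs, built-in types such as xs:date, xs:enumeration code lists, and xs:choice. The following is a minimal sketch assuming a made-up Student schema (it is not the PESC schema). It validates instances with the JDK's javax.xml.validation API. Class and element names are illustrative.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

final class SchemaFeatureDemo {
    // Hypothetical schema; each part exercises one feature from the slide.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='Student'><xs:complexType><xs:sequence>"
      // Minimum/maximum length
      + "<xs:element name='LastName'><xs:simpleType><xs:restriction base='xs:string'>"
      + "<xs:minLength value='1'/><xs:maxLength value='35'/>"
      + "</xs:restriction></xs:simpleType></xs:element>"
      // Optional element (minOccurs='0') with a date format
      + "<xs:element name='BirthDate' type='xs:date' minOccurs='0'/>"
      // Mandatory choice: exactly one of the two identifiers
      + "<xs:choice><xs:element name='SchoolAssignedID' type='xs:string'/>"
      + "<xs:element name='SSN' type='xs:string'/></xs:choice>"
      // Repeatable element restricted to a code list
      + "<xs:element name='Grade' maxOccurs='unbounded'><xs:simpleType>"
      + "<xs:restriction base='xs:string'><xs:enumeration value='A'/>"
      + "<xs:enumeration value='B'/><xs:enumeration value='C'/></xs:restriction>"
      + "</xs:simpleType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            // With no ErrorHandler set, the validator throws on the first validity error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```

For example, a BirthDate of 19740823 is rejected because xs:date expects the CCYY-MM-DD form (1974-08-23), and a Student with both SchoolAssignedID and SSN violates the mandatory choice.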

XML Schema Extensibility
- Extensions
- Substitutions
- References to other schemas
  - Take all or part
  - Modify as needed
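
Extension works by deriving a new complex type from a base type. In this minimal sketch the names AddressType, LocalAddressType and CountyCode are hypothetical, not PESC core components. The derived type keeps every base element and appends its own after them, and the validator enforces that order.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;

final class ExtensionDemo {
    // Hypothetical base type plus a locally extended type derived from it.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:complexType name='AddressType'><xs:sequence>"
      + "<xs:element name='City' type='xs:string'/>"
      + "<xs:element name='PostalCode' type='xs:string'/>"
      + "</xs:sequence></xs:complexType>"
      // The extension inherits City and PostalCode and appends CountyCode after them.
      + "<xs:complexType name='LocalAddressType'><xs:complexContent>"
      + "<xs:extension base='AddressType'><xs:sequence>"
      + "<xs:element name='CountyCode' type='xs:string'/>"
      + "</xs:sequence></xs:extension></xs:complexContent></xs:complexType>"
      + "<xs:element name='Address' type='LocalAddressType'/>"
      + "</xs:schema>";

    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            factory.newSchema(new StreamSource(new StringReader(XSD)))
                   .newValidator()
                   .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```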

Extensibility through Namespaces
- Identifies the source of an XML definition set
- Use a URI or URN
  - http://www.w3.org/2001/XMLSchema
  - urn:org:pesc:core:CoreMain:v1.0.0
- Distinguishes different objects with the same name
- Allows schema referencing
  - Use the import feature
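
Namespaces let a receiver tell apart two elements that share a local name. The following minimal sketch uses the JDK DOM parser. The PESC URN comes from the slide, while urn:example:local and the Code elements are hypothetical. Namespace awareness has to be switched on explicitly, because the default DocumentBuilderFactory is not namespace aware.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class NamespaceDemo {
    static final String CORE = "urn:org:pesc:core:CoreMain:v1.0.0";

    /** Counts elements with the given local name in the given namespace. */
    static int count(String xml, String nsUri, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // without this, namespace URIs are not tracked
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagNameNS(nsUri, localName).getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Root xmlns:core='" + CORE + "' xmlns:loc='urn:example:local'>"
                   + "<core:Code>A</core:Code><loc:Code>B</loc:Code><loc:Code>C</loc:Code></Root>";
        System.out.println(count(xml, CORE, "Code"));                // prints 1
        System.out.println(count(xml, "urn:example:local", "Code")); // prints 2
    }
}
```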

Core Components
- Data dictionary
- Core Components XML Schema
- Data formats
- Business processes
- Data relationships
- Naming conventions

Transcript/Financial Aid Interoperability
- Common rendering of core re-used elements
- Common code sets for shared concepts
- Consistent technology standards and best practices

Development Standards
- Modularity and re-usability
  - Form, fit and function
- Naming conventions
  - No abbreviations
  - Some acronyms OK
  - Consistent naming: "Code", "Indicator", "ID"
  - UpperCamelCase

Development Standards (cont.)
- Use of qualifiers
  - Generally deprecated
- Specified instances of named types preferred
  - Where semantically significant
  - Base named types where specificity is not needed
- Object-oriented design
  - Extensions from base types

[Figure: Core Components (common data formats, relationships, processes) shared by the Student Loan schema, the Transcript schema and the Common Application]

Domains
[Figure: the Standards Forum Core Components (Core Data Dictionary) feed several sector libraries, each of which feeds several application schemas]

Core Components
- All simple types in core
- Re-used complex types in core
- "Model" complex types
  - Serve as building components for new document types

Sector and Instance
- Sector components
  - Re-usable business concepts
  - Refining for specific needs
- Instance schemas
  - Single document
  - Transaction specific

How to Reconcile Differences
- Agree to disagree
  - Definitions can be refined in sector
  - Complex types may be rebuilt
- Convergence
  - Move to core definitions
  - Unique constructs allowed

Standards Forum Core Components
[Figure: PESC Core Components (Data Dictionary, CoreMain.xsd) underpin the Admissions, Registrar and Financial Aid sector libraries; schema files shown include Registrar.xsd, CollegeTranscript.xsd (College Transcript schema), FFELAlternative.xsd and the CommonLine Request, Receipt, Response, Roster and RosterResponse schemas]

Standards Forum Objects (proposed)
[Figure: PESC Core Components (Data Dictionary, CoreMain.xsd) and the Academic Record sector library (AcademicRecord.xsd) underpin the TranscriptRequest (TranscriptRequest.xsd), TranscriptResponse (TranscriptResponse.xsd), College Transcript (CollegeTranscript.xsd) and High School Transcript (HighSchoolTranscript.xsd) schemas]

User-defined Extensions
- Allow mutually defined sub-schemas
  - State systems
  - Regional requirements
  - Software-specific needs
- Allow strict standards within a system
- Other users can ignore them
- National ratification not needed

XML Format

[Figure: SPEEDE Transcript (XML)]

<Course>
  <CourseCreditBasis>Regular</CourseCreditBasis>
  <CourseCreditUnits>Semester</CourseCreditUnits>
  <CourseCreditLevel>LowerDiv</CourseCreditLevel>
  <CourseCreditValue>4</CourseCreditValue>
  <CourseCreditEarned>4</CourseCreditEarned>
  <CourseAcademicGradeScaleCode>25</CourseAcademicGradeScaleCode>
  <CourseAcademicGrade>A</CourseAcademicGrade>
  <CourseQualityPointsEarned>16</CourseQualityPointsEarned>
  <CourseLevel>LowerDiv</CourseLevel>
  <CourseSubjectAbbreviation>MATH</CourseSubjectAbbreviation>
  <CourseNumber>2415</CourseNumber>
  <CourseTitle>Calculus III</CourseTitle>
  <Attribute>
    <RAPCode>9TX</RAPCode>
    <RAPName>TXCORECURR</RAPName>
    <RAPSubName>020Mathematics - C1</RAPSubName>
  </Attribute>
</Course>
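
Pulling values out of Course elements like this one is a typical extraction task. The following minimal DOM sketch assumes a hypothetical wrapping Transcript element. It computes a grade point average as total quality points divided by total credits earned. That formula is only an illustration, not a PESC business rule. For the course above it gives 16 / 4 = 4.0.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class CourseGpa {
    /** Sum of CourseQualityPointsEarned divided by sum of CourseCreditEarned over all Course elements. */
    static double gpa(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList courses = doc.getElementsByTagName("Course");
            double points = 0, credits = 0;
            for (int i = 0; i < courses.getLength(); i++) {
                Element course = (Element) courses.item(i);
                points += number(course, "CourseQualityPointsEarned");
                credits += number(course, "CourseCreditEarned");
            }
            return credits == 0 ? 0.0 : points / credits;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot read transcript", e);
        }
    }

    // Text of the first descendant with the given name, parsed as a number (0 if absent).
    private static double number(Element parent, String name) {
        NodeList nodes = parent.getElementsByTagName(name);
        return nodes.getLength() == 0 ? 0.0
            : Double.parseDouble(nodes.item(0).getTextContent().trim());
    }
}
```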

Get the Schema
- PESC: college transcripts and more
  - http://www.pesc.org/info/approved-standards.asp
- SEVIS: international exchange visitors
  - http://www.ice.gov/graphics/sevis/schools/sevis.htm

More Schemas
- COD: direct student aid
  - http://ifap.ed.gov/IFAPWebApp/currentCODPag.jsp
- NCHELP: guaranteed student aid
  - http://www.nchelp.org

Basic XML Tools
- XML Spy: http://www.altova.com/
- Turbo XML (TIBCO): http://www.tibco.com/
- Stylus Studio: http://www.stylusstudio.com/
- <oXygen/>: http://www.oxygenxml.com/

XML IDEs
- Design or view schemas
- Create instance documents
- Validate documents against a schema
- Create and test XSLT stylesheets

XML Spy: Special Notes
- Several products, but the IDE is probably all you'll need
- XML Spy Home Edition
  - Very nice, free
  - Does not support import
- Demo of the IDE

Questions?

Section II
- XML exchange model
- Transport options
- Security issues and solutions
- Software tools
  - XML generation
  - XML extraction

Exchange Model
[Figure: the sender application generates XML; an XML parser checks the document; the receiver application processes the file]
- Generate message
- Deliver message
- Interpret message

XML Transport
- Delivery
- Security
- Routing
- Receipt confirmation

Getting It There: Transport Options
- Private networks (VANs)
- Internet
  - Email
  - FTP
  - HTTP

Email
- SMTP: Simple Mail Transfer Protocol
- POP3: Post Office Protocol 3
- MIME: Multipurpose Internet Mail Extensions
- S/MIME: Secure MIME
- Supported by most email clients

FTP
- FTP: File Transfer Protocol
- Allows file PUT and GET across networks
- On nearly every operating system
- Useful in batch
- Secure FTP

Web Transport Technologies
- HTTP: Hypertext Transfer Protocol
- HTTPS: Hypertext Transfer Protocol Secure
- SSL: Secure Sockets Layer
  - Encryption
  - Digital certificates

Security
- Privacy laws
  - FERPA
  - Gramm-Leach-Bliley
  - HIPAA
  - SSN

Security Risks
- Identity theft
- Financial data/access
- High-profile students
  - Opposition research
- Fraud prevention
  - Thwarting bogus credentials
  - Major argument for electronic exchange

Security Requirements
- Encryption
- Authentication
- Non-repudiation

Encryption
- PGP
- SSL, SSH
- Secure FTP
  - SFTP: SSH File Transfer Protocol
  - FTPS: standard FTP wrapped with SSL
- HTTPS: HTTP over SSL

PGP
- PGP: Pretty Good Privacy
- Open source standard
- Free to educational users
  - Sort of
- PKI: public key infrastructure
  - Requires generating public/private key pairs
  - Exchange the public key with each trading partner

SSL
- RSA standard
- Negotiated encryption algorithm
- Server authentication using digital certificates
  - Most often one-way
  - Can be two-way

Secure FTP
- Negotiated encryption algorithm
- Does not use
  - Public keys
  - Digital certificates
- Generally easy

Authentication
- E-signature
  - Legal consideration
  - Supports non-repudiation, limited liability
- Digital signature (PKI)
- Digital certificate
  - Trusted third party
  - VeriSign dominant issuer
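
A digital signature is created with the sender's private key and checked with the matching public key. Any change to the signed document makes verification fail. The following minimal sketch uses the JDK's java.security.Signature with SHA256withRSA. It uses a freshly generated key pair and skips the certificate step, in which a trusted third party vouches for whose public key it is. Class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

final class TranscriptSignature {
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) { throw new IllegalStateException(e); }
    }

    /** Sender side: sign the document bytes with the sender's private key. */
    static byte[] sign(String document, PrivateKey senderPrivateKey) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initSign(senderPrivateKey);
            s.update(document.getBytes(StandardCharsets.UTF_8));
            return s.sign();
        } catch (GeneralSecurityException e) { throw new IllegalStateException(e); }
    }

    /** Receiver side: true only if the document is unchanged and was signed by the key owner. */
    static boolean verify(String document, byte[] signature, PublicKey senderPublicKey) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initVerify(senderPublicKey);
            s.update(document.getBytes(StandardCharsets.UTF_8));
            return s.verify(signature);
        } catch (GeneralSecurityException e) { throw new IllegalStateException(e); }
    }
}
```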

Email Encryption
- Requires public and private keys
  - Encrypt, decrypt
  - Digitally sign, authenticate
- Use PGP

FTP Encryption
- Requires public and private keys
  - Encrypt, decrypt
  - Digitally sign, authenticate
- Use PGP

Routing
- Any to any
- Hub and spoke
- Web services

Any to Any
- Can use a variety of transports
  - Email, FTP, HTTP
- Variable encryption standards
- Requires a table of each partner's keys, address and other requirements

Hub and Spoke Design

Hub and Spoke
- Central authority/trading partner
  - Department of Education
  - Homeland Security
- Store and forward
  - State/province authorities
  - Florida, Ontario, California
  - Texas Internet Server

Web Services
- web services vs. "Web Services"
- Interoperability
- Real-time exchange
- Requirements
  - XML
  - HTTPS
  - WSDL


Message Packaging
- How to wrap the payload
- Need to indicate:
  - Message type
  - Protocol
  - Expected response
- Problem: multiple documents sent together can't be parsed as a single XML document

Multi-document Packaging Options
- Create a single large XML document
- MIME attachments
- zip
- mget and mput with FTP
- SOAP
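
Simply concatenating two documents does not produce parseable XML. A document may have only one root element, and an XML declaration may appear only at its very start. The first option, a single larger document, can be sketched by parsing each document and importing its root under a new wrapper element. This is a minimal sketch: the Batch element name is hypothetical, not a PESC structure.

```java
import java.io.StringReader;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class BatchPackager {
    /** Wraps several complete XML documents under one (hypothetical) <Batch> root element. */
    static Document wrap(List<String> documents) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // fail quietly on malformed input
            Document batch = builder.newDocument();
            Element root = batch.createElement("Batch");
            batch.appendChild(root);
            for (String xml : documents) {
                // Each document is parsed on its own, so each may carry its own XML declaration.
                Document doc = builder.parse(new InputSource(new StringReader(xml)));
                // importNode makes a deep copy owned by the batch document.
                root.appendChild(batch.importNode(doc.getDocumentElement(), true));
            }
            return batch;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot package documents", e);
        }
    }
}
```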

Message Technologies
- SOAP: Simple Object Access Protocol
  - XML based
  - Interoperability between disparate platforms
  - RPC vs. messaging
- SOAP with MIME attachments

SOAP Example
<soap:Envelope
    xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
    soap:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <soap:Header>
    <!-- extensions and headers go here -->
  </soap:Header>
  <soap:Body>
    <!-- message payload goes here -->
  </soap:Body>
</soap:Envelope>
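
Building the envelope with a DOM API instead of typing it by hand keeps the namespace declaration and closing tags consistent. The following minimal sketch uses the JDK's DOM and identity Transformer to produce a SOAP 1.1 envelope. The TranscriptRequest payload name is hypothetical. A real exchange would place a complete, schema-valid document in the Body.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class SoapWrapper {
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    /** Builds a SOAP 1.1 envelope whose Body holds one payload element with text content. */
    static String envelope(String payloadName, String payloadText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element env = doc.createElementNS(SOAP_NS, "soap:Envelope");
            doc.appendChild(env);
            env.appendChild(doc.createElementNS(SOAP_NS, "soap:Header")); // extensions go here
            Element body = doc.createElementNS(SOAP_NS, "soap:Body");
            env.appendChild(body);
            Element payload = doc.createElementNS(null, payloadName);      // payload in no namespace
            payload.setTextContent(payloadText);
            body.appendChild(payload);
            // Identity transform serializes the DOM, adding the xmlns:soap declaration.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("cannot build envelope", e);
        }
    }
}
```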

Generating XML
- Determine format standards
  - DTD
  - XML Schema
- Database conversion
- Object-based transformation
- Text generation ("brute force" method)
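
The difference between object-based generation and brute-force text generation appears as soon as the data contains special characters. The following minimal sketch uses the JDK's StAX writer (javax.xml.stream). CourseTitle echoes the element in the transcript sample, and the method names are illustrative.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class CourseWriter {
    /** Brute force: plain string concatenation; an '&' or '<' in the title breaks the XML. */
    static String bruteForce(String title) {
        return "<CourseTitle>" + title + "</CourseTitle>";
    }

    /** Streaming writer: character data is escaped automatically. */
    static String withStax(String title) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("CourseTitle");
            w.writeCharacters(title);
            w.writeEndElement();
            w.flush();
            w.close(); // does not close the underlying StringWriter
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For a title such as "Art & Music", the brute-force version emits a bare ampersand, which any parser rejects. The StAX version writes `Art &amp; Music`.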

Working with XML Schema
- Learn to read schemas
- Use a helper tool
  - IDE
- Create sample documents
  - XML Spy
  - PESC Implementation Guide

Programming Languages
- C++, Java
  - Both have extensive XML support
  - Use a little or a lot
- .NET
  - Requires commitment to the toolset

Software Tools
- Open, interoperable, standards-based solutions
- Cover Pages
  - http://xml.coverpages.org/
  - Comprehensive (very!)
- SourceForge
  - http://sourceforge.net/
  - Java-centric

Java Tools
- Java 2
  - Java Runtime Environment
  - Java API for XML Processing (JAXP)
  - Java Architecture for XML Binding (JAXB)
- http://java.sun.com

Open Source Java Tools
- http://xml.apache.org
  - Xerces Java parser
    - Uses the Java API for XML Processing (JAXP)
    - Also supports DOM
    - Available in a C++ version
  - Xalan Java XSLT tool
- SAXON XSLT processor

C++ Tools / Win32
- MSXML
  - MSXSL: XSLT application
- Visual C++
  - www://msn.microsoft.com
- Xerces C++ (Apache)

Transmission Tool
- cURL

Extracting XML Data
- Parsing
  - SAX
  - DOM
- Data binding
  - Java Architecture for XML Binding (JAXB)
  - .NET (Microsoft)

SAX and DOM
- SAX: Simple API for XML
  - Very fast
  - One and only one linear pass through the document
  - Uses callback methods to handle events
- DOM: Document Object Model
  - Flexible; can walk the tree up and down
  - Memory intensive
  - Method calls like getParentNode(), getChildNodes()
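
The SAX model can be sketched with the JDK's built-in SAX parser. A handler receives a startElement callback for every start tag during one linear pass and keeps only a counter, not a tree. This is a minimal sketch. The class name is illustrative, and the Course element name echoes the transcript sample.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxCourseCounter {
    /** Counts elements with the given name in one streaming pass; no tree is built. */
    static int count(String xml, String name) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(name)) {
                    count[0]++; // callback fires once per start tag, in document order
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse document", e);
        }
        return count[0];
    }
}
```

A DOM version would instead load the whole document and call getElementsByTagName("Course"). That is simpler to navigate but holds every node in memory.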

Data Binding Toolkits
- Take XML schemas or XML instances and parse them into objects used by the language
- Usually create arrays for recurring nodes
- Object and method names match the tag names
  - getSchool()
  - addAddress()

Reading/Parsing XML
- Parsing
  - JAXP/XERCES
  - SAXON
  - MSXML (Windows, .NET)
  - xsltproc (Linux, Mac OS X)
- Validation against schema
- Loading into database

To Validate or Not?
- Overhead
- Back-end processing still needed

Manipulating XML
- XSL: eXtensible Stylesheet Language
  - XSLT: XSL Transformations
  - XPath

XSLT
- Treats XML documents as a tree structure
- Converts the source tree into a result tree using a stylesheet
- Uses an XSLT processor
- XPath: W3C standard
- Almost a programming language

Manipulating XML
- XPath
  - Expression language for selecting and computing over nodes in an XML document tree
  - Path to a node, e.g. /invoice/item/quantity, together with relational, string, boolean and arithmetic operators
  - Used heavily in XSL/XSLT
  - Also supported alongside Document Object Model (DOM) implementations
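
The /invoice/item/quantity path can be evaluated directly with the JDK's javax.xml.xpath API. The following minimal sketch adds an arithmetic function (sum) and a relational predicate to that path. The invoice document is a made-up example and the class name is illustrative.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class InvoiceQuery {
    private static double number(String expression, String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return (Double) xpath.evaluate(expression,
                new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** sum() adds up every node the path selects. */
    static double totalQuantity(String xml) {
        return number("sum(/invoice/item/quantity)", xml);
    }

    /** A predicate with a relational operator filters the items before counting. */
    static int itemsWithQuantityAbove(String xml, int n) {
        return (int) number("count(/invoice/item[quantity > " + n + "])", xml);
    }
}
```

For an invoice whose items have quantities 2, 5 and 1, totalQuantity returns 8.0 and itemsWithQuantityAbove(xml, 1) returns 2.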

XSLT Tools
- Many to choose from
- Xalan
  - Uses the Xerces parser
  - C++: http://xml.apache.org/xalan-c/
  - Java: http://xml.apache.org/xalan-j/
- SAXON
  - Java only
  - http://saxon.sourceforge.net/

XSLT Example
<xsl:transform
    xmlns:v1="urn:employee:v1"
    xmlns:v2="urn:employee:v2"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

  <!-- override template for text/attributes -->
  <xsl:template match="text()|@*"/>

  <!-- template for position elements -->
  <xsl:template match="position">
    <title><xsl:value-of select="."/></title>
  </xsl:template>

  <!-- template for fname elements -->
  <xsl:template match="fname">
    <name><xsl:value-of select="concat(., ' ', following-sibling::lname)"/></name>
  </xsl:template>

  <!-- template for v1:emp elements -->
  <xsl:template match="v1:emp">
    <v1:employee><xsl:apply-templates select="*"/></v1:employee>
  </xsl:template>
</xsl:transform>
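
To try this stylesheet, it can be run with the JDK's built-in XSLT 1.0 processor through javax.xml.transform. SAXON or Xalan can be plugged in the same way through JAXP. In this minimal sketch the slide's stylesheet, with its typos corrected, is embedded as a string. The v1:emp input and the class name are made up.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class EmployeeTransform {
    // The slide's stylesheet with the template/select typos corrected.
    static final String XSLT =
        "<xsl:transform xmlns:v1='urn:employee:v1' xmlns:v2='urn:employee:v2'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      + "<xsl:template match='text()|@*'/>"                 // suppress default text output
      + "<xsl:template match='position'><title><xsl:value-of select='.'/></title></xsl:template>"
      + "<xsl:template match='fname'><name>"
      + "<xsl:value-of select=\"concat(., ' ', following-sibling::lname)\"/></name></xsl:template>"
      + "<xsl:template match='v1:emp'><v1:employee><xsl:apply-templates select='*'/>"
      + "</v1:employee></xsl:template>"
      + "</xsl:transform>";

    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For the input `<v1:emp xmlns:v1='urn:employee:v1'><position>Registrar</position><fname>Jane</fname><lname>Doe</lname></v1:emp>`, the result is a v1:employee element containing a title of Registrar and a name of Jane Doe. The lname element produces no output of its own, because the empty text() template swallows its text.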

Questions?

Section III
- Case studies
- Development issues

Case Studies
- University of Texas
  - SEVIS/COD
- University System of Georgia
  - CCC Transcript
- Texas Internet Server

U.T. SEVIS
- International student tracking
  - Formerly CIPRIS
- Mandatory
  - INS/Department of Homeland Security
- XML required

SEVIS Challenges
- Accelerated timeline
- Limited vendor support at first
- Batch considered the only option
- Required SSL
  - Digital certificates
- Chose to use well-understood technology

Technology Employed: Upload
- Mainframe IBM OS/390
- Natural/ADABAS
  - Used to build a flat-file dataset
- FTP to local PC over the network
- cURL (Win32)
  - Supports FTP, FTPS, HTTP, HTTPS, HTTPS certificates

Send Process
[Figure: network put between the OS/390 mainframe and a Windows PC; HTTPS upload from the PC to the SEVIS secure server; automated daily batch at 6:00 pm]

Technology Employed: Download
- cURL: HTTPS get
- SAXON for Java
  - Parses XML
  - XSLT to flat file
- FTP get from mainframe to PC

Receive Process
[Figure: HTTPS get from the SEVIS secure server to a Windows PC; network get between the PC and the OS/390 mainframe; automated daily batch at 8:00 am]
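
cURL Code
rem Upload: -E supplies the client certificate and its passphrase (masked here);
rem each -F adds a multipart form field, and @ attaches the named file;
rem -k skips verification of the server's certificate.
curl -v -E certificate.pem:******** -F programnumber=P-1-00739 ^
  -F batchid=20040000000873 -F xml=@d:\sevis\out\svtr0873.xml ^
  https://egov.immigration.gov/sevisbatch/action/batchUpload -k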

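cURL and Win32 Code
rem Download the batch results; -o names the output file, -k skips server certificate verification.
curl -E certificate.pem:******** -o d:\sevis\in\down.zip ^
  -F programnumber=P-1-00739 -F batchid=20040000000873 ^
  https://egov.immigration.gov/sevisbatch/action/batchDownload -k
rem Keep a backup copy of the archive under a random name.
copy d:\sevis\in\down.zip d:\sevis\backup\down%random%.zip
rem Extract everything except the PDFs into d:\sevis\in.
unzip32 d:\sevis\in\down.zip -x *.pdf -d d:\sevis\in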

Java Code
rem Extract everything except the XML files into d:\sevis\pdffiles.
unzip32 d:\sevis\in\down.zip -x *.xml -d d:\sevis\pdffiles
del d:\sevis\in\*.zip
ren d:\sevis\in\*.xml tranlog.xml
rem Transform the transaction log to a flat file with SAXON and the sevis.xsl stylesheet.
java -jar d:\saxon\saxon.jar -o d:\sevis\in\down.txt d:\sevis\in\tranlog.xml d:\sevis\bin\sevis.xsl
rem Append the raw log to the running batch log, then remove it.
type d:\sevis\in\tranlog.xml >> d:\sevis\in\batchlog.txt
del d:\sevis\in\tranlog.xml
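
The "XSLT to flat file" step relies on an XSLT output method of text, so the result contains no tags and no XML declaration. The slides do not show sevis.xsl or the structure of the SEVIS transaction log. This minimal sketch therefore assumes a hypothetical Results/Record layout with id and status attributes, and runs through the JDK's built-in processor instead of SAXON.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class FlatFileExtract {
    // Hypothetical stylesheet and input layout; the actual sevis.xsl is not shown in the slides.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"                  // plain text: no tags, no declaration
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='Results/Record'>"       // one output line per Record
      + "<xsl:value-of select='@id'/><xsl:text>|</xsl:text>"
      + "<xsl:value-of select='@status'/><xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template></xsl:stylesheet>";

    static String toFlatFile(String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)))
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Each Record becomes one pipe-delimited line that a mainframe job can read as a fixed-format dataset without an XML parser.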

Lessons Learned, Observations
- Scale issues with a one-off solution
- Vulnerable to specification changes
  - Was stable for 2 years
  - Changed 3 months ago
- Best suited to a central-authority hub and spoke

UT COD Solution
- Similar to SEVIS
- Uses EdConnect proprietary software
- Secure FTP: username/password
- Not as automated

University System of Georgia
- Sponsored by the Georgia Board of Regents
- Vendor solution
  - SunGard SCT Banner 6.2
- 34 colleges; 32 already use Banner
- Uses the PESC XML CollegeTranscript schema

Any-to-Any Model
- Central authority
- No central hub
- Supports email, FTP and HTTP
- Must maintain a user table of transmission protocols and encryption keys

Non-hierarchical Network

Texas Internet Server
- Supports larger scale
- Sending
  - One packet, many destinations
- Receiving
  - One packet, many senders
- Third-party trust model
  - Maintains a PGP key ring

XML Enhancements
- XML transport and routing
- Registration of XML capability by institution
- Increasing XML support
- Supporting XML standards

First Step: XML Transport
[Figure: the Austin server relays EDI from school to school and XML from school to school, without translation]

Next: EDI/XML Translator
[Figure: a translator converting between XML and EDI]

Hub-Based Translation
[Figure: a school sends EDI to the Austin server, whose translator delivers XML to the receiving school]

Concerns
- Community consensus needed
  - Agreement to UT role
- Need for standard mapping
  - Interpretation
  - SPEEDE/PESC role

Receiver-Based Translation
[Figure: the Austin server forwards EDI or XML in the sender's format; translators at the receiving schools convert EDI to XML or XML to EDI]

Hybrid Model
- Some schools register to use server translation
- Other schools receive in native format and translate on their own
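
The routing decision in the hybrid model can be sketched as a small function on the hub. The hub translates only when the receiver expects a different format and has registered for server translation. Otherwise it forwards the payload unchanged. Everything here is hypothetical: the registry maps, the class name, and the translator functions are stand-ins, not the Texas server's actual design.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

final class HybridRouter {
    enum Format { EDI, XML }

    private final Map<String, Format> nativeFormat;   // receiver -> format it expects
    private final Set<String> serverTranslation;      // receivers registered for hub translation
    private final UnaryOperator<String> ediToXml;     // stand-ins for real translators
    private final UnaryOperator<String> xmlToEdi;

    HybridRouter(Map<String, Format> nativeFormat, Set<String> serverTranslation,
                 UnaryOperator<String> ediToXml, UnaryOperator<String> xmlToEdi) {
        this.nativeFormat = nativeFormat;
        this.serverTranslation = serverTranslation;
        this.ediToXml = ediToXml;
        this.xmlToEdi = xmlToEdi;
    }

    /** Returns the payload in the form the hub should forward to the receiver. */
    String route(String payload, Format sentAs, String receiver) {
        Format wanted = nativeFormat.get(receiver);
        if (wanted == null) {
            throw new IllegalArgumentException("unregistered receiver: " + receiver);
        }
        if (wanted == sentAs || !serverTranslation.contains(receiver)) {
            return payload; // same format, or the receiver translates on its own
        }
        return sentAs == Format.EDI ? ediToXml.apply(payload) : xmlToEdi.apply(payload);
    }
}
```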

Future Directions
- EDI to XML conversion
- Strategies
  - Convert now
  - XML vs. EDI

Questions?
