Talend Open Studio for Big Data

Release Notes

5.4.1

Publication date December 12, 2013

Copyleft
This documentation is provided under the terms of the Creative Commons Public License (CCPL).
For more information about what you can and cannot do with this documentation in accordance with the CCPL,
please read: http://creativecommons.org/licenses/by-nc-sa/2.0/

Notices
All brands, product names, company names, trademarks and service marks are the properties of their respective
owners.

Table of Contents
System Requirements
Big Data: New Features
    1. Kerberos security
    2. Upgraded support for Hadoop distributions
    3. Hadoop file formats
    4. File management in HDFS
    5. NoSQL databases
    6. In-memory technology
    7. Cloud technology
    8. Demo project
    9. Other features
Big Data: Bug Fixes / Change Log
    1. Bug Fixes
Big Data: Known Issues
    1. Studio multi-instance starting issue
    2. Note for the developers of custom components
Big Data: Hints and Notes
    1. Installing required third-party licences
Documentation
    1. Talend Help Center
    2. Revised documents
    3. Known issues
    4. Open issues

System Requirements

Users should refer to the Installation and Upgrade Guides on the Talend Help Center (http://help.talend.com) for
more information on Installation and System Requirements.

Big Data: New Features



1. Kerberos security
1. The Kerberos kinit authentication mode has been enabled for all the Big Data components, including the Hive
components.
2. The Kerberos keytab authentication mode has been added to all the Big Data components except the HBase
components.

2. Upgraded support for Hadoop distributions


1. New versions of the following Hadoop distributions are supported:
Hortonworks Data Platform 1.3 and 2.0
Cloudera 4.3 and 4.4
MapR 2.1.3 and 3.0.1
2. EMC Pivotal is now available.

3. Hadoop file formats


Support for Sequencefile, RC, ORC and Avro has been added to several components:
1. The tHiveCreateTable and tHiveLoad components have been created. They support not only a wide range of
commonly used file formats such as Sequencefile, RC, ORC and Avro, but also formats that are not officially
supported by Talend.
2. In addition to their existing functions, tPigLoad and tPigStoreResult can now process a Sequencefile, RC or
Avro file.

4. File management in HDFS


1. The tSqoopMerge component has been created for merging two datasets, with newer records overwriting the
older ones.
2. Upgrade of HDFS components
The tHDFSCopy component can now merge the part files generated at the end of a MapReduce computation.
The HDFS input and output components can now handle header rows.
The tHDFSInput component can read sub-directories of a specified directory.
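
The merge behavior described for tSqoopMerge (newer records overwriting the older ones) can be pictured with a small plain-Java sketch. This is only an illustration of the semantics, not the component's implementation. The class name MergeSketch, the merge method and the sample records are hypothetical, and each record is assumed to be identified by a single key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MergeSketch {

    // Hypothetical helper: merges two keyed datasets so that a record from
    // `newer` replaces the record that has the same key in `older`.
    // Neither input map is modified.
    public static Map<String, String> merge(Map<String, String> older,
                                            Map<String, String> newer) {
        Map<String, String> merged = new LinkedHashMap<>(older);
        merged.putAll(newer); // same key: the newer value overwrites the older one
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> older = new LinkedHashMap<>();
        older.put("1", "alice");
        older.put("2", "bob");

        Map<String, String> newer = new LinkedHashMap<>();
        newer.put("2", "bob-updated");
        newer.put("3", "carol");

        // prints {1=alice, 2=bob-updated, 3=carol}
        System.out.println(merge(older, newer));
    }
}
```

Records that exist only in one dataset are kept as they are. Only records whose key appears in both datasets are replaced by the newer version.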


5. NoSQL databases
1. The following components have been created to enable transactions with their related NoSQL databases:
tCassandraBulkLoad, tCassandraOutputBulk, tCassandraBulkExec and tCassandraOutputBulkExec
tMongoDBBulkLoad
The Riak components
2. Versions 2.4 and 2.5 of MongoDB are now supported by the MongoDB components.

6. In-memory technology
1. The newly added SAP Hana components help users easily configure the connection to a SAP Hana system and
process transactions with this in-memory computing platform.

7. Cloud technology
1. With the addition of support for Amazon S3 (Simple Storage Service), users can use dedicated components to
perform transactions with this data storage service.
2. GS (Google Storage) components are now available for users to perform interactions with Google Storage and
prepare their data before transferring the data to Google BigQuery.

8. Demo project
1. A Big Data demo project is provided with the Studio. The project includes a number of easy-to-use sample
Jobs to help familiarize users with the various features and functions of Talend Studio with Big Data.

9. Other features
1. Support for OAuth2 security has been added to the Salesforce components.
2. The Vertica components now officially support Vertica 5.1 and Vertica 6.0.

Big Data: Bug Fixes / Change Log



1. Bug Fixes
In addition to the new features listed above, a number of minor improvements across the product and significant
bug fixes have been made.
See the corresponding Change Log on our bug tracking system for more details on the individual issues:
https://jira.talendforge.org/secure/ReleaseNote.jspa?projectId=10237&version=15215.

Big Data: Known Issues



We encourage you to consult the JIRA bug tracking tool for a full list of open issues:
https://jira.talendforge.org/secure/IssueNavigator.jspa?requestId=16599
Note that this list shows issues from both Talend's Community and Subscription products.

1. Studio multi-instance starting issue


If you are using the open source version of the Studio and have launched two or more instances of it at the same
time, the Studio might fail to restart after you close all of its instances.

2. Note for the developers of custom components
A new finally component template, such as tFileOutputDelimited_finally.javajet, has been created for processing
the finally block. This change might cause compilation errors in a custom component that has been migrated to
5.4.1 and is used there to process multiple outputs.
Issue diagnostic:
A custom component subject to this issue has typically been developed in either of the following ways:
1. The custom component opens a try block in its begin part and closes it in its end part.
2. The custom component is based on a duplicate of any of the following components released between 4.2.3
(exclusive) and 5.4.1 (exclusive):
tFileOutputDelimited
tSAPOutput
tBigQueryOutputBulk, tCassandraOutput, tHBaseOutput, tMongoDBOutput, tMongoDBWriteConf,
tNeo4jOutput, tNeo4jOutputRelationship, tNeo4jRow, tRiakOutput
tAccessOutputBulk, tBonitaInstantiateProcess, tGreenplumOutputBulk, tInformixOutputBulk, tIngresOutputBulk,
tMSSqlOutputBulk, tMomOutput, tMysqlOutputBulk, tOracleBulkExec, tOracleOutputBulk, tParAccelOutputBulk,
tPivotToColumnsDelimited, tPostgresPlusOutputBulk, tPostgresqlOutputBulk, tSalesforceOutputBulk,
tSybaseOutputBulk, tVerticaOutputBulk
tGenKeyHadoopIn, tGenKeyHadoopOut, tMatchGroupHadoopIn, tMatchGroupHadoopOut
tCollector, tDepartitioner, tPartitioner, tRecollector
Recommended solution:
1. Remove any try, catch or finally blocks from your begin and end parts.
2. Put any resources that you will need to use in your finally code in the new resourceMap variable. For example,
resourceMap.put("resources_tFileOutputDelimited_1",object);
3. Create a finally code template which will then be able to use objects from the resourceMap variable and
close connections.

The following links present a complete example for implementing this solution:
1. Modification of begin.javajet:
http://talendforge.org/trac/tos/changeset/111049#file13.
2. Modification of end.javajet:
http://talendforge.org/trac/tos/changeset/111049#file14.
3. Addition of a finally part:
http://talendforge.org/trac/tos/browser/trunk/org.talend.designer.components.localprovider/components/tSAPOutput/tSAPOutput_finally.javajet?rev=111049.
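
As a rough illustration of the recommended solution, the plain-Java sketch below mimics the shape of the resulting code: the begin part only registers a resource in resourceMap, and the finally part retrieves and closes it. It is a simplified sketch, not actual generated Job code. The ResourceMapSketch and TrackedWriter classes and the run method are hypothetical, and it is assumed here that the surrounding try/catch/finally structure is supplied by the code generated around the component templates rather than by the custom component itself.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;

class ResourceMapSketch {

    /** Hypothetical resource that records whether close() was called. */
    static class TrackedWriter extends StringWriter {
        boolean closed = false;

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Simulates one run; failInMain forces an error in the main part. */
    static TrackedWriter run(boolean failInMain) {
        Map<String, Object> resourceMap = new HashMap<>();
        TrackedWriter out = new TrackedWriter();
        try {
            // begin part: register the resource, do not open a try block here
            resourceMap.put("resources_tFileOutputDelimited_1", out);

            // main part: normal processing
            out.write("id;name\n");
            if (failInMain) {
                throw new IllegalStateException("simulated failure");
            }
            // end part: no try block to terminate and nothing to close here
        } catch (IllegalStateException e) {
            System.err.println("Job failed: " + e.getMessage());
        } finally {
            // finally part: fetch the resource back from resourceMap and close it
            Object resource = resourceMap.get("resources_tFileOutputDelimited_1");
            if (resource instanceof Writer) {
                try {
                    ((Writer) resource).close();
                } catch (IOException e) {
                    System.err.println("Could not close resource: " + e.getMessage());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("closed after normal run: " + run(false).closed);
        System.out.println("closed after failed run: " + run(true).closed);
    }
}
```

Because the cleanup lives in the finally part, the resource is closed whether the main part completes or throws, which is why the begin and end parts no longer need their own try blocks.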

Big Data: Hints and Notes



1. Installing required third-party licences
Users must install certain required third-party libraries for all Talend products to work correctly. These libraries
can be installed via the Modules View.

Documentation

1. Talend Help Center
Find out more about how to get the most out of your Talend products on the Talend Help Center:
http://help.talend.com.
New articles for this release include:
A Knowledge Base article providing a full list of the different Map/Reduce components:
https://help.talend.com/pages/viewpage.action?pageId=22525540

2. Revised documents
In addition to updates to the content across the documentation set, the following specific documentation changes
have been made.
Talend Open Studio for MDM User Guide now includes parts describing how to work with the Integration and
Profiling perspectives, as well as the MDM perspective. This guide merges the information contained in the
Talend Open Studio for Data Integration User Guide and the Talend Open Studio for Data Quality User Guide
with the previous standalone Talend Open Studio for MDM User Guide.
Talend Big Data Studio Getting Started Guide has been renamed to Talend Big Data Getting Started Guide.
A new chapter "Getting started with Talend Big Data using the demo project" has been added to the Talend Big
Data Studio Getting Started Guide. This chapter provides short descriptions about the sample Jobs included in
the demo project and introduces the necessary preparations to run the sample Jobs on a Hadoop platform.
Talend Open Studio for ESB Mediation Components Reference Guide and Talend ESB Mediation Components
Reference Guide have been merged into one guide, Talend ESB Mediation Components Reference Guide.
In the ESB Getting Started Guide, the chapter "Downloading and installing Talend ESB software" is now called
"Getting started with Talend ESB", and the demo chapters are now split into two categories ("Basic deployment
and runtime use cases" and "Advanced deployment and runtime use cases with SOA Governance").
In the ESB Infrastructure Services Configuration Guide and the STS User Guide, some conceptual information
has been added that was previously found in the ESB Getting Started Guide.

3. Known issues
In the Talend ESB Mediation Components Reference Guide, the documentation for the cMap component does not
specify that this component is only available with Talend Platform products.

4. Open issues
We encourage you to consult the JIRA bug tracking tool for a full list of open issues:

https://jira.talendforge.org/secure/IssueNavigator.jspa?requestId=16604
