TALEND STUDIO
January 2014
Deloitte
US India IM Technology Updates
Table of Contents
1.
2. Big Data
2.1 How does Big Data fit into Talend Studio?
2.2 Features/Benefits
3. Data Integration
3.1 Introduction
3.3 Features
3.4 Benefits
4. Data Quality
4.1 Introduction
4.2 Features
4.4 Benefits
5. MDM
5.1 Introduction
5.3 Features
5.4 Advantages
6. Conclusion
Talend provides data, application and business process integration solutions that enable
organizations to effectively leverage all of their information assets. Talend unites integration
projects and technologies to accelerate the time-to-value for the business.
Talend's flexible architecture easily adapts to future IT platforms, including big data environments.
Talend's unified solutions include Data Integration, Data Quality, Master Data Management,
Enterprise Service Bus and Business Process Management. Talend's platform is built around a
common set of easy-to-use tools implemented across all products to make the most of the skills of
integration teams.
Unlike traditional vendors offering closed and disjointed solutions, Talend offers a flexible,
open source based platform supported by a predictable and scalable value-based subscription
model.
Talend Open Studio for Data Integration can be leveraged by an organization for:
Big Data
Data Integration
Data Quality
MDM
Talend also receives the voluntary help of thousands of community members for testing,
translation and improvements, resulting in more rapid and more cost-effective product
development as well as an innovation advantage over traditional alternatives.
Talend's team members are contributors to key open source projects, and the company is a
sponsor or member of several open source foundations and consortiums, including the Apache
Software Foundation and the Eclipse Foundation.
2. Big Data
2.1 How does Big Data fit into Talend Studio?
Talend provides an easy-to-use graphical environment that allows developers to visually map
big data sources and targets without the need to learn and write complicated code. Running
100% natively on Hadoop, Talend Big Data provides massive scalability. Once a big data
connection is configured, the underlying code is automatically generated and can be deployed
remotely as a job that runs natively on your big data cluster: HDFS, Pig, HCatalog, HBase,
Sqoop or Hive.
Talend's big data components have been tested and certified to work with leading big data
platforms and Hadoop distributions, including Amazon EMR, Cloudera, IBM PureData, Hortonworks, MapR,
Pivotal Greenplum, Pivotal HD, and SAP HANA. Talend provides out-of-the-box support for
big data platforms from the leading appliance vendors, including
Greenplum/Pivotal, Netezza, Teradata, and Vertica.
Talend provides two big data integration solutions to address all needs: Talend Open Studio for
Big Data is a free, open source development tool and Talend Platform for Big Data adds data
quality, advanced deployment and management functions across the enterprise.
2.2 Features/Benefits
1. Integrating Disparate Data Sources
One of the most prevalent business pressures driving Big Data investment is having an
excessive number of data silos. In many companies, important business data is spread out
over a large number of locations, from databases to file stores to collaborative web portals to
multiple versions of enterprise applications like ERP or CRM systems. Big Data companies had,
on average, 20 unique internal data sources that stored data necessary for operations or
analysis.
Given the complexity inherent in accessing data from so many sources, an important piece of a
Big Data foundation is the ability to easily move data from one source to another. Three
quarters (74%) of Leaders in Big Data had ETL tools (extract, transform, load) to move their
data, and they were 1.6-times more likely than Followers to be able to integrate data in real
time.
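To make the extract, transform, load pattern concrete, here is a minimal sketch in Python using only the standard library. It is not code generated by Talend (Talend's Job Designer builds jobs from graphical components); the CSV source, the `customers` table and the cleansing rules are hypothetical examples.

```python
import csv
import io
import sqlite3

# Hypothetical source: a small CSV export from one system (e.g. a CRM extract).
SOURCE_CSV = """customer_id,name,country
1, Alice Smith ,us
2,Bob Jones,IN
3,,us
"""

def extract(text):
    """Extract: read rows from the CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim names, upper-case country codes, drop rows with no name."""
    cleaned = []
    for row in rows:
        name = row["name"].strip()
        if not name:
            continue  # reject incomplete records
        cleaned.append((int(row["customer_id"]), name, row["country"].strip().upper()))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into a (hypothetical) target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory database stands in for the target system
load(transform(extract(SOURCE_CSV)), conn)
print(conn.execute("SELECT * FROM customers ORDER BY id").fetchall())
# [(1, 'Alice Smith', 'US'), (2, 'Bob Jones', 'IN')]
```

Keeping the three stages as separate functions mirrors how an ETL tool separates source, transformation and target steps, so each can be replaced independently.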
around language types such as Perl and statistical analytical programming languages such as R
and ECL. Expertise in areas such as multi-tiered SQL constructions is replaced by a need to
understand efficient regular expression constructions and Levenshtein algorithms.
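The Levenshtein distance mentioned above counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another, which makes it useful for spotting near-duplicate values such as misspelled names. A minimal dynamic-programming sketch in Python, keeping only two rows of the table, might look like this; the function name is our own.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match)
            ))
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))   # 3
print(levenshtein("Deloite", "Deloitte"))  # 1
```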
Figure: What do you see as the challenges to implementing big data in your organization?
Talend Enterprise Big Data extends the Talend Open Studio for Big Data product with
professional-grade technical support and enterprise-class features. An organization will upgrade
to this version to take advantage of advanced collaboration, monitoring and project
management features.
Talend Platform for Big Data
The Talend Platform for Big Data addresses the challenges of big data integration, data quality
and big data governance, simplifying the loading, extraction and processing of large and diverse
data sets so you can make more informed and timely decisions. Data quality components allow
you to do big data profiling, cleansing and matching using a massively parallel environment
such as Hadoop. Advanced clustering features allow you to integrate at any scale.
Delivered on top of the Talend unified platform, Talend Platform for Big Data improves
productivity across data management domains by sharing a common code repository and
tooling for scheduling, metadata management, data processing and service enablement.
3. Data Integration
3.1 Introduction
Talend Open Studio for Data Integration is an open source data integration product
developed by Talend and designed to combine, convert and update data in various locations
across a business.
It was launched in October 2006 under its previous name, Talend Open Studio, and is
distributed under GPLv2. By January 2008 it had been downloaded over 1 million times, and by
January 2012 the product had totaled 20 million downloads and over 3,500 clients.
Talend also provides Talend Enterprise Data Integration, a commercial extension to Talend
Open Studio for Data Integration with additional features, technical support and IP
indemnification.
The product provides an extensible, high efficiency, open source set of tools to access,
transform and integrate data from any business system in real time or batch to meet both
operational and analytical data integration needs. With 450+ connectors, it integrates almost
any data source. The broad range of use cases addressed includes massive scale integration
(big data/NoSQL), ETL for business intelligence and data warehousing, data synchronization,
data migration, data sharing, and data services.
3.3 Features
1. A Comprehensive Solution
Talend provides a Business Modeler, a visual tool for designing business logic for an
application, and a Job Designer, a visual tool for functional diagramming that depicts data
processing and flow sequencing using components and connectors. It also provides a
Metadata Manager for storing and managing all project metadata, including contextual data
such as database connection details and file paths.
2. Broad Connectivity to All Systems
Talend connects natively to databases, packaged applications (ERP, CRM, etc.),
SaaS and Cloud applications, mainframes, files, Web services, data warehouses, data marts,
and OLAP applications. It offers built-in advanced components for ETL, including string
manipulation, Slowly Changing Dimension handling, automatic lookup handling and bulk loading. Direct
integration is provided with data quality, data matching, MDM and related functions. Talend
connects to popular cloud apps including Salesforce.com and SugarCRM.
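Slowly Changing Dimension handling decides what happens to a dimension table when a tracked attribute changes. The sketch below shows Type 2, one common variant, which preserves history by closing the current row and appending a new version. It is a hypothetical in-memory illustration, not the logic or API of Talend's SCD components; the `DimRow` structure and `apply_scd2` function are invented for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class DimRow:
    """Hypothetical row of a customer dimension table."""
    customer_id: int           # business (natural) key
    city: str                  # attribute whose history we track
    valid_from: date
    valid_to: Optional[date]   # None marks the current version

def apply_scd2(dim: List[DimRow], customer_id: int, city: str, as_of: date) -> None:
    """Type 2 SCD: on a change, expire the current row and append a new version."""
    current = next(
        (r for r in dim if r.customer_id == customer_id and r.valid_to is None), None
    )
    if current is not None and current.city == city:
        return  # attribute unchanged: nothing to record
    if current is not None:
        current.valid_to = as_of  # close the old version but keep it as history
    dim.append(DimRow(customer_id, city, as_of, None))

dim: List[DimRow] = []
apply_scd2(dim, 42, "Pune", date(2013, 1, 1))
apply_scd2(dim, 42, "Pune", date(2013, 6, 1))    # unchanged, ignored
apply_scd2(dim, 42, "Mumbai", date(2014, 1, 1))  # change creates a new version
for row in dim:
    print(row)
```

Other SCD types trade history for simplicity; Type 1, for instance, would simply overwrite `city` in place.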
3. Teamwork and Collaboration
The shared repository consolidates all project information and enterprise metadata in a
centralized repository. This repository is shared by all stakeholders: business users, job
developers, and IT operations staff. Developers can easily version jobs with the ability to roll back to a prior version.
4. Advanced Management and Monitoring
Talend includes testing, debugging, management and tuning features, with real-time tracking
of data execution statistics and an advanced trace mode. The product incorporates tools for
managing everything from the simplest jobs to the most complex ones, and from single jobs to
thousands of jobs. Processes can be deployed across enterprise and grid systems as
data services using the export tool.
Talend Open Studio for Data Integration is a free, open source development tool; Talend Enterprise Data Integration adds teamwork
and management functions; Talend Platform for Data Management adds data quality and
clustering. Below is the comparison of all three:
3.4 Benefits
1. Optimized Time and Cost
Talend's solutions are 50% to 80% cheaper than equivalent proprietary solutions offered in the
market. They are also less expensive to deploy, maintain, and support. In addition, they
facilitate faster development and production as compared to proprietary tools and hand-coding.
2. Functionality, Performance, Reliability
Far from the stereotypes concerning the lack of professionalism surrounding open source,
Talend's solutions are real business tools, offering a level of functionality equal to that of
proprietary vendors. And being open source doesn't mean that a solution was developed by
volunteers in their spare time. Talend has its own R&D teams and, as discussed above, its
solutions are enriched by contributions from the community. Although Talend controls the
product roadmap, the company continually listens to the opinion and needs of its community
and its customers to help effect change. Talend is committed to providing powerful, reliable
solutions that attract many users.
relationships between them, and setting their properties (most properties are inherited from the
metadata). Talend's Business Modeler leverages a top-down approach, allowing line-of-business
stakeholders to get involved in the design of the integration processes and to monitor
development progress.
9. No Barrier to Adoption
Implementing Talend Open Studio is quick and easy: just download the latest version from
Talend's website and install it.
The product is free, which means that you don't need to justify it to management or start a
formal procurement process before solving your integration issues. You won't spend any time
on administration tasks and won't need any vendor face time. You can use the product in an
unlimited mode and, of course, keep it as long as you like.
10. Market-Ready Products, Not Evaluation Versions
Talend Open Studio is a complete product comprising many features and a wide range of
connectors. Talend's flagship product, it is the most open, innovative and powerful data
integration solution available today. Talend Open Studio isn't a lightweight product or trialware.
It contains all the features required for building powerful data integration processes, and is freely
downloadable and usable under the GPL v2 license.
Talend Integration Suite is an enhanced version of Talend Open Studio, providing additional,
enterprise-level functionality (collaboration, automated deployment, load balancing, monitoring)
for enterprise-grade projects. The solution includes high-level technical support to respond to
corporate issues and legal guarantees of intellectual property protection (IP indemnification).
4. Data Quality
4.1 Introduction
Talend Open Studio for Data Quality is open source software that helps companies
assess the quality of data contained in their databases and business applications, and decide
which actions must be taken to correct erroneous or incomplete data.
Talend Open Studio for Data Quality was launched in June 2008 under its previous name,
Talend Open Profiler, and is distributed under LGPL. Talend also provides Talend Enterprise
Data Quality, a commercial extension to Talend Open Studio for Data Quality with additional
features, which is integrated in Talend Enterprise Data Integration.
This data profiling tool allows business users to define a set of indicators for each data
element that needs to be analyzed or monitored. It produces sophisticated reports and graphs
that let users analyze the level of quality of the data.
The following profiling needs are addressed by Talend Open Studio for Data Quality:
A. Metadata discovery, which identifies the structure of the databases that need to be
analyzed.
B. Statistics definition, which defines the statistics and metrics that need to be measured on
each data item.
C. Results and graphs, which make it easy to view the results and assess the level
of quality of the data.
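As a rough illustration of those three steps, the sketch below takes one column (the structure), computes a few statistics on it (row, null and distinct counts, plus the frequency of character patterns), and returns the results for review. The function and the sample ZIP code data are hypothetical; Talend Open Studio for Data Quality defines its statistics through its own interface rather than through code like this.

```python
import re
from collections import Counter

def profile_column(values):
    """Hypothetical profiler: compute simple quality statistics for one column of strings."""
    non_null = [v for v in values if v not in (None, "")]
    # Map each value to a character pattern: letters -> 'A', digits -> '9'.
    patterns = Counter(
        re.sub(r"\d", "9", re.sub(r"[A-Za-z]", "A", v)) for v in non_null
    )
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "pattern_frequency": dict(patterns.most_common()),
    }

# Hypothetical sample column: note the letter "O" typed instead of a zero.
zip_codes = ["10001", "10001", "9021O", None, "", "60614"]
print(profile_column(zip_codes))
# {'row_count': 6, 'null_count': 2, 'distinct_count': 3, 'pattern_frequency': {'99999': 3, '9999A': 1}}
```

The rare `9999A` pattern is exactly the kind of result that points a business user at a data entry error.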
4.2 Features
1. A Complete Solution
Talend provides a complete data quality solution with built-in data connectivity, profiling,
cleansing, matching and monitoring to address all your data quality and data governance needs.
Data quality capabilities can be scaled to handle anything from flat text files to enterprise data to
Hadoop. Talend is able to leverage the best capabilities of the platform to provide data quality
seamlessly across many data types and over any data volume.
2. Data Profiling
Data profiling is all about understanding data completely, and making sure it conforms to
company and industry standards. With Talend, users can profile and analyze data, then create
and share web-based reports on the quality of the data. With this information you can build team
alignment on the use of data and highlight areas for improvement. Talend provides pre-defined
tests to ensure data is fit for use within your enterprise applications, or you can define
your own.
3. Data Standardization and Enrichment
The secret behind Talend's data standardization and enrichment capabilities is its built-in data
integration and powerful parsing technology. Use the integrated parsing technology to assign
structure to data that has none, then enhance and enrich the data using free reference data.
Talend provides ways to integrate most external reference data sources for postal validation,
business identification and credit score information, to name just a few.
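The idea of using parsing to assign structure to unstructured values can be sketched with a regular expression that recognizes several ways of writing the same phone number and emits one canonical form. This is a stand-in for Talend's parsing technology, which the text does not describe: the North American 10-digit rule, the `+1-AAA-EEE-LLLL` output format and the `standardize_phone` function are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical rule: 10-digit North American numbers, optionally prefixed by country code 1,
# with spaces, dots, dashes or parentheses as separators.
PHONE = re.compile(r"^\s*(?:\+?1[\s.-]?)?\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})\s*$")

def standardize_phone(raw: str) -> Optional[str]:
    """Parse a free-text phone number and return it in one canonical format,
    or None if it does not match the expected structure."""
    m = PHONE.match(raw)
    if m is None:
        return None  # leave unparseable values for manual review
    area, exchange, line = m.groups()
    return f"+1-{area}-{exchange}-{line}"

for raw in ["(212) 555-0187", "212.555.0187", "+1 212 555 0187", "555-0187"]:
    print(raw, "->", standardize_phone(raw))
```

Once values share one structure, they can be compared or checked against reference data, which is what makes the later enrichment step possible.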
4. Data Matching and Survivorship
Talend provides a variety of data matching solutions that move match-rule editing, traditionally
an overly complex, green-screen process, into the hands of real-world business users. Users can
configure matching within the Talend user environment instead of heavily editing rules files and
using the multiple GUIs associated with most data quality tools. Users can also run what-if
analyses when modifying matching techniques, with charts and graphs for key matching metrics.
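Matching and survivorship can be illustrated in two steps: group records whose names are similar enough, then build one surviving "golden" record per group. In this sketch Python's `difflib.SequenceMatcher` stands in for whatever matching algorithms a data quality tool would use, and the survivorship rule (the most recently updated non-empty value wins) is one common choice rather than Talend's documented behavior. The records and the 0.85 threshold are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical customer records from two source systems.
records = [
    {"id": 1, "name": "Jon Smith",  "email": "jsmith@example.com", "updated": "2013-03-01"},
    {"id": 2, "name": "John Smith", "email": "",                   "updated": "2013-11-15"},
    {"id": 3, "name": "Mary Jones", "email": "mjones@example.com", "updated": "2013-07-09"},
]

def similarity(a: str, b: str) -> float:
    """Case-insensitive name similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match(records, threshold=0.85):
    """Greedy matching: add each record to the first group whose first member is similar enough."""
    groups = []
    for rec in records:
        for group in groups:
            if similarity(rec["name"], group[0]["name"]) >= threshold:
                group.append(rec)
                break
        else:  # no existing group matched
            groups.append([rec])
    return groups

def survive(group):
    """Survivorship: for each field, keep the most recently updated non-empty value."""
    golden = {}
    # ISO date strings sort chronologically; newer records overwrite older ones.
    for rec in sorted(group, key=lambda r: r["updated"]):
        for field, value in rec.items():
            if value not in ("", None):
                golden[field] = value
    return golden

for group in match(records):
    print(survive(group))
```

Changing the threshold is a simple form of the what-if analysis described above: lowering it merges more records, raising it keeps more of them apart.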
Talend incorporates data quality into several products including Talend Open Studio for Data
Quality and Talend Platform for Data Management.
4.4 Benefits
1. Improves performance and provides a Big Data alternative for Data Quality.
2. Improves data quality, which directly impacts business analysis and decision making within
an organization.
3. Gets developers up to speed quickly with new techniques that can be applied directly
to real-world projects.
4. Reduces testing time and increases accuracy, improving the overall data quality and analysis.
5. Better data means better execution and greater cost-effectiveness when it comes to
leveraging the CRM system.
5.3 Features
1. Master Any Domain
Talend provides the scalability and flexibility to model and master any domain. You can
start by mastering a single domain and incrementally expand to other domains, all
within a single deployment. You can define advanced business rules, validations, access rights
and registry lookups directly on the model. The model sits at the center of the MDM solution and
drives all communications with closed-loop quality, workflow and end-to-end integration
functions.
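Defining validations directly on the model can be pictured as attaching rules to each field of a domain model and checking every incoming record against them. The following is a hypothetical sketch in Python, not Talend MDM's modeling API; the customer fields, rule messages, `FieldRule` class and `validate` function are invented for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class FieldRule:
    """Hypothetical validation rule attached to one field of a domain model."""
    name: str
    required: bool = False
    check: Callable[[str], bool] = lambda v: True  # default accepts any non-empty value
    message: str = "invalid value"

# Hypothetical "Customer" domain model with validations defined on its fields.
CUSTOMER_MODEL = [
    FieldRule("customer_id", required=True, check=lambda v: v.isdigit(),
              message="must be numeric"),
    FieldRule("name", required=True),
    FieldRule("country", check=lambda v: re.fullmatch(r"[A-Z]{2}", v) is not None,
              message="must be a two-letter ISO code"),
]

def validate(record: dict, model: List[FieldRule]) -> List[str]:
    """Return a list of rule violations for one record (empty if valid)."""
    errors = []
    for rule in model:
        value = record.get(rule.name, "")
        if not value:
            if rule.required:
                errors.append(f"{rule.name}: required")
            continue  # optional and empty: nothing else to check
        if not rule.check(value):
            errors.append(f"{rule.name}: {rule.message}")
    return errors

print(validate({"customer_id": "17", "name": "Acme", "country": "IN"}, CUSTOMER_MODEL))  # []
print(validate({"customer_id": "A17", "country": "India"}, CUSTOMER_MODEL))
```

Keeping the rules on the model, rather than scattered across loading jobs, means every process that writes master data is checked the same way.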
5.4 Advantages
1. The new UI allows users to easily find and navigate through master data and increases
adoption of master data applications.
2. Customization of the web UI improves adoption in the business audience for MDM. Custom
forms allow for a consistent look and feel for an organization.
3. Find master data in seconds. No other MDM solution provides this function.
4. Provide a business user with a single view of a master entity to gather complete insight.
5. Enable the business to create and manage large sets of master data.
6. Enable a team of developers across all MDM functional categories to share and collaborate
on an MDM project.
7. Manage simple and complex hierarchies in an easy-to-use and intuitive interface. The
Talend platform features a user-friendly interface specifically designed to promote sharing
and collaboration.
8. Talend's open source business model gives OEM partners the best of both worlds.
Leverage the mindshare of more than 750,000 developers and all that they represent: over
500 connectors, including sophisticated technologies such as Hadoop, SAP, Salesforce.com
and more; rigorous quality assurance; useful, user-requested features; and user forums
teeming with expertise.
6. Conclusion
Talend has progressively built best-of-breed solutions for all integration needs while working on
a common unified platform. Talend's products are powerful and versatile open source
solutions for Big Data integration that address the needs of the data analyst by providing
a graphical tool that abstracts the underlying complexities of big data technologies and
dramatically improves the efficiency of job design.
With data spread across many locations, from databases to file stores to multiple versions
of enterprise applications, Talend helps integrate disparate data sources.
As an open source solution that processes data at high speed, it significantly reduces
cost: Talend's solutions are 50% to 80% cheaper than equivalent proprietary solutions
offered on the market.
It improves data quality, which directly impacts business analysis and decision making
within an organization.
As we step into the Big Data era, an open source solution with numerous advantages is bound
to succeed. It is very likely that the big data landscape will see more innovations before
consensus emerges on the right technology architecture.