TALEND STUDIO

US India IM Technology Updates

January 2014

Author: US India IM Technology Updates


Deloitte Consulting LLP

Deloitte

Table of Contents
1. Introduction to Talend Studio
   1.1 What is Talend Studio?
   1.2 The Open Source Approach
   1.3 Talend Open Studio and Its Products
2. Big Data
   2.1 How does Big Data fit into Talend Studio?
   2.2 Features/Benefits
   2.3 Unique Challenges
   2.4 Surveys on Big Data Benefits and Challenges
   2.5 Talend and Big Data
3. Data Integration
   3.1 Introduction
   3.2 Product Description
   3.3 Features
   3.4 Benefits
4. Data Quality
   4.1 Introduction
   4.2 Features
   4.3 Working Principles of Data Quality
   4.4 Benefits
   4.5 Root Causes of Data Quality Problems
5. Master Data Management (MDM)
   5.1 Introduction
   5.2 Talend MDM Functional Architecture
   5.3 Features
   5.4 Advantages
6. Conclusion

1. Introduction to Talend Studio


1.1 What is Talend Studio?

Talend provides data, application and business process integration solutions that enable
organizations to effectively leverage all of their information assets. Talend unites integration
projects and technologies to accelerate the time-to-value for the business.
Talend's flexible architecture adapts readily to future IT platforms, including big data environments.
Talend's unified solutions include Data Integration, Data Quality, Master Data Management,
Enterprise Service Bus and Business Process Management. Talend's platform is built around a
common set of easy-to-use tools implemented across all products to make the most of the skills of
integration teams.
Unlike traditional vendors that offer closed and disjointed solutions, Talend offers a flexible,
open source based platform supported by a predictable and scalable value-based subscription
model.
Talend Open Studio for Data Integration can be leveraged by an organization for:
- synchronization or replication of databases
- right-time or batch exchanges of data
- ETL (Extract/Transform/Load) for analytics
- data migration
- complex data transformation and loading
- data quality exercises
- big data

1.2 The Open Source Approach


Fueled by an open source approach and freely available downloads, Talend's ever-expanding
base of adopters fosters a vigorous community that benefits users of the open source versions
and commercial customers alike.
By publishing the source code of its core modules under the GNU General Public License and the
Apache License, Talend gives the community the flexibility to modify and extend the code to meet
specific business needs. Community members can create their own components and share them
with the rest of the community, making the products more versatile and more scalable for
different uses and projects.
Talend also receives the voluntary help of thousands of community members for testing,
translation and improvements, resulting in faster and more cost-effective product
development as well as an innovation advantage over traditional alternatives.
Talend's team members contribute to key open source projects, and the company is a
sponsor or member of several open source foundations and consortiums, including the Apache
Software Foundation and the Eclipse Foundation.

1.3 Talend Open Studio and Its Products


Talend Open Studio is a powerful and versatile set of open source products for developing,
testing, deploying and administering data management and application integration projects.
The following products are discussed in the coming sections:
1. Big Data
2. Data Integration
3. Data Quality
4. MDM

2. Big Data
2.1 How does Big Data fit into Talend Studio?

Talend provides an easy-to-use graphical environment that allows developers to visually map
big data sources and targets without having to learn and write complicated code. Running
100% natively on Hadoop, Talend Big Data provides massive scalability. Once a big data
connection is configured, the underlying code is generated automatically and can be deployed
remotely as a job that runs natively on your big data cluster: HDFS, Pig, HCatalog, HBase,
Sqoop or Hive.
Talend's big data components have been tested and certified to work with leading Hadoop
distributions and big data platforms, including Amazon EMR, Cloudera, IBM PureData, Hortonworks,
MapR, Pivotal Greenplum, Pivotal HD, and SAP HANA. Talend provides out-of-the-box support for
big data platforms from the leading appliance vendors, including Greenplum/Pivotal, Netezza,
Teradata, and Vertica.
Talend provides two big data integration solutions to address different needs: Talend Open Studio
for Big Data, a free, open source development tool, and Talend Platform for Big Data, which adds
data quality, advanced deployment and management functions across the enterprise.

2.2 Features/Benefits
1. Integrating Disparate Data Sources
One of the most prevalent business pressures driving Big Data investment is having too many
data silos. In many companies, important business data is spread over a large number of
locations, from databases to file stores to collaborative web portals to multiple versions of
enterprise applications such as ERP or CRM systems. Big Data companies had, on average, 20
unique internal data sources that stored data necessary for operations or analysis.
Given the complexity inherent in accessing data from so many sources, an important piece of a
Big Data foundation is the ability to easily move data from one source to another. Three
quarters (74%) of Leaders in Big Data had ETL tools (extract, transform, load) to move their
data, and they were 1.6-times more likely than Followers to be able to integrate data in real
time.
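To make the extract, transform, load pattern concrete, here is a minimal sketch in Python using only the standard library. It is not Talend-generated code; the CSV content, the `customers` table and its columns are hypothetical. It extracts rows from a CSV source, standardizes them, and loads them into an in-memory SQLite target.

```python
import csv
import io
import sqlite3

# Hypothetical source data: a CSV export from one of many internal systems.
SOURCE_CSV = """id,name,country
1, alice smith ,us
2,BOB JONES,In
3,,us
"""

def extract(text):
    """Extract: read rows from a CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim and standardize fields, dropping rows with no name."""
    cleaned = []
    for row in rows:
        name = row["name"].strip().title()
        if not name:
            continue  # reject incomplete records
        cleaned.append((int(row["id"]), name, row["country"].strip().upper()))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stands in for the target system
load(transform(extract(SOURCE_CSV)), conn)
print(conn.execute("SELECT * FROM customers ORDER BY id").fetchall())
```

Keeping the three stages as separate functions mirrors how ETL tools let each step be swapped independently, for example replacing the CSV reader with a database connector.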

2. Processing Data at High Speed with In-memory


The most common business pressure is the inability to deliver information as quickly as
business users need it. Efficiently delivering data and analysis on business events as they
occur, and making informed decisions on this information, are marks of an agile, data-driven
organization. The demand for data is growing, and business users are asking for it faster than
ever. Forty-seven percent (47%) of Big Data organizations need insight within an hour of a
business event occurring, and more than a third (35%) need it in near real time.
One of the most effective technologies for processing data at high speed is in-memory
computing. These solutions load the target data directly into the random access memory (RAM)
of a server or desktop, very close to the processor itself. This eliminates the need to connect to
a storage array or disk, locate the desired information, and transfer it over a network to the
server doing the processing. Without these potential bottlenecks, the full power of the
processors can be applied directly to accessing and manipulating the desired information.
Several in-memory solutions are aimed almost exclusively at very large global enterprises.
However, the basic concept behind the technology applies to anything from advanced
next-generation servers to machines as simple as a standard notebook. Given the steadily
declining cost of RAM, most commodity servers have dozens or hundreds of gigabytes of
memory available, opening the door for low-cost solutions.
3. Handling the Unstructured Data Problem
One of the major strengths of Big Data initiatives is the ability to collect, manage, and analyze
not just structured data from relational databases, but also unstructured or semi-structured data
from documents, emails, social media feeds, images, video, and rich media. A surprising
amount of business data resides in these unstructured formats.
A number of different database management systems are being designed to leverage
unstructured information. Broadly, most are referred to as NoSQL databases, a term commonly
interpreted as "not only SQL". This departure from traditional relational database management
systems (RDBMS) gives more flexibility in the data formats being stored, managed, and accessed,
and such a database can also serve as a primary repository of data feeding analytic platforms.
One of the most exciting developments in the world of unstructured data analysis is Apache
Hadoop, an open source framework for distributed storage and processing that couples the
flexibility of managing unstructured data with high-powered processing capabilities. Based on
Google's MapReduce programming model, which breaks large, complex problems into small,
bite-sized chunks, Hadoop can use the processing power of clusters of ordinary servers to tackle
tasks far larger than any single machine could accomplish. Because the code is open source and
can run on commodity hardware, it offers a powerful, cost-effective Big Data option.
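The MapReduce idea can be illustrated on a single machine. The sketch below is plain Python, not Hadoop's Java API: it simulates the map, shuffle and reduce phases of the classic word-count example, and the input splits are hypothetical. On a real cluster, each split would be mapped on a different node and the framework would perform the shuffle between nodes.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

# Each string stands in for a chunk of data that a separate node would process.
splits = ["big data big clusters", "Data on commodity servers"]
mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["big"], counts["data"])  # 2 2
```

Because each map call only sees its own split and each reduce call only sees one key, both phases can run in parallel, which is what lets the model scale out across ordinary servers.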

4. Comprehensive corporate data analysis


The benefits of the Big Data approach are not limited to better and more comprehensive
corporate data analysis that takes in the totality of enterprise data, not just a fraction of it.
Query timeliness could also be radically improved: complex queries that in the past took hours
to schedule and execute could in future be set up and run in seconds.
5. Decision making, planning and execution
There is also a real payoff that can extend right across the enterprise. Just as big
data can enhance decision making at the corporate level by integrating crucial unstructured data
into BI and other enterprise data systems, it also promises to enhance decision making,
planning and execution at the departmental and workgroup levels.
6. Highlights errors and misinformation on the fly
At the local level, most project management, planning and delivery is based on unstructured
data files, and these files often contain errors, questionable assumptions and misinformation. Big
data systems can highlight and spot-check such errors and misinformation on the fly, greatly
improving the timeliness and efficacy of local programs.

2.3 Unique Challenges


1. Limited Big Data Resources
The majority of architects and developers who understand big data work for the original
creators of big data technologies, such as Facebook, Google, and Yahoo. Others are employed by
the numerous startups in this space, such as Hortonworks, Cloudera and MapR. The technology is
still complex to learn, which restricts the rate at which new big data resources become available.
2. Poor Data Quality + Big Data = Big Problems
Poor data quality can have a big impact on effectiveness. Inconsistent or invalid data could have
an exponential impact on analysis in the big data world. As analysis on big data grows, so too
will the need for validation, standardization, enrichment and resolution of data. Even the
identification of linkages between records can be considered a data quality issue that needs to
be resolved for big data.
3. Setting up the system
Setting up and running big data systems can pose significant skills and knowledge challenges,
as big data presents a very different paradigm from conventional enterprise relational database
systems. The programming skills required for big data are also often quite different, centering
on scripting languages such as Perl and on statistical and analytical programming languages such
as R and ECL. Expertise in areas such as multi-tiered SQL constructions is replaced by a need to
understand efficient regex constructions and Levenshtein algorithms.
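For readers unfamiliar with the two techniques named above, the following Python sketch shows a standard dynamic-programming Levenshtein (edit) distance and a precompiled regular expression. The ZIP-code pattern is an illustrative assumption, not something prescribed by any big data platform.

```python
import re

def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

# Compiled once and reused: extracts US-style ZIP codes (5 digits, optional +4).
ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")

print(levenshtein("kitten", "sitting"))                # 3
print(ZIP_RE.findall("Ship to 10001 or 94105-1234"))  # ['10001', '94105-1234']
```

Keeping only two rows of the distance table makes memory use proportional to the shorter loop dimension rather than to the product of both string lengths, which matters when comparing many values at scale.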

2.4 Surveys on Big Data Benefits and Challenges


What would you identify as potentially the main benefits of integrating disparate data via
big data file systems?

What do you see as the challenges to implementing big data in your organization?

2.5 Talend and Big Data


Talend's open source approach and flexible integration platform for big data enable users to
easily connect and analyze data from different systems to help drive and improve business
performance. Talend's big data capabilities integrate with today's big data market leaders such
as Cloudera, Hortonworks, Google, EMC/Greenplum, MapR, Netezza, Teradata and Vertica,
positioning Talend as a leader in the management of big data. Talend's goal is to democratize
the big data market just as it has with data integration, data quality, master data management,
application integration and business process management.
Talend offers three big data products:
1. Talend Open Studio for Big Data
2. Talend Enterprise Big Data
3. Talend Platform for Big Data
Talend Open Studio for Big Data
Talend Open Studio for Big Data is a free, open source development tool that packages Talend's
big data components for Hadoop, HBase, Hive, HCatalog, Oozie, Sqoop and Pig with the base
Talend Open Studio for Data Integration. It was released into the community under the Apache
License. It also allows you to bridge the old with the new, as it includes hundreds of components
for existing systems such as SAP, Oracle, DB2, Teradata and many others.
Talend Enterprise Big Data

Talend Enterprise Big Data extends the Talend Open Studio for Big Data product with
professional-grade technical support and enterprise-class features. Organizations upgrade
to this version to take advantage of advanced collaboration, monitoring and project
management features.
Talend Platform for Big Data
The Talend Platform for Big Data addresses the challenges of big data integration, data quality
and big data governance, simplifying the loading, extraction and processing of large and diverse
data sets so you can make more informed and timely decisions. Data quality components allow
you to do big data profiling, cleansing and matching using a massively parallel environment
such as Hadoop. Advanced clustering features allow you to integrate at any scale.
Delivered on top of the Talend unified platform, Talend Platform for Big Data improves
productivity across data management domains by sharing a common code repository and
tooling for scheduling, metadata management, data processing and service enablement.

3. Data Integration
3.1 Introduction
Talend Open Studio for Data Integration is an open source data integration product
developed by Talend and designed to combine, convert and update data in various locations
across a business.
It was launched in October 2006 under its previous name, Talend Open Studio, and is
distributed under GPLv2. By January 2008 it had been downloaded over 1 million times, and by
January 2012 the product had totaled 20 million downloads and over 3,500 clients.
Talend also provides Talend Enterprise Data Integration, a commercial extension to Talend
Open Studio for Data Integration with additional features, technical support and IP
indemnification.
The product provides an extensible, highly efficient, open source set of tools to access,
transform and integrate data from any business system in real time or batch to meet both
operational and analytical data integration needs. With 450+ connectors, it integrates almost
any data source. The broad range of use cases addressed includes massive-scale integration
(big data/NoSQL), ETL for business intelligence and data warehousing, data synchronization,
data migration, data sharing, and data services.

3.2 Product Description


Talend Open Studio for Data Integration operates as a code generator. It produces data
transformation scripts and underlying programs in Java. Its GUI gives access to a metadata
repository and to a graphical designer. The metadata repository contains the definitions and
configuration for each job, but not the actual data being transformed or moved. The information
in the metadata repository is used by all of the components of Talend Open Studio for Data
Integration.
The product is based on Eclipse RCP. Most of its contributors work for the commercial open source
vendor Talend.
Users design individual jobs from a set of over 400 graphical components for transformation,
connectivity, or other operations. The jobs created can be executed from within
the studio or as standalone scripts.
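To illustrate what operating as a code generator means, the Python sketch below turns a small job definition (configuration only, no data) into source code and then runs it. Talend itself generates Java; the job format, the step types and the templates here are hypothetical and greatly simplified.

```python
# Hypothetical job definition, as it might be stored in a metadata repository:
# only the configuration is kept here, never the data being processed.
JOB = {
    "name": "uppercase_names",
    "steps": [
        {"type": "filter_not_empty", "field": "name"},
        {"type": "uppercase", "field": "name"},
    ],
}

# Hypothetical code templates, one per component type.
TEMPLATES = {
    "filter_not_empty": "    rows = [r for r in rows if r[{field!r}]]",
    "uppercase": "    rows = [{{**r, {field!r}: r[{field!r}].upper()}} for r in rows]",
}

def generate(job):
    """Turn a job definition into Python source code for one function."""
    lines = [f"def {job['name']}(rows):"]
    lines += [TEMPLATES[s["type"]].format(field=s["field"]) for s in job["steps"]]
    lines.append("    return rows")
    return "\n".join(lines)

source = generate(JOB)
print(source)
namespace = {}
exec(source, namespace)  # compile and define the generated function
print(namespace["uppercase_names"]([{"name": "ada"}, {"name": ""}]))
```

The design point is that the graphical job and the generated program are two views of the same metadata: editing the definition and regenerating yields new code without hand-coding.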

3.3 Features
1. A Comprehensive Solution
Talend provides a Business Modeler, a visual tool for designing the business logic of an
application, and a Job Designer, a visual tool for functional diagramming that lays out data
processing and flow sequencing using components and connectors. It also provides a
Metadata Manager for storing and managing all project metadata, including contextual data
such as database connection details and file paths.
2. Broad Connectivity to All Systems
Talend connects natively to databases, packaged applications (ERP, CRM, etc.), SaaS and Cloud
applications, mainframes, files, Web services, data warehouses, data marts, and OLAP
applications. It offers built-in advanced components for ETL, including string manipulators,
Slowly Changing Dimensions, automatic lookup handling and bulk loading. Direct integration is
provided with data quality, data matching, MDM and related functions. Talend connects to
popular cloud apps including Salesforce.com and SugarCRM.
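Slowly Changing Dimensions may be unfamiliar to some readers. The Python sketch below illustrates the common Type 2 approach, in which a change to a tracked attribute closes the current dimension row and adds a new version, so history is preserved. It is a generic illustration of the technique, not the behavior or configuration of Talend's SCD components; the `DimRow` structure and its field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class DimRow:
    customer_id: int          # business (natural) key
    city: str                 # tracked attribute
    valid_from: date
    valid_to: Optional[date]  # None means "current version"

def apply_scd2(dimension, customer_id, city, as_of):
    """Type 2 SCD: keep history by closing the current row and adding a new one."""
    current = next((r for r in dimension
                    if r.customer_id == customer_id and r.valid_to is None), None)
    if current is not None and current.city == city:
        return  # attribute unchanged, nothing to do
    if current is not None:
        current.valid_to = as_of  # expire the old version
    dimension.append(DimRow(customer_id, city, as_of, None))

dim = []
apply_scd2(dim, 42, "Boston", date(2013, 1, 1))
apply_scd2(dim, 42, "Boston", date(2013, 6, 1))   # unchanged: ignored
apply_scd2(dim, 42, "Chicago", date(2014, 1, 1))  # changed: new version
for row in dim:
    print(row)
```

A Type 1 dimension would instead overwrite `city` in place; Type 2 trades extra rows for the ability to report on how an attribute looked at any point in time.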
3. Teamwork and Collaboration
The shared repository consolidates all project information and enterprise metadata in a
centralized location. This repository is shared by all stakeholders: business users, job
developers, and IT operations staff. Developers can easily version jobs and roll back to a
prior version.
4. Advanced Management and Monitoring
Talend includes powerful testing, debugging, management and tuning features, with real-time
tracking of data execution statistics and an advanced trace mode. The product incorporates
tools for managing everything from the simplest jobs to the most complex ones, and from single
jobs to thousands of jobs. Processes can be deployed across enterprise and grid systems as
data services using the export tool.
Talend Open Studio for Data Integration is a free, open source development tool; Talend
Enterprise Data Integration adds teamwork and management functions; and Talend Platform for
Data Management adds data quality and clustering. Below is the comparison of all three:

3.4 Benefits
1. Optimized Time and Cost
Talend's solutions are 50% to 80% cheaper than equivalent proprietary solutions offered in the
market. They are also less expensive to deploy, maintain, and support. In addition, they
facilitate faster development and production as compared to proprietary tools and hand-coding.
2. Functionality, Performance, Reliability
Far from the stereotypes about a lack of professionalism surrounding open source,
Talend's solutions are real business tools, offering a level of functionality equal to that of
proprietary vendors. Being open source doesn't mean that a solution was developed by
volunteers in their spare time. Talend has its own R&D teams and, as discussed above, its
solutions are enriched by contributions from the community. Although Talend controls the
product roadmap, the company continually listens to the opinions and needs of its community
and its customers to help effect change. Talend is committed to providing powerful, reliable
solutions that attract many users.

3. Universality, Versatility for all Projects


Talend's data integration solutions are not limited to standard ETL (Extract-Transform-Load)
functionality for Business Intelligence; they can also be used for operational data integration
projects. Such projects are typically still done manually but can benefit from a data integration
environment.
4. The Broadest Connectivity
With more than 400 connectors, Talend's solutions provide virtually unlimited connectivity to
enterprise systems: databases, software packages, mainframes, files, Web Services, etc. No
other solution available on the market today offers so many connectors.
5. Enterprise-Grade Support
Contrary to popular belief, commercial open source vendors provide real support services,
similar in type and quality to those offered by proprietary vendors. This is the case with Talend,
whose services are designed to facilitate teamwork and increase productivity. These services
are delivered by Talend experts or by Talend-certified partners, and offer the same guarantees as
the largest proprietary vendors: service level agreements (SLAs), guaranteed response times,
etc.
6. The Strength of a Community
Open source comes with a community. Open source users benefit from the strength of this
community, both in terms of support and product development.
7. Stable & Predictable Pricing Model
Proprietary vendors charge a "data tax" that increases the cost of processing additional
data: adding servers or data sources/targets, or even transitioning to multi-core CPUs, requires
the purchase of additional licenses. As a result, infrastructure costs are not predictable and
companies can't determine when they will reach their limits. With Talend, the cost of the solution
is based on the number of developers of data integration processes, so you can access new data
as needed. For instance, when setting up a new application or acquiring a new business
(operations that are sometimes hard to predict in advance), you don't need to buy additional
licenses. Moreover, if a company moves from development mode to maintenance mode, it doesn't
have to keep all its licenses.
8. Fast Learning Curve
Talend tools are user-friendly and very easy to handle. The graphical user interface is intuitive
and doesn't require formal training. Talend's Job Designer provides both a graphical and a
functional view of the actual integration processes using a graphical palette of components and
connectors: the Component Library. Integration processes are built by simply dragging and
dropping components and connectors onto the workspace, drawing connections and
relationships between them, and setting their properties (most properties are inherited from the
metadata). Talend's Business Modeler takes a top-down approach, allowing line-of-business
stakeholders to get involved in the design of the integration processes and to monitor
development progress.
9. No Barrier to Adoption
Implementing Talend Open Studio is quick and easy: just download the latest version from
Talend's website and install it.
The product is free, which means that you don't need to justify it to management or start a
formal procurement process before solving your integration issues. You won't spend any time
on administration tasks and won't need any vendor face time. You can use the product without
restrictions and, of course, keep it as long as you like.
10. Market-Ready Products, Not Evaluation Versions
Talend Open Studio is a complete product comprising many features and a wide range of
connectors. Talend's flagship product, it is the most open, innovative and powerful data
integration solution available today. Talend Open Studio isn't a lightweight product or trialware.
It contains all the features required for building powerful data integration processes, and is
freely downloadable and usable under the GPL v2 license.
Talend Integration Suite is an enhanced version of Talend Open Studio, providing additional,
enterprise-level functionality (collaboration, automated deployment, load balancing, monitoring)
for enterprise-grade projects. The solution includes high-level technical support to respond to
corporate issues and legal guarantees of intellectual property protection (IP indemnification).

4. Data Quality
4.1 Introduction
Talend Open Studio for Data Quality is open source software that helps companies assess the
quality of data contained in their databases and business applications, and decide which
actions must be taken to correct erroneous or incomplete data.
Talend Open Studio for Data Quality was launched in June 2008, under its previous name:
Talend Open Profiler and is distributed under LGPL. Talend also provides Talend Enterprise
Data Quality, integrated in Talend Enterprise Data Integration, which is a commercial extension
to Talend Open Studio for Data Quality with additional features.
This data profiling tool allows business users to define a set of indicators for each data
element that needs to be analyzed or monitored. It produces sophisticated reports and graphs
that let users analyze the level of quality of the data.
The following profiling needs are addressed by Talend Open Studio for Data Quality:
A. Metadata discovery, which identifies the structure of the databases that need to be
analyzed.
B. Statistics definition, which defines the statistics and metrics that need to be measured on
each data item.
C. Results and graphs, which make it easy to view the results and assess the level
of quality of the data.
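The kind of statistics described in point B can be illustrated with a short Python sketch. It computes row, null, distinct and duplicate counts plus a simple character-pattern frequency for one column. The indicator names and the sample column are assumptions for illustration; Talend Open Studio for Data Quality defines its own indicators and reports.

```python
import re
from collections import Counter

def value_pattern(value):
    """Map a value to a character-class pattern: A = letter, 9 = digit."""
    return re.sub(r"[0-9]", "9", re.sub(r"[A-Za-z]", "A", value))

def profile_column(values):
    """Compute a few simple profiling indicators for one column."""
    non_null = [v for v in values if v not in (None, "")]
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "duplicate_count": len(non_null) - len(set(non_null)),
        "patterns": Counter(value_pattern(v) for v in non_null).most_common(),
    }

# Hypothetical "postal code" column pulled from a customer table.
postal_codes = ["10001", "10001", "9410", "", None, "SW1A 1AA"]
print(profile_column(postal_codes))
```

Pattern frequencies are often the quickest way to spot non-conforming values: here the four-digit code stands out against the expected five-digit pattern.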

4.2 Features
1. A Complete Solution
Talend provides a complete data quality solution with built-in data connectivity, profiling,
cleansing, matching and monitoring to address all your data quality and data governance needs.
Data quality capabilities can be scaled to handle anything from flat text files to enterprise data to
Hadoop. Talend is able to leverage the best capabilities of the platform to provide data quality
seamlessly across many data types and over any data volume.
2. Data Profiling
Data profiling is all about understanding data completely, and making sure it conforms to
company and industry standards. With Talend, users can profile and analyze data, then create
and share web-based reports on the quality of the data. With this information you can build team
alignment on the use of data and highlight areas for improvement. Talend provides pre-defined
tests to ensure data quality is fit-for-use within your enterprise application, or you can define
your own.
3. Data Standardization and Enrichment
The secret behind Talend's data standardization and enrichment capabilities lies in its built-in
data integration and powerful parsing technology. Use the integrated parsing technology to assign
structure to data that has none, then enhance and enrich the data using free reference data.
Talend provides ways to integrate most external reference data sources for postal validation,
business identification and credit score information, to name just a few.
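As an illustration of parsing followed by enrichment, the Python sketch below assigns structure to free-form phone numbers and then enriches them from a reference table. It assumes North American numbers; the regular expression and the `AREA_CODE_REGION` lookup are hypothetical stand-ins, not Talend's parsing technology or any real reference data service.

```python
import re

# Hypothetical reference data used for enrichment: area code -> region.
AREA_CODE_REGION = {"212": "New York, NY", "415": "San Francisco, CA"}

# Optional leading country code 1, then 3 + 3 + 4 digits with any separators.
PHONE_RE = re.compile(r"^\D*1?\D*(\d{3})\D*(\d{3})\D*(\d{4})\D*$")

def standardize_phone(raw):
    """Parse a free-form phone number into a standard record,
    or return None if the value cannot be parsed."""
    match = PHONE_RE.match(raw)
    if not match:
        return None
    area, exchange, line = match.groups()
    return {
        "phone": f"({area}) {exchange}-{line}",                # standardization
        "region": AREA_CODE_REGION.get(area, "unknown"),  # enrichment
    }

for raw in ["212.555.0143", "+1 (415) 555 0199", "555-0100"]:
    print(raw, "->", standardize_phone(raw))
```

Values that fail to parse come back as `None` rather than being guessed at, so they can be routed to a review queue instead of silently polluting the target.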
4. Data Matching and Survivorship
Talend provides a variety of data matching solutions that move match-rule editing out of overly
complex green-screen interfaces and into the hands of real-world business users. Users can
configure matching within the Talend user environment instead of heavily editing rules files and
using the multiple GUIs associated with most data quality tools. When modifying matching
techniques, users can create what-if analyses with charts and graphs for key matching metrics.
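The matching and survivorship ideas can be sketched in a few lines of Python. Here `difflib.SequenceMatcher` from the standard library stands in for a fuzzy matching algorithm; the match rule, the 0.85 threshold and the "most recent non-empty value wins" survivorship rule are all illustrative assumptions rather than Talend defaults.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Similarity score in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def is_match(r1, r2, threshold=0.85):
    """Match rule: names are similar AND postal codes are identical."""
    return similarity(r1["name"], r2["name"]) >= threshold and r1["zip"] == r2["zip"]

def survivor(records):
    """Survivorship rule: for each field, keep the most recently updated
    non-empty value across the matched records."""
    # ISO-format date strings sort correctly as plain text.
    ordered = sorted(records, key=lambda r: r["updated"], reverse=True)
    return {field: next((r[field] for r in ordered if r[field]), None)
            for field in ordered[0]}

a = {"name": "John Smith", "zip": "10001", "email": "", "updated": "2013-05-01"}
b = {"name": "Jon Smith", "zip": "10001", "email": "js@example.com", "updated": "2012-11-20"}
if is_match(a, b):
    print(survivor([a, b]))
```

Separating the match rule from the survivorship rule reflects the two distinct decisions involved: whether records describe the same entity, and which values the merged "golden" record should keep.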
Talend incorporates data quality into several products including Talend Open Studio for Data
Quality and Talend Platform for Data Management.

4.3 Working Principles of Data Quality


Profiling data using the studio involves the following steps:
1. Connecting to a data source, such as a database, a Master Data Management (MDM)
server, or a delimited or Excel file, in order to access the tables and columns
on which you want to define and execute analyses.
2. Defining any of the available data quality analyses, including database content analysis,
column analysis, table analysis, redundancy analysis, correlation analysis, etc. These
analyses carry out data profiling processes that define the content, structure and
quality of highly intricate data structures.
3. Generating reports from different analyses and storing them in a database. These reports
make it possible to compare current and historical statistics to determine the improvement or
degradation of data.
4. Accessing different analytical tools that allow you to explore and monitor the reports
generated in the studio.
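Steps 3 and 4 rely on keeping analysis results over time. A minimal Python sketch, assuming a hypothetical `dq_report` table in SQLite and a single null-rate metric, shows how stored results can be compared to flag improvement or degradation. Talend's own report database schema is not shown here.

```python
import sqlite3

def store_report(conn, run_date, column, null_rate):
    """Step 3: persist one profiling result so it can be compared later."""
    conn.execute(
        "INSERT INTO dq_report (run_date, column_name, null_rate) VALUES (?, ?, ?)",
        (run_date, column, null_rate),
    )

def trend(conn, column):
    """Step 4: compare the latest run with the previous one for a column."""
    rows = conn.execute(
        "SELECT null_rate FROM dq_report WHERE column_name = ? "
        "ORDER BY run_date DESC LIMIT 2",
        (column,),
    ).fetchall()
    if len(rows) < 2:
        return "not enough history"
    latest, previous = rows[0][0], rows[1][0]
    if latest < previous:
        return "improved"  # fewer nulls than last time
    return "degraded" if latest > previous else "unchanged"

conn = sqlite3.connect(":memory:")  # stands in for the report database
conn.execute("CREATE TABLE dq_report (run_date TEXT, column_name TEXT, null_rate REAL)")
store_report(conn, "2013-12-01", "email", 0.12)
store_report(conn, "2014-01-01", "email", 0.05)
print(trend(conn, "email"))  # improved
```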

4.4 Benefits

1. Improves performance and provides a Big Data alternative for data quality.
2. Improves data quality, which directly impacts business analysis and decision making within
an organization.
3. Gets developers up to speed quickly with new techniques that can be applied directly
to real-world projects.
4. Reduces testing time and increases accuracy, improving overall data quality and analysis.
5. Better data means better execution and greater cost-effectiveness when it comes to
leveraging the CRM system.

4.5 Root Causes of Data Quality Problems


Data quality problems are easy to recognize. They can undermine an organization's ability to
work efficiently and comply with government regulations. Specific technical problems include
missing data, misfielded attributes, duplicate records and broken data models, to name just a
few.
But rather than merely patching up bad data, the best strategy for fighting data quality issues is
to understand the root causes and put new processes in place to prevent them.

1. Typographical Errors and Non-Conforming Data


Despite heavy automation in today's data architectures, people still type data into web forms
and other user interfaces. A prevalent source of data inaccuracy is simply a mistake made by the
person entering the data manually.
2. Information Obfuscation
How often do people give incomplete or incorrect information to safeguard their privacy?
Data entry errors are not always accidental: if nothing is at stake for the people entering
the data, they will tend to fudge it.
3. Renegade IT and Spreadmarts
A renegade is a person who deserts and betrays an organization's set of principles. That's
precisely what some impatient business owners unknowingly do by moving data in and out of
business solutions, databases and the like. Rather than wait for help from IT, eager business
units may create their own local applications without IT's knowledge. While such an application
may meet the immediate departmental need, it is unlikely to adhere to data standards, data
models or interface standards. Its database might start as a copy of a sanctioned database,
placed in a local application on team desktops. So-called spreadmarts, important data sets
stored in Excel spreadsheets, are just as easily replicated to team desktops. In this scenario,
you lose control of versions as well as standards.
4. After the Merger
Corporate mergers increase the likelihood of data quality errors because they usually happen
quickly and are not anticipated by IT departments. Almost immediately, there is pressure to
consolidate and take shortcuts on proper planning. The consolidation will likely include the need
to share data among a varied set of disjointed applications. Many shortcuts are taken to make
this happen, often involving known or unknown risks to data quality.
5. Hidden Code
Databases rarely begin life empty. The starting point is typically a data conversion from some
previously existing data source. The problem is that while the data may work perfectly well in
the source application, it may fail in the target. It's difficult to see all the custom code and
special processes that operate beneath the data unless you profile it.
6. Transaction Transition
Large volumes of data are exchanged between systems through real-time interfaces. As soon as
data enters one database, it triggers the procedures needed to send transactions to other
downstream databases. The advantage is immediate propagation of data to all relevant
databases.
However, what happens when transactions go wrong? An inadequate system could cause
problems with downstream business applications. In fact, even a small data model change
could cause issues.


7. Defining Data Quality


More and more organizations recognize the need for data quality, but there are different ways
to clean data and improve its quality. You can:
• Write some code and cleanse the data manually
• Handle data quality within the source application
• Buy tools to cleanse the data
However, consider what happens when two or more of these data quality processes adjust the
same data: they can conflict with one another and introduce anomalies into the data.
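The following hypothetical example shows how overlapping cleansing processes can create anomalies. Two phone-number rules, each reasonable on its own, are run in different orders by two different processes. The rules and the 10-character field width are assumptions chosen only for this example.

```python
import re


def strip_non_digits(phone: str) -> str:
    """Cleansing process A (e.g. hand-written code): keep digits only."""
    return re.sub(r"\D", "", phone)


def truncate_to_field(phone: str, width: int = 10) -> str:
    """Cleansing process B (e.g. a source-application rule): cut to the field width."""
    return phone[:width]


raw = "+1 (555) 123-4567"
a_then_b = truncate_to_field(strip_non_digits(raw))
b_then_a = strip_non_digits(truncate_to_field(raw))
print(a_then_b)  # 1555123456
print(b_then_a)  # 15551
```

Neither result is the ten-digit number 5551234567, and the outcome depends on which process runs first. A single, agreed set of cleansing rules avoids this kind of order-dependent anomaly.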


5. Master Data Management (MDM)


5.1 Introduction
Mastering data has been a goal for as long as there have been disparate, heterogeneous data
sources. Until recently, access to the tools needed to realize this goal was cost-prohibitive.
Additionally, homegrown development of a proper master data solution often proves too
complex and difficult to evolve and maintain. Master Data Management (MDM) has proven
extremely valuable, but only as an esoteric solution restricted to elite or large organizations
with huge resources. Open source MDM reduces implementation complexity, time-to-value and
cost. In fact, open source in any market helps organizations overcome these obstacles and
realize their goals.
Talend MDM provides the technology to create a unified view of information and manage that
master view over time. Talend simplifies MDM with a flexible and open approach to master data
projects, and combines complete functionality for data integration, data quality, data profiling,
data mastering, and data governance. Talend MDM provides a collaborative workflow that enables
teams to build and enforce data governance policies. It provides a system of record and
ensures that master data stays clean and is made available to those who need it.
Talend's MDM solutions extend the core Talend competencies of integration and quality with
functions that rationalize and master data and support data stewardship. In fact, Talend's
approach to MDM incorporates these core competencies into the functional definition of Master
Data Management. This is a natural extension of an already successful product offering.
High quality master data is extremely important for enterprise business processes and analytics.
However, data resides in disparate systems across an organization, is rarely in a standard
format and often varies in quality. Open source is a positive force that can galvanize the
MDM community to strengthen the definition of its function and purpose while extending the
reach and availability of a solution. Users now have the freedom to master their data.

5.2 Talend MDM Functional Architecture


The Talend MDM architecture can be broken down into functional blocks that serve both the users
interacting with the MDM Hub and the corresponding IT needs.


The main functional blocks are:


1. The Integration block, where data integration can be carried out regardless of process
complexity and data volumes.
2. The Profiling block, used for data quality, where data sources are profiled and cleansed
before being loaded into the MDM Hub.
3. The MDM model, where the organization's master data entities are defined and managed.
4. The MDM Hub, where the data associated with the data models and user roles is stored in
XML format.
Talend Studio tightly couples the above four blocks to provide processes for collecting,
aggregating, matching, consolidating, quality-assuring, persisting and distributing data
throughout the organization.
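The following minimal sketch shows how the model and hub blocks divide the work. The model declares which fields a master data entity has, and the hub accepts only records that conform to that model and stores them as XML. The `Customer` entity, its fields and both functions are hypothetical. Talend MDM's real data models, validation rules and storage are richer than this and are not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical data model for one master data entity.
CUSTOMER_MODEL = {
    "entity": "Customer",
    "required": ["Id", "Name"],
    "optional": ["Email", "City"],
}


def validate(record: dict, model: dict) -> list:
    """Return validation errors for a record checked against the model."""
    errors = [f"missing required field {f}" for f in model["required"]
              if record.get(f) in (None, "")]
    allowed = set(model["required"]) | set(model["optional"])
    errors += [f"unknown field {f}" for f in record if f not in allowed]
    return errors


def to_hub_xml(record: dict, model: dict) -> str:
    """Serialize a valid record to XML, writing fields in model order."""
    errors = validate(record, model)
    if errors:
        raise ValueError("; ".join(errors))
    root = ET.Element(model["entity"])
    for field in model["required"] + model["optional"]:
        if field in record:
            ET.SubElement(root, field).text = str(record[field])
    return ET.tostring(root, encoding="unicode")


print(to_hub_xml({"Id": 42, "Name": "Acme Corp", "City": "Boston"}, CUSTOMER_MODEL))
# <Customer><Id>42</Id><Name>Acme Corp</Name><City>Boston</City></Customer>
print(validate({"Name": "Acme Corp", "Phone": "555"}, CUSTOMER_MODEL))
# ['missing required field Id', 'unknown field Phone']
```

Keeping validation on the model rather than in each loading job means every record that reaches the hub passes the same checks, wherever it came from.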

5.3 Features
1. Master Any Domain
Talend provides the scalability and flexibility to model and master any domain. You can start by
mastering a single domain and incrementally extend it to other domains, all within a single
deployment. You can define advanced business rules, validations, access rights and registry
lookups directly on the model. The model sits at the center of the MDM solution and drives all
communication with the closed-loop quality, workflow and end-to-end integration functions.


2. Powerful Business User Interface


Talend's GUI gives business users a rich utility to search and manage master data. Standard
composite views provide a 360-degree view of any mastered entity or let users investigate a
hierarchy. Business-oriented views can be specified with the Smart View custom form designer.
3. Workflow and Business Processes
Talend lets you define and track master data through a series of steps using tasks. This
workflow uses an intuitive graphical interface and displays a graphical trail of process steps
for contextual history. Every create, read, update or delete operation on master data is
evaluated and can initiate an event to look for duplicates, enrich data, synchronize a back-end
system, send an email or even kick off a custom workflow.
4. Data Quality Built-in
Talend's MDM provides comprehensive data quality features. These cover data profiling, data
standardization, parsing and next-generation data matching that provides a superior
alternative to the overly complex processes used by legacy vendors. Sophisticated
matching algorithms are provided to help find duplicates and probable duplicates. Stewardship
and survivorship components define common business rules to apply to sets of duplicate
records and automate the creation of a single master.
5. Manage Permissions, Users and Groups
Role-based access controls are applied to every concept in Talend MDM, from read/write
access to master data attributes to workflow and user permissions.
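The following sketch illustrates the survivorship step described under Data Quality Built-in, where common business rules merge a set of duplicate records into a single master record. It assumes each duplicate carries an `updated` date. The three rule names, the fields and the function are hypothetical, and this is not Talend's own survivorship component.

```python
from collections import Counter
from datetime import date


def build_master(duplicates: list, rules: dict) -> dict:
    """Merge duplicate records into one master record using per-field rules.

    Supported (hypothetical) rules:
      "most_recent"   - value from the most recently updated record
      "most_frequent" - value that occurs most often across the duplicates
      "longest"       - longest value, e.g. the most complete name
    Empty values never survive.
    """
    master = {}
    for field, rule in rules.items():
        candidates = [r for r in duplicates if r.get(field) not in (None, "")]
        if not candidates:
            continue
        if rule == "most_recent":
            master[field] = max(candidates, key=lambda r: r["updated"])[field]
        elif rule == "most_frequent":
            master[field] = Counter(r[field] for r in candidates).most_common(1)[0][0]
        elif rule == "longest":
            master[field] = max((r[field] for r in candidates), key=len)
        else:
            raise ValueError(f"unknown survivorship rule: {rule!r}")
    return master


duplicates = [
    {"name": "J. Smith", "email": "jsmith@old.example", "city": "New York", "updated": date(2012, 5, 1)},
    {"name": "John Smith", "email": "", "city": "New York", "updated": date(2013, 8, 15)},
    {"name": "John A. Smith", "email": "john.smith@example.com", "city": "Brooklyn", "updated": date(2013, 1, 10)},
]
rules = {"name": "longest", "email": "most_recent", "city": "most_frequent"}
print(build_master(duplicates, rules))
# {'name': 'John A. Smith', 'email': 'john.smith@example.com', 'city': 'New York'}
```

The newest record contributes no email because its value is empty, so the next most recent non-empty email survives. Expressing rules per field keeps this kind of business decision explicit and easy for a data steward to review.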
Talend provides two MDM editions to address organizational needs: Talend Open Studio for
MDM is a free, open source development tool and Talend Platform for MDM adds advanced
deployment and management functions.


5.4 Advantages
1. The new UI allows users to easily find and navigate through master data and increases
adoption of master data applications.
2. Customizing the web UI improves MDM adoption among business users. Custom forms allow
for a consistent look and feel across an organization.
3. Find master data in seconds. No other MDM solution provides this function.
4. Provide a business user with a single view of a master entity to gather complete insight.
5. Enable the business to create and manage large sets of master data.
6. Enable a team of developers across all MDM functional categories to share and collaborate
on an MDM project.
7. Manage simple and complex hierarchies in an easy-to-use, intuitive interface. The
Talend platform features a user-friendly interface specifically designed to promote sharing
and collaboration.
8. Talend's open source business model gives OEM partners the best of both worlds.
Leverage the mindshare of more than 750,000 developers and all that they represent: over
500 connectors, including sophisticated technologies such as Hadoop, SAP, Salesforce.com
and more; rigorous quality assurance; useful, user-requested features, and user forums
teeming with expertise.

6. Conclusion
Talend has progressively built best-of-breed solutions for all integration needs on a common,
unified platform. Talend's products are powerful and versatile open source solutions for Big
Data integration. They address the needs of data analysts by providing a graphical tool that
abstracts the underlying complexities of Big Data technologies and dramatically improves the
efficiency of Job design.

• With data spread across locations, from databases to file stores to multiple versions of
enterprise applications, Talend helps integrate disparate data sources.
• As an open source solution that processes data at high speed, it reduces costs significantly.
Talend's solutions are 50% to 80% cheaper than equivalent proprietary solutions on the market.
• It improves data quality, which directly impacts business analysis and decision making
within an organization.

As we step into the Big Data era, an open source solution with numerous advantages is bound
to succeed. The big data landscape will very likely see more innovations before consensus
emerges on the right technology architecture.
