
Source: https://www.quora.com/profile/Xitiz-Pugalia

Are data warehousing and business intelligence interconnected? What is the scope of these two fields in the future?
Data Warehousing (defined below) is essentially a prerequisite of Business
Intelligence.
Business Intelligence (BI) is all about using the information you have &
applying it to improve your business or to gain knowledge about the various
processes in your business.

BI consists of 2 main steps:-


Step 1. Data Warehousing: this is about taking raw data from one or more data
sources & storing it in a simplified & organised manner so that it can later
be used to derive insights.
To create a data warehouse, we need a data source & a target (a data
storage system, for example an Oracle / SQL Server / MySQL DBMS), & we need
to perform an ETL operation: Extract data from the source, Transform the
data as desired & then Load the data into the warehouse. Popular ETL tools
include: Informatica PowerCenter, IBM DataStage, Microsoft SSIS & Ab Initio.
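The Extract / Transform / Load flow just described can be sketched in plain Python (a minimal illustration of the concept, not an Informatica workflow; the file layout, column names & `sales` table are made-up examples):

```python
import csv
import sqlite3

# Extract: read raw rows from a CSV source file.
def extract(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

# Transform: clean & reshape the rows as desired
# (here: trim & title-case names, cast amounts, drop empty amounts).
def transform(rows):
    return [
        {"name": r["name"].strip().title(), "amount": float(r["amount"])}
        for r in rows
        if r["amount"]
    ]

# Load: write the transformed rows into the warehouse target table.
def load(rows, conn):
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales (name, amount) VALUES (:name, :amount)", rows
    )
    conn.commit()
```

A tool like PowerCenter lets you configure the same three stages visually instead of hand-coding them.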
Step 2. Reporting: this step involves using the data warehouse & making
its data presentable. Reports can take the form of a graph, a bar /
pie chart, a dashboard or any other form that the business heads of an
organisation can use & understand. Popular reporting tools include: Tableau,
IBM Cognos & MicroStrategy.
The overall aim is to help the business make better decisions & see things that
it cannot easily detect / analyse otherwise.
For the scope of these two, see: Does data warehousing and BI have a bright future?
Analytics is the process of analysing the data (organised through data
warehousing) to make better decisions for the benefit of an organisation. You
can always switch jobs between these fields interchangeably, including Big
Data Analytics.
How can I learn Informatica in a month?

You can do it comfortably (even in less than a month for the basics) by following
these steps:-
1. Install the software on a system accessible to you.
2. Find someone who genuinely knows the tool / has worked on it for training.
3. Learn how to extract data from flat files & a few DB sources.
4. Learn how to load data to flat files & a few DB targets.
5. Learn how to use basic transformations such as Expression, Filter, Router, Update
Strategy, Aggregator, Sequence Generator & Sorter.
6. Practice some scenarios using these &, once comfortable, move on to slightly
more complex transformations such as Lookup, Rank, Normalizer, Stored Procedure,
XML, etc.
7. If you still have time, learn workflow tasks & SCD / CDC / other concepts.
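As a rough mental model while practicing step 5, the basic transformations map onto familiar operations (a plain-Python analogy only; in Informatica these are configured visually, & the sample rows below are invented):

```python
# Sample input rows, as a source qualifier might deliver them.
rows = [
    {"dept": "HR", "salary": 3000},
    {"dept": "IT", "salary": 5000},
    {"dept": "IT", "salary": 7000},
]

# Filter transformation: keep only rows matching a condition.
filtered = [r for r in rows if r["salary"] > 4000]

# Expression transformation: derive a new column per row.
with_bonus = [{**r, "bonus": r["salary"] * 0.1} for r in filtered]

# Sorter transformation: order rows by a key.
ordered = sorted(with_bonus, key=lambda r: r["salary"], reverse=True)

# Aggregator transformation: group & summarize.
total_by_dept = {}
for r in rows:
    total_by_dept[r["dept"]] = total_by_dept.get(r["dept"], 0) + r["salary"]
```

Once these feel natural, the tool-specific parts (ports, mappings, sessions) are mostly configuration around the same ideas.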

Alternatively, you can go for Informatica's official training (link: What are some
good online sources to learn Informatica PowerCenter?).

Does Informatica support unstructured / semi-structured data coming from social networks?
Yes, it does that very well.

Informatica PowerCenter has Social Media Connectors for Facebook, LinkedIn &
Twitter (view a demonstration video for Facebook on YouTube:
https://youtu.be/6P0OfwU2QbM), & Informatica Vibe / Developer has connectors
for social media as well as tools like Kapow Katalyst which can help collect /
automate collection of data from almost any website directly.

For completely unstructured data (e.g. PDF, Word, industry-specific files like
HIPAA, etc.), you could use Informatica B2B, & for semi-structured data you can use
PowerCenter, Vibe, Cloud or B2B.

It also has an Informatica Connector Toolkit, using which you can create your
own connectors on an Eclipse-based coding platform. Once you create them, you
can integrate them with Informatica Developer / Vibe.

Is anyone searching for best Informatica online training?


Available "Official" Informatica Training Courses:-
(click titles in the link for description and registration)

Free Training and Tutorials

Velocity: Informatica Best Practices & Methodology


Cloudera Essentials for Apache Hadoop
Data Warehouse Academy
PowerCenter Express
Enterprise Data Integration & Data Quality

365onDemand - Data Integration Subscription


365onDemand - Data Quality Subscription
PowerCenter 9.x: Developer, Level 1
PowerCenter 9.x: Developer, Level 2
PowerCenter 9.x Administration
PowerCenter 9.x: Operations and Support
ETL on Hadoop for Data Warehousing
Data Validation Option for PowerCenter (DVO)
Informatica Developer 9.x: Data Services Introduction
Informatica Developer Tool: Introduction
Data Quality 9.x: Developer
Data Quality 9.x: Developer, Level 2
Data Quality 9.x: Analyst
Address Verification using Informatica Data Quality
Data Quality Setup and Execution in Informatica PIM
Integration between Informatica PIM and Informatica BPM
Informatica Business Glossary 9.6
Informatica Data Services: Security Administration

Master Data Management (MDM)

365onDemand - MDM Subscription


MDM: Administration
MDM: Service Integration Framework (SIF)
MDM 9.x: Configuring Informatica Data Director
MDM Multidomain Edition 9.x: Configuration
MDM 9.x: Configuring Hierarchy Manager
MDM: Administration, Multidomain

Information Lifecycle Management (ILM)

ILM Data Archive 6.x: Live Archiving Base


ILM Data Archive 6.x: Application Retirement
ILM Data Archive 6.x: Data Visualization
ILM Data Archive 6.x: Live Archive to Custom Database
ILM Data Archive 6.x: Live Archive to Packaged App
ILM Data Archive 6.x: Live Archiving to File Archive Server
ILM Data Archive 6.x: Mainframe Retirement
ILM Data Archive 6.x: Smart Partitioning
ILM Dynamic Data Masking 9.x: Developer, Administrator
ILM Test Data Management 9.x: Data Subset and Data Masking
Informatica Test Data Generation

Why should I use an existing ETL vs writing my own in Python for my data warehouse needs?
In my opinion, this sounds like: why should I use Java when I can still do all the same
things in C / C++, or why should I use Windows when I can still do the same things in
Linux / Unix? Of course, when a requirement is too complex for the existing ETL tools,
you'll have to go for coding, but I haven't come across a single business scenario to date
where an ETL tool doesn't fit in & hand-written code does. The Script Snap in SnapLogic
or the Custom & Java transformations in Informatica are built just for that.

There are numerous parameters to analyse when deciding between an ETL tool &
coding:-

Visual flow
The single greatest advantage of an ETL tool is that it provides a visual flow of the
system's logic (if the tool is flow based). Each ETL tool presents these flows differently,
but even the least appealing of them compares favorably to custom systems
consisting of plain SQL, stored procedures, system scripts & perhaps a handful of
other technologies.

Structured system design


ETL tools are designed for the specific problem of data integration: populating a data
warehouse, integrating data from multiple sources, or even just moving the data. With
maintainability & extensibility in mind, they provide in many cases a metadata-driven
structure to the developers. This is a particularly big advantage for teams building their
first data warehouse.

Operational resilience
Many of the home-grown data warehouses we have evaluated are rather fragile: they
have many emergent operational problems. ETL tools provide functionality &
standards for operating & monitoring the system in production. It is certainly possible
to design & build a well-instrumented hand-coded ETL application; nonetheless, it's
easier for a data warehouse / business intelligence team to build a resilient ETL system
on the features of an ETL tool.

Data-lineage and impact analysis


We would like to be able to right-click on a number in a report & see exactly how it was
calculated, where the data was stored in the data warehouse, how it was transformed,
when the data was most recently refreshed, & from what source system(s) the numbers
were extracted. Impact analysis is the flip side of lineage: we'd like to look at a table or
column in the source system & know which ETL procedures, tables, cubes & user
reports might be affected if a structural change is needed. In the absence of ETL
standards that hand-coded systems could conform to, we must rely on ETL vendors to
supply this functionality, though unfortunately only about half of them have done so far
(more results in our survey).

Advanced data profiling and cleansing


Most data warehouses are structurally complex, with many data sources & targets. At
the same time, requirements for transformation are often fairly simple, consisting
primarily of lookups & substitutions. If you have a complex transformation
requirement, for example if you need to de-duplicate your customer list, you should buy
an additional module on top of the ETL solution (data profiling / data cleansing). At the
very least, ETL tools provide a richer set of cleansing functions than are available in SQL.
Download the ETL Tools & Data Integration Survey to see how the ETL tools compare on
this aspect.
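To give a taste of what the simplest form of de-duplication looks like when hand-coded (a plain-Python sketch with invented fields; real cleansing modules also do fuzzy / phonetic matching, which this does not):

```python
# Naive customer de-duplication: normalize the key fields, then keep
# only the first occurrence of each normalized identity.
def dedupe(customers):
    seen = set()
    unique = []
    for c in customers:
        key = (c["name"].strip().lower(), c["email"].strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(c)
    return unique
```

Anything beyond exact-match logic like this (typos, nicknames, address variants) is where the dedicated profiling / cleansing modules earn their price.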

Performance
You might be surprised that performance is listed as one of the last advantages
of ETL tools. It's possible to build a high-performance data warehouse whether you
use an ETL tool or not. It's also possible to build an absolute dog of a data warehouse
whether you use an ETL tool or not. We've never been able to test whether an excellent
hand-coded data warehouse outperforms an excellent tool-based data warehouse; we
believe the answer is that it's situational. But the structure imposed by an ETL platform
makes it easier for a (novice) ETL developer to build a high-quality system.
Furthermore, many ETL tools provide performance-enhancing technologies, such as
Massively Parallel Processing, Symmetric Multi-Processing & Cluster Awareness.

Big Data
A lot of ETL tools are capable of combining structured data with unstructured data in
one mapping. In addition, they can handle very large amounts of data that do not
necessarily have to be stored in data warehouses. Hadoop connectors or similar
interfaces to big data sources are now provided by most ETL tools, & support for
Big Data is growing continually.

Other advantages of ETL Tools in DataWarehousing scenarios include:

A set of comprehensive Scheduling Mechanisms.


Logging, Audit and Metadata support
Easier maintenance, especially when multiple developers are involved
Supporting heterogeneous connectivity (tools like Informatica have connectors
for everything; see: What is Informatica?)
Robust execution control and error handling
Parallel processing
High Availability
Partitioning / Push-down Optimization capabilities

Of course, there are scenarios wherein hand-written code would be better (though
not faster to develop than existing ETLs), but the challenge is to select
the right ETL tool for your scenario instead of thinking of writing it
on your own.

What is the role of Informatica's ETL in a DB2 to Teradata migration project?
Informatica PowerCenter has direct connectors for DB2 as well as Teradata, &
therefore you can use it to create a data flow (Mapping) to extract data directly
from DB2 & load it into Teradata, & also process / apply basic data cleansing
operations within the mapping.
Once mappings are created, you can execute them anytime & as many times as you
want to migrate the data. You can also automate this process so that whenever new
data arrives in DB2, it gets moved to Teradata automatically.
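Conceptually, such a mapping is just a batched extract-cleanse-load loop over two database connections. A hand-coded sketch of the same idea (hypothetical table & columns; the demo below uses generic DB-API connections, whereas a real run would use DB2 & Teradata drivers such as ibm_db or teradatasql):

```python
import sqlite3  # stand-in for real DB2 / Teradata connections in this sketch

def migrate(src_conn, tgt_conn, batch_size=1000):
    """Copy all rows from the source table to the target in batches."""
    cur = src_conn.execute("SELECT id, name FROM customers")
    tgt_conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (id INTEGER, name TEXT)"
    )
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        # Basic cleansing within the flow, like an Expression transformation.
        cleaned = [(i, n.strip()) for i, n in batch]
        tgt_conn.executemany("INSERT INTO customers VALUES (?, ?)", cleaned)
    tgt_conn.commit()
```

The value of the PowerCenter mapping is that this loop, its restartability & its scheduling come out of the tool's configuration rather than custom code.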

Which company do you think is the best place to work with the ETL Informatica tool and why?
It depends on what you wish to achieve in your career.

If you want fast growth as an Informatica Developer, it might be a different firm
than if you are looking for hardcore, challenging Informatica experience.
Similarly, if you want to stay an Informatica Developer for a really long time, it'd
be a different firm again.
So let's drill down into each of the cases mentioned above:-
1. For fast growth while being an Informatica Developer: opt for smaller firms,
or larger firms with very few or no Informatica resources. If you have great
knowledge of the tool, this is the perfect platform to grow your position in the
company by showcasing your talent. Some of the companies you could opt for are
on the Informatica Supplier, Partner, Competitor and Customer Lists (these are
Informatica customers as of March 2015, so they have a license for the product
but may or may not have resources who can implement solutions with it).

2. For exploring the tool to the utmost level, or to solve the most difficult
challenges using Informatica: opt for any company with a variety of data sources
or applications to be integrated. If you get a chance, aim for the R&D or Centre
of Excellence departments of a company that is a partner of Informatica Corp. You
can find a category-wise list of Informatica partners at the link: Partners -
Technology Partners (most of them have at least 6 months of access to the latest
Informatica products & training / material).

3. To keep doing stable / balanced Informatica work for a long time: join any
large-cap company / MNC with a number of decent Informatica projects. Examples
include: Cognizant, Accenture, TCS, Infosys, Wipro, HCL, etc.

If you ask me, I'd always prefer the R&D department of a firm that has the top-
most level of partnership with Informatica, as it has its own benefits: access to
trainings / the latest products, it can help you win clients for your organisation
or even build some intellectual utilities for your organisation, plus an
opportunity to sell them on the Informatica Marketplace.

What is the next big thing after big data?


The next big thing after Big Data is IoT & BDaaS.

Didn't get it?

IOT:-

The Internet of Things (IoT) is the network of physical objects or "things" embedded
with electronics, software, sensors, and network connectivity, which enables these
objects to collect and exchange data.

BDaaS (Big Data as a Service):-

BDaaS covers the outsourcing of a wide variety of Big Data functions to the
cloud.

Other candidates include hybrid approaches mixing Machine Learning & Artificial
Intelligence with Big Data / Cloud / real-time activities.

What are the new features of Informatica PowerCenter 10?


Through the eyes and daily routine of typical developers and business analysts, you will
discover how this new release:
- Enhances the collaboration between IT developers and business analysts
- Delivers more powerful visualization for data profiling
- Delivers a new monitoring dashboard to view service health and system usage
- Increases your productivity with up to 50x faster data lineage rendering
- Enhances your project reach with new connectors and real-time capabilities
- Includes new capabilities for parsing semi-structured and unstructured data

Which is better to learn and have good opportunities: Oracle DBA, SQL DBA or Informatica?

I'm an Oracle Certified Associate DBA & an Oracle Certified Professional DBA as
well as an Informatica Certified Specialist. I guess that's something that might lead
you to mark my words, if you trust certifications.

I'd take Informatica 9 out of 10 times, irrespective of whether you are interested in
development, performance tuning or administration activities, as Informatica has its
own administration tool & features.

As data grew, people who used to store data in registers / sheets started off with data
storage in files; then came databases, & people started storing data in DBs such as
Oracle, which followed Codd's rules to the maximum extent.

Then data grew further, & databases had their own limitations & capabilities that
brought about the introduction of data warehouses & NoSQL / Big Data-specific
databases.

Informatica processes data from all of the above (plus Cloud, websites & the Internet of
Things, for OLTP as well as OLAP systems), while Oracle lies in the era when SQL DBs
came into existence & was considered outdated (also termed "legacy" by a few data
lovers) to a large extent once NoSQL or Big Data-specific systems came into the picture.

If you are looking at Oracle or SQL DBA at this point in time, it means you are going
back (to a large extent) instead of going ahead with the world of IoT / Big Data / Cloud /
NoSQL DBs, etc.

In terms of career too, there'd always be a limitation in a DBA role at some point in
a person's career, while Informatica would always open up new windows for you.
