Information integration to accelerate data warehouse deployments
IBM® InfoSphere® Information Server for Data Warehousing provides information integration capabilities for your data warehouse. It helps you
understand, profile, cleanse and integrate data from heterogeneous sources to gain faster business insight at lower cost. InfoSphere
Information Server for Data Warehousing can provide the following features and benefits:
Accelerates deployment: Provides user-friendly tools, integration architecture, full source data analysis and pre-built connectivity.
Increases business confidence in warehouse data: Offers data cleansing and quality monitoring capabilities. Helps you understand both the
business context associated with data and the data lineage to improve governance.
Minimizes total cost of ownership: Helps you build the right data warehouse by providing best practices, data profiling and user-centric tooling
that promotes self-service.
The importance of gathering and gaining insight into data has never been greater, even as data sources increase in number. Organizations need
robust information integration capabilities that support their business requirements and create the utmost confidence in their data, and thus in
the results of their decision making. By using the end-to-end information integration capabilities of IBM® InfoSphere® Information Server,
companies are able to better understand, cleanse, monitor, transform and deliver not only their data, but also data derived from external
sources. This white paper discusses the data integration, data quality and catalog packages and V11.3 enhancements.
Read the white paper and learn how to:
Transform data in any format and deliver it to any system, supporting faster time to value and reduced risk for IT
Understand data and foster collaboration between IT and line-of-business teams to narrow the communication gap and create business-driven
information integration
Establish and manage high-quality data, transforming a deluge of data into trusted information
Robust integration capabilities for greater confidence in the big data era
The importance of gathering and gaining insight into data has never been greater, even as data sources increase in number.
Organizations of all sizes must accommodate streams of structured and unstructured data; data from internal and third-party
clouds; and even data as granular as departmental databases and user spreadsheets.
This puts a new spin on an old problem: the question of integration. Organizations need robust information integration capabilities
that support their business requirements and create the utmost confidence in their data, and thus in the results of their decision
making. They need to successfully and flexibly integrate data from any data source, all while applying governance and data-quality
best practices. Only with that assurance of trust can firms be sure that critical projects and key analytics initiatives will succeed.
By using the end-to-end information integration capabilities of IBM® InfoSphere® Information Server, companies are able to better
understand, cleanse, monitor, transform and deliver not only their data, but also data derived from external sources (see Figure 1).
InfoSphere Information Server also helps organizations collaborate to establish or improve data governance, whether forming a
business glossary or creating data governance rules and policies to bridge the gap between line-of-business and IT teams. These
capabilities help firms ensure that the information that drives their business and strategic initiatives, from big data and
point-of-impact analytics to master data management (MDM) and data warehousing (on-premises or in the cloud), is trusted, consistent and
conforms to governance policies.
Since its inception, InfoSphere Information Server has provided a massively parallel processing (MPP) platform, supporting data
volumes of any size, regardless of complexity. It can deliver the flexibility (through extract, transform and load [ETL] or extract, load
and transform [ELT]), performance and scalability required for big data projects. IBM InfoSphere Information Server V11.3 delivers
the “anywhere” information integration capabilities that organizations need to address the increasing volume and complexity of
data and data sources. Acknowledging the ongoing challenges organizations face with agile integration, business-driven governance
and sustainable data quality, IBM developed InfoSphere Information Server V11.3 to give them the sophisticated information
integration capabilities necessary to thrive in today’s information-rich environment.
InfoSphere Information Server for Data Integration: Transform data in any format and deliver it to any system, supporting faster
time to value and reduced risk for IT.
InfoSphere Information Server for Data Quality: Establish and manage high-quality data, transforming a deluge of data into trusted
information.
InfoSphere Information Governance Catalog: Understand data and foster collaboration between IT and line-of-business teams to
narrow the communication gap and create business-driven information integration.
InfoSphere Information Server Enterprise Edition: Gain the capabilities of all three individual packages in one comprehensive
package so firms can start information integration efforts in one area, and then expand efforts for further optimization.
This white paper discusses the data integration, data quality and catalog packages, as well as the ways InfoSphere Information
Server V11.3 helps organizations address data integration needs in cloud environments (both private and public), gain better insight
into big data, improve self-service data integration and accommodate tighter MDM integration. InfoSphere Information Server V11.3
also features performance and security enhancements.
The enhanced InfoSphere Information Server packages
Using InfoSphere Information Server packages, organizations can provide accurate, comprehensive information in near-real time to
the systems and knowledge workers focused on strategic initiatives.
InfoSphere Information Server for Data Integration: Agile integration capabilities
InfoSphere Information Server for Data Integration delivers agile integration capabilities so that businesses can integrate data quickly and
flexibly wherever it resides. Businesses can easily manage data integration for data warehouses, integrate big data, consolidate applications,
deploy information in a private or public cloud or integrate on-premises data with cloud environments.
Data integration for cloud environments
In addition to its current support for private cloud deployments (either in conjunction with IBM
PureApplication® System as a managed platform as a service, or in conjunction with IWD or SCO, for clients or partners that prefer using a
self-service private cloud), InfoSphere Information Server V11.3 now supports public cloud environments and provides enhancements for
integrating on-premises data with cloud environments. It also supports direct integration so users can load data into Amazon S3. After data is
integrated within S3, it may be picked up by other cloud database technologies. Building on that integration foundation, Version 11.3 includes
integration with REST application programming interfaces (APIs), enabling support for XML and JSON messages. By delivering REST-based
connectivity, InfoSphere Information Server is able to support distributed database-as-a-service (DBaaS) offerings, such as IBM Cloudant, as
well as other on-premises and off-premises solutions that offer REST-based interaction.
Deeper integration for MDM projects
InfoSphere Information Server V11.3 delivers tighter integration with IBM InfoSphere Master Data
Management (InfoSphere MDM) software. A new MDM Integration Stage enables users to easily load data into and extract data out of
InfoSphere MDM. Users can now include MDM data within their data integration flows and load domain data (relating to customers, partners,
suppliers, members, products and other entities) directly to the MDM system. Organizations can also leverage InfoSphere Information Server
data quality capabilities to standardize data before loading it into InfoSphere MDM, which helps to increase matching accuracy and better
support 360-degree views of entities. In addition, the MDM Integration Stage is supported by a metadata-driven design, which means metadata
captured by the stage will align with the MDM data model, enabling multiple segments of data to be sent in a single request.
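To illustrate why standardizing data before loading can raise matching accuracy, here is a minimal sketch in plain Python. It does not use any InfoSphere API; the `standardize` function, the field names and the cleansing rules are hypothetical and chosen only for illustration.

```python
import re

def standardize(record):
    """Normalize case, whitespace and phone punctuation (illustrative rules only)."""
    # Collapse repeated whitespace and upper-case the name.
    name = " ".join(record["name"].split()).upper()
    # Keep only the digits of the phone number.
    phone = re.sub(r"\D", "", record["phone"])
    return {"name": name, "phone": phone}

# Two hypothetical source records that describe the same entity.
a = {"name": "  Acme   Corp ", "phone": "(555) 010-2000"}
b = {"name": "ACME CORP", "phone": "555.010.2000"}

raw_match = a == b                                # compared as delivered
clean_match = standardize(a) == standardize(b)    # compared after standardization
```

The two records differ only in formatting, so an exact comparison succeeds only after both pass through the same standardization rules. Production matching typically also relies on probabilistic scoring rather than exact equality.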
New InfoSphere Data Click V11.3 capabilities include:
Cloud integration: Integrate data directly to and from Amazon S3.
Enhanced big data support: Use native high-speed load to move data into IBM InfoSphere BigInsights and, once data is in InfoSphere
BigInsights, perform advanced analytics; access and read data out of Hadoop distributions through Big SQL or through other JDBC-based
methods, such as Hive and HBase.
Catalog integration: Search for information through the metadata available in InfoSphere Information Server Governance Catalog
and launch directly into InfoSphere Data Click to act on integrating that data, for faster time to value.
Enhanced usability: InfoSphere Data Click now includes a new screen for creating and monitoring activities; a new web-based,
streamlined method to create and run activities; and new links to the operations consoles to drill down into metrics for business and
governance objectives.
Second, it helps users quickly and easily design, manage and monitor data quality in alignment with defined
business policies. To accelerate enterprises’ ability to deliver high-quality, trusted data, InfoSphere Information Server for Data
Quality incorporates the following new features:
Improved governance: This package now incorporates the InfoSphere Information Governance Dashboard and its companion SQL
views (a fully described query layer for key metadata). Expanded reports include analysis of critical business elements that lack a
data steward. Organizations can also access IBM Cognos® Business Intelligence components to support deployment of an integrated
data governance program for both IT and line-of-business users.
Matching stewardship: Organizations can now find “clerical pairs” (records that are close matches but require review) in the Data
Quality Console. Records are displayed in an intuitive format to facilitate investigation of the source data elements and related
match scores.
Enhanced productivity: Version 11.3 eliminates the need for a discrete metadata import activity by using IBM InfoSphere
Information Analyzer to leverage the data connections and imported metadata from InfoSphere Metadata Asset Manager. It also
enhances the Data Rule stage (for use within the job designer) with drag-and-drop functionality.
Optimized performance: These improvements include new, faster algorithms for multi-column primary key detection, the ability to
take full advantage of parallelism in column analysis, bulk-load write to the analysis database and optimized client/server
communication to reduce network traffic.
Increased operational quality: InfoSphere Information Server now provides pre-built data rules for operational metadata, so that
operations teams can monitor how the data integration platform accommodates the service-level agreements and best practices of
the organization.
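As a rough illustration of what multi-column primary key detection involves, the sketch below searches for minimal column combinations whose values are unique across all rows. It is a brute-force example in plain Python, not the algorithm used by InfoSphere Information Server; the `candidate_keys` function, the `max_width` parameter and the sample data are assumptions made for this example.

```python
from itertools import combinations

def candidate_keys(rows, max_width=2):
    """Return minimal column combinations whose values are unique across rows."""
    if not rows:
        return []
    columns = list(rows[0].keys())
    found = []
    for width in range(1, max_width + 1):
        for combo in combinations(columns, width):
            # Skip supersets of a key already found; they are not minimal.
            if any(set(key) <= set(combo) for key in found):
                continue
            # A combination is a key if no two rows share the same value tuple.
            seen = {tuple(row[c] for c in combo) for row in rows}
            if len(seen) == len(rows):
                found.append(combo)
    return found

# Hypothetical order-line data: no single column is unique on its own.
orders = [
    {"order_id": 1, "line_no": 1, "sku": "A"},
    {"order_id": 1, "line_no": 2, "sku": "B"},
    {"order_id": 2, "line_no": 1, "sku": "A"},
]
keys = candidate_keys(orders)
# keys == [("order_id", "line_no"), ("order_id", "sku")]
```

The number of combinations grows quickly with column count and key width, which is why faster detection algorithms and parallel column analysis matter on wide tables.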
Collections that support teaming: Catalog users can create groups of metadata objects, known as collections, in order to
better collaborate around sets of content with other users. These collections can be associated with business terms, stewards and
labels to provide context for the group of assets. The creator of a collection can designate which users may view and edit the
collection. Users who want to curate or refine the data sources in the collection can also launch directly into InfoSphere Data Click,
which can copy the information from these data sources into the larger data lake.
New data lineage infrastructure: Users get improved scalability and performance when using the new graphical Lineage
Viewer and lineage composer service. Additionally, the UI now incorporates HTML5 to enable access to lineage information from
mobile devices.
Optimized operational metadata: A new data flow model can efficiently process operational metadata from tens of
thousands of data integration activities per day.
Key core enhancements in InfoSphere Information Server V11.3
Better big data, faster
InfoSphere Information Server V11.3 now includes InfoSphere BigInsights V3.0 to provide a fast, cost-effective and scalable
approach to big data integration with Apache Hadoop. Using the same simple design interface available in other data integration
patterns, developers can now shift ETL and data integration workloads into a Hadoop infrastructure that is capable of
simultaneously handling analytics and staging, preparation and transformation of data. InfoSphere Information Server manages the
ingestion, mapping, metadata and quality processes, freeing data scientists and business users to begin exploring and analyzing data
through the intuitive BigSheets interface. A component of InfoSphere BigInsights, BigSheets allows users to visualize both traditional
and new data types, such as JSON, that InfoSphere Information Server seeded into Hadoop.
New lighter-weight services tier
InfoSphere Information Server V11.3 users have the option to install either WebSphere Application Server Network Deployment (for
highly available services tier configurations) or WebSphere Application Server Liberty Profile, a dynamic application server runtime
environment for the services tier that helps decrease time during install and feature upgrades and lower overall system resource
usage costs.