The Big Picture on Big Data and Cognos
www.senturus.com/blog/big-picture-big-data-cognos/
August 1, 2016 Business Strategy & Perspectives
IBM has a long history of supporting major open source projects and the most widely adopted open standards. Its
enterprise customers have benefited from the flexibility, choice, and innovation that come with the open source
philosophy. Major projects include SOA (Service-Oriented Architecture), Linux, Eclipse, and now Hadoop. The big
data analytics open source offering is known as the IBM Open Platform with Apache Hadoop. The commercial side
of this platform, announced in early 2015, is a suite of products for the enterprise branded as BigInsights.
To better understand IBM's big data offerings around Hadoop and its open data platform, it is helpful to put this in
context of the overall vision for the platform and the three phases of the IBM Big Data Analytics lifecycle:
1. Pull in all types of data from disparate sources
2. Put the data into a business context
3. Produce intelligent, data-driven business outcomes, for example, operational efficiency, customer
engagement, or risk management
IBM endeavors to cover a lot of business territory with its analytics platform. For the enterprise IT department, the
technology enables data integration, governance, security, and regulatory compliance. For line of business
managers, the analytics environment is the home of customer and operational intelligence. While analytics play an
important role in increasing operational efficiency and eliminating business process bottlenecks, it is the customer-
centric analytics that have captured the imagination of business executives. Big data analytics offers many
opportunities for improving customer relationships and increasing engagement across marketing channels.
A common big data use case is delivering relevant promotions to customers. We all share the experience of
receiving credit card offers in the mail from the bank and tossing the envelope directly into the recycling bin without
even thinking about it. Despite the dismal response rate, it was cost effective for the bank to send the same direct
mail piece to everyone. With a big data platform, it is possible to develop customer profiles and create targeted
offers for each segment. For example, customers that have a single account and a short customer history would be
candidates for a different array of promotions than someone who has been a customer for decades. The cost of
amassing enough data and having the processing power to crunch the numbers in a timely fashion has dropped
enough to make it profitable to do so.
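As a rough illustration of the segmentation idea above, the following Python sketch assigns customers to segments based on account count and tenure, then looks up an offer for each segment. The segment names, thresholds, and offers are hypothetical and chosen only for illustration; a real platform would derive them from far richer data.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    num_accounts: int
    tenure_years: float


# Hypothetical offers per segment; not taken from any real campaign.
OFFERS = {
    "new_single_account": "introductory savings bonus",
    "long_term": "loyalty rate on a premium card",
    "established": "balance transfer promotion",
}


def segment(customer: Customer) -> str:
    """Assign a segment using illustrative (assumed) thresholds."""
    if customer.num_accounts == 1 and customer.tenure_years < 2:
        return "new_single_account"
    if customer.tenure_years >= 10:
        return "long_term"
    return "established"


def offers_for(customers: list[Customer]) -> dict[str, str]:
    """Map each customer ID to the offer for that customer's segment."""
    return {c.customer_id: OFFERS[segment(c)] for c in customers}
```

Calling `offers_for` on a list of `Customer` records returns one targeted offer per customer instead of the same mail piece for everyone.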
With digital advertising and social media data, analysis is required on huge amounts of unstructured data. A couple
of years ago this was experimental at best, but now Hadoop software enables capturing and processing
unprecedented amounts of data. It complements the enterprise data warehouse and is an integral part of the
business intelligence ecosystem.
Open Data Platform ODPi
The ODPi open data platform is a consortium of IBM and 18 other enterprise software vendors working together to
maximize the adoption of technologies based on Apache Hadoop. The goal of ODPi is to accelerate software
development by providing a standard Hadoop solution on which applications can be run, whether they are
commercial software, open source, or custom code developed in-house. This gives enterprise customers assurance
that they are not locking themselves into a single vendor's Hadoop solution. It also permits using a Hadoop
implementation with products from multiple vendors. For Hadoop to fulfill its role as an enterprise data source, it
must accommodate a broad audience who will be using many different applications.
To that end, the ODPi provides a core platform of agreed-upon and tested big data Apache Hadoop modules. This is
the ODPi standard, on which the vendors build their applications. For example, Hortonworks, IBM Open Platform
4.0 with Apache Hadoop, EMC Pivotal HD 3.0, and Infosys IIP all adhere to the ODPi standard. Analytics software
vendors or in-house development shops can concentrate on developing applications further up the stack, knowing
that the Hadoop core adheres to a standard and their applications will interoperate with any compliant Hadoop system.
This accelerates development, promotes code re-use, and simplifies the technical architecture. Implementing a
Hadoop distribution that adheres to the ODPi standard means not being locked into a proprietary technology.
Only time will tell whether the ODPi will have a lasting impact as a standard. The organization has been criticized as being
nothing more than a joint marketing effort for vendors pushing their own commercial flavor of Hadoop. Also notable
are the big data vendors that are conspicuous by their absence: Cloudera, MapR, and Amazon (AWS – EMR
Elastic MapReduce).
IBM BigInsights and Cognos
On top of Hadoop, IBM has developed a suite of big data and analytics tools under the BigInsights brand. There are
tools for scaling and managing the platform (BigInsights Enterprise Management), a machine learning engine
(BigInsights Data Scientist – Decision Trees, PageRank, Clustering) and a data exploration and discovery tool
(BigSheets). Of particular interest to Cognos customers is BigSQL, which runs SQL queries against Hadoop. In
other words, BigSQL permits Cognos to use Hadoop as a data source.
This is interesting as data stored in Hadoop only becomes useful when it is put into a business context. Cognos
Analytics (V11) is well suited for this role. It is a powerful tool for BI developers and business power users, enabling
the presentation of Hadoop data in a visually appealing format for executives, managers, and line of business
staffers. Big data becomes much more valuable when it can be interpreted and understood by non-technical users.
Cognos supports connecting to Hadoop using Hive, which translates code from SQL to MapReduce to get results
from Hadoop. There will always be some latency as Hive cannot change the nature of MapReduce, which
distributes processing work across Hadoop nodes. The query is split into discrete chunks of work and the results are
assembled as they are returned. SQL join conditions, which are commonplace in Cognos-generated SQL, create an
additional layer of complexity for MapReduce. This further increases the query processing time and will prevent
some queries from running at all.
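To make the split-and-assemble behavior concrete, here is a minimal in-memory Python sketch of how a join with an aggregate can be expressed as map, shuffle, and reduce steps. It is a simplification under stated assumptions, not Hive's actual execution plan: the table and field names (`orders`, `customers`, `customer_id`, `amount`, `name`) are hypothetical, everything runs in one process, and the join and aggregation are collapsed into a single pass for brevity.

```python
from collections import defaultdict


def map_phase(orders, customers):
    """Emit (join_key, tagged_record) pairs from both hypothetical tables."""
    for order in orders:
        yield order["customer_id"], ("order", order)
    for customer in customers:
        yield customer["customer_id"], ("customer", customer)


def shuffle(pairs):
    """Group emitted values by key, standing in for the cluster-wide shuffle."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Inner-join each key's records and total the order amounts per name."""
    totals = {}
    for values in groups.values():
        names = [rec["name"] for tag, rec in values if tag == "customer"]
        amounts = [rec["amount"] for tag, rec in values if tag == "order"]
        if names and amounts:  # keep only keys present in both tables
            totals[names[0]] = totals.get(names[0], 0) + sum(amounts)
    return totals


def run_join(orders, customers):
    """Roughly: SELECT c.name, SUM(o.amount) FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id GROUP BY c.name"""
    return reduce_phase(shuffle(map_phase(orders, customers)))
```

Even in this toy form, every record from both tables has to be tagged, regrouped by key, and reassembled before any result appears, which hints at why joins add latency when the same work is distributed across many nodes.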
IBM addresses these problems with BigSQL. It works on the same Hive metastore, but produces faster and more
reliable results. BigSQL is not just about performance, but also assuring that the SQL query will run. It optimizes
SQL for MapReduce so that it runs faster and avoids the need to modify the Cognos Framework Manager model
or hand-code SQL inside Cognos. An alternative to Hive and BigSQL is Impala, which makes similar
performance claims.
Success with Big Data requires getting key pieces to work together. With BigInsights and BigSQL, IBM is providing
tools for facilitating Hadoop adoption, including interoperability with existing Cognos infrastructure and functionality.
To stay on top of business intelligence topics, read other Senturus blogs at: http://www.senturus.com/blog/.
Resources
Senturus webinar Running Cognos on Hadoop:
http://www.senturus.com/resources/running-cognos-on-hadoop/
Video of Hive and BigSQL performance test results:
https://developer.ibm.com/hadoop/blog/2015/10/23/hive-and-big-sql-performance-test-update/
IBM BigSQL technology sandbox demo cloud environment for Hadoop and BigSQL:
https://my.imdemocloud.com/projects/3467
Thanks to David Currie for contributing this article. David is a long-time business analytics consultant. He blogs
about business intelligence and big data at davidpcurrie.com.
