Users can run queries via MicroStrategy’s visual interface without having to write unfamiliar HiveQL or MapReduce scripts. In essence, any user, without Hadoop programming skills, can ask questions of vast volumes of structured and unstructured data to gain valuable business insights.
MicroStrategy abstracted the SAP HANA data schema, along with other data warehouses and multi-dimensional sources, into one unified system of record, hiding the underlying complexity from end users.
Teradata specializes in storing and analyzing structured, relational data. It has recently purchased Aster Data Systems, Inc. in order to extend its platform to include the capability of handling what is often called ‘big’, ‘semi-structured’ or multi-structured (see below) data.
Here is a case study that I developed to explain the different sets of functionality with the Pentaho Suite. I focused on the functionality, features, illustrative tools and key strengths. I've provided an understanding toward evaluating BI tools when selecting vendors. Enjoy!
One of the world's first complete online, Web-based development frameworks for building and deploying decision support systems, knowledge-based systems, websites, and applications backed by expert-system, case-based reasoning, and hybrid AI technologies.
AWS Webcast - Sales Productivity Solutions with MicroStrategy and Redshift (Amazon Web Services)
Sales Force Automation (SFA) and Customer Relationship Management (CRM) tools, such as Salesforce.com and Microsoft Dynamics CRM, are ubiquitous tools that provide all of the transactional capabilities required to manage a company's sales pipeline. SFA and CRM data alone, however, is limited, so combining it with information from other sources enables you to create unique and powerful insights. When it is combined with product and financial data, for example, you gain visibility into relationships between geographies, sales reps, product performance, and revenue to ultimately optimize profits. Layer on advanced analytics to make predictions about future product sales based on seasonality and other market conditions. To unleash the full power of the CRM and dramatically increase operational performance and top-line revenue, companies are leveraging advanced analytics and data visualization to deliver new insights to the entire sales organization. Moreover, delivering these sales enablement productivity solutions on mobile devices ensures strong adoption across every sales team. Join us in this webinar to learn how to use MicroStrategy together with Amazon Redshift to build mobile sales productivity solutions for your business.
Analyst View of Data Virtualization: Conversations with Boulder Business Inte... (Denodo)
In this presentation, executives from Denodo preview the new Denodo Platform 6.0 release that delivers Dynamic Query Optimizer, cloud offering on Amazon Web Services, and self-service data discovery and search. Over 30 analysts, led by Claudia Imhoff, provide input on strategic direction and benefits of Denodo 6.0 to the data virtualization and the broader data integration market.
This presentation is part of the Fast Data Strategy Conference, and you can watch the video here goo.gl/DR6r3m.
As a follow-on to the presentation "Building an Effective Data Warehouse Architecture", this presentation will explain exactly what Big Data is and its benefits, including use cases. We will discuss how Hadoop, the cloud and massively parallel processing (MPP) is changing the way data warehouses are being built. We will talk about hybrid architectures that combine on-premise data with data in the cloud as well as relational data and non-relational (unstructured) data. We will look at the benefits of MPP over SMP and how to integrate data from Internet of Things (IoT) devices. You will learn what a modern data warehouse should look like and how the role of a Data Lake and Hadoop fit in. In the end you will have guidance on the best solution for your data warehouse going forward.
Cognos Data Module Architectures & Use Cases (Senturus)
Demos of Cognos data module architectures, real-world data module use cases, concepts of data set libraries and current data module gaps as compared to Framework Manager and other modeling use cases. View our on-demand webinar and download this deck at: https://senturus.com/resources/cognos-data-module-architectures-and-use-cases/.
Senturus offers a full spectrum of services across the BI stack plus training on Power BI, Cognos and Tableau. Our resource library has hundreds of free live and recorded webinars, blog posts, demos and unbiased product reviews available on our website at: http://www.senturus.com/senturus-resources/.
Enabling a Data Mesh Architecture with Data Virtualization (Denodo)
Watch full webinar here: https://bit.ly/3rwWhyv
The Data Mesh architectural design was first proposed in 2019 by Zhamak Dehghani, principal technology consultant at Thoughtworks, a technology company that is closely associated with the development of distributed agile methodology. A data mesh is a distributed, de-centralized data infrastructure in which multiple autonomous domains manage and expose their own data, called “data products,” to the rest of the organization.
Organizations leverage data mesh architecture when they experience shortcomings in highly centralized architectures, such as the lack of domain-specific expertise in data teams, the inflexibility of centralized data repositories in meeting the specific needs of different departments within large organizations, and the slow nature of centralized data infrastructures in provisioning data and responding to changes.
In this session, Pablo Alvarez, Global Director of Product Management at Denodo, explains how data virtualization is your best bet for implementing an effective data mesh architecture.
You will learn:
- How data mesh architecture not only enables better performance and agility, but also self-service data access
- The requirements for “data products” in the data mesh world, and how data virtualization supports them
- How data virtualization enables domains in a data mesh to be truly autonomous
- Why a data lake is not automatically a data mesh
- How to implement a simple, functional data mesh architecture using data virtualization
Big Data Insights with Red Hat JBoss Data Virtualization (Kenneth Peeples)
You’re hearing a lot about big data these days. And big data and the technologies that store and process it, like Hadoop, aren’t just new data silos. You might be looking to integrate big data with existing enterprise information systems to gain better understanding of your business. You want to take informed action.
During this session, we’ll demonstrate how Red Hat JBoss Data Virtualization can integrate with Hadoop through Hive and provide users easy access to data. You’ll learn how Red Hat JBoss Data Virtualization:
Can help you integrate your existing and growing data infrastructure.
Integrates big data with your existing enterprise data infrastructure.
Lets non-technical users access big data result sets.
We’ll also provide typical use cases and examples, plus a demonstration of integrating Hadoop sentiment analysis with sales data.
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter... (Cloudera, Inc.)
Recent research has pointed out the complementary nature of Hadoop and other data management solutions and the importance of leveraging existing systems, SQL, engineering, and operational skills, as well as incorporating novel uses of MapReduce to improve analytic processing. Come to this session to learn how companies optimize the use of Hadoop with other enterprise systems to improve overall analytical throughput and build new data-driven products. This session covers: ways to achieve high-performance integration between Hadoop and relational-based systems; Hadoop+NoSQL vs Hadoop+SQL architectures; high-speed, massively parallel data transfer to analytical platforms that can aggregate web log data with granular fact data; and strategies for freeing up capacity for more explorative, iterative analytics and ad hoc queries.
This white paper will present the opportunities laid down by the data lake and advanced analytics, as well as the challenges in integrating, mining, and analyzing the data collected from these sources. It goes over the important characteristics of the data lake architecture and the Data and Analytics as a Service (DAaaS) model. It also delves into the features of a successful data lake and its optimal design. It goes over data, applications, and analytics that are strung together to speed up the insight-brewing process for industry improvements, with the help of a powerful architecture for mining and analyzing unstructured data: the data lake.
Teradata Aster: Big Data Discovery Made Easy
Brad Elo, VP, Aster Data, Teradata
ANALYTICS AND VISUALIZATION FOR THE FINANCIAL ENTERPRISE CONFERENCE
June 25, 2013 The Langham Hotel Boston, MA
Enabling Data as a Service with the JBoss Enterprise Data Services Platform (prajods)
This presentation was given at JUDCon 2013, Jan 17-18, in Bangalore, by Prajod Vettiyattil and Gnanaguru Sattanathan. The presentation deals with the why, what, and how of Data Services and Data Services Platforms, and also explains the features of the JBoss Enterprise Data Services Platform.
The need for Data Services is explained with 3 Business use cases:
1. Post purchase customer experience improvement for an Auto manufacturer
2. Enterprise Data Access Layer
3. Data Services for Regulatory Reporting requirements like Dodd Frank
xRM is the natural evolution of CRM. Businesses are expanding their use of new-generation CRM solutions to manage a wider range of scenarios, including asset management, prospect management, citizen management, and many more. Microsoft CRM sits on the .NET platform and, because of that, it is much more than a traditional CRM product. Instead, think of Microsoft CRM as a rapid-development application with out-of-the-box CRM functionality. The purpose of this session is to understand Microsoft's CRM strategy and how you can get to market first with world-class business solutions.
Making Data Visualization & Analytics Accessible to Business Users (Haroen Vermylen)
Cumul.io's goal is to bring Data Visualization & Analytics into the hands of marketing analysts, sales analysts and other business people who love numbers and decisions, but who don't like IT.
It's time to let the Business Analyst into the Data Lab.
Watch our talk at Data Innovation Summit 2016 at AXA Bank, Brussels.
SharePoint Troubleshooting Tools & Techniques (Manuel Longo)
Learn about the tools and techniques that Microsoft Premier Support engineers use to gather data to troubleshoot and resolve issues. This session includes an overview of the troubleshooting process used to complete a Root Cause Analysis, and a review and demo of the different set of tools available for different needs including:
-- Diagnostic Logging
-- Data Collection
-- Data Analysis
-- Debugging
2. Google Analytics New Interface - Search University 3 (Semetis)
Google Analytics Today: Timo Josten is responsible for European Google Analytics partners. He will show the new Google Analytics interface and explain the rationale behind the new dashboards, share tips and tricks for measuring metrics in all online advertising campaigns including Search Marketing, and talk about interesting new betas.
I was meaning to put this talk up for grabs for some time now, but kept forgetting. I was invited to give the keynote speech for the Microstrategy World 2008 conference. The talk was very well received, so here it is.
MicroStrategy World 2014: Scaling MicroStrategy at eBay (Tim Case)
eBay has one of the largest data warehouses in the world! See how the BI Platform team at eBay had to rethink and rebuild their system architecture and processes in order to support the ever-growing data volume and scalability needs of their developers and users.
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo... (Hortonworks)
There certainly is no shortage of hype when it comes to the term “Big Data”. One thing we can be sure of is that massive data volumes are driving a new modern data architecture that includes Hadoop in the mix. But what does that architecture look like for Business Intelligence Data Strategy?
Join Hortonworks and MicroStrategy, where we’ll:
• Discuss the modern architecture for Business Intelligence on top of Hadoop as a data source.
• Learn how our joint solution helps enterprises store, process and analyze vast amounts of structured and unstructured data to deliver business insights throughout an organization.
• Discover what new benefits Hadoop 2.0 offers and how the MicroStrategy Analytics platform leverages those new features to improve performance, achieve faster access times, and allow for true interactive visual data discovery.
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5 (Cloudera, Inc.)
Inefficient data workloads are all too common across enterprises - causing costly delays, breakages, hard-to-maintain complexity, and ultimately lost productivity. For a typical enterprise with multiple data warehouses, thousands of reports, and hundreds of thousands of ETL jobs being executed every day, this loss of productivity is a real problem. Add to all of this the complex handwritten SQL queries, and there can be nearly a million queries executed every month that desperately need to be optimized, especially to take advantage of the benefits of Apache Hadoop. How can enterprises dig through their workloads and inefficiencies to easily see which are the best fit for Hadoop and what’s the fastest path to get there?
Cloudera Navigator Optimizer is the solution - analyzing existing SQL workloads to provide instant insights into your workloads and turns that into an intelligent optimization strategy so you can unlock peak performance and efficiency with Hadoop. As the newest addition to Cloudera’s enterprise Hadoop platform, and now available in limited beta, Navigator Optimizer has helped customers profile over 1.5 million queries and ultimately save millions by optimizing for Hadoop.
Business Intelligence made easy! This is the first part of a two-part presentation I prepared for one of our customers to help them understand what Business Intelligence is and what can it do...
Big data is data that, by virtue of its velocity, volume, or variety (the three Vs), cannot be easily stored or analyzed with traditional methods. Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware.
The strategic relationship between Hortonworks and SAP enables SAP to resell Hortonworks Data Platform (HDP) and provide enterprise support for their global customer base. This means SAP customers can incorporate enterprise Hadoop as a complement within a data architecture that includes SAP HANA, Sybase and SAP BusinessObjects enabling a broad range of new analytic applications.
The data management industry has matured over the last three decades, primarily based on relational database management system (RDBMS) technology. Since the amount of data collected and analyzed in enterprises has increased severalfold in volume, variety, and velocity of generation and consumption, organizations have started struggling with the architectural limitations of traditional RDBMS architecture. As a result, a new class of systems had to be designed and implemented, giving rise to the new phenomenon of “Big Data”. In this paper we trace the origin of a new class of systems, called Hadoop, built to handle Big Data.
Infrastructure Considerations for Analytical Workloads (Cognizant)
Using Apache Hadoop clusters and Mahout for analyzing big data workloads yields extraordinary performance; we offer a detailed comparison of running Hadoop in a physical vs. virtual infrastructure environment.
A short overview of Big Data, along with its popularity and its ups and downs from past to present. We look at its needs, challenges, and risks, the architectures involved in it, and the vendors associated with it.
Big Data: Knowledge Discovery in Big Data and Cloud Computing Environments (Rio Info)
Talk on Big Data: knowledge discovery in big data and cloud computing environments, presented by Nelson Favilla during Rio Info 2014.
Analysis of Historical Movie Data by BHADRA (Bhadra Gowdra)
Recommendation system provides the facility to understand a person's taste and find new, desirable content for them automatically based on the pattern between their likes and rating of different items. In this paper, we have proposed a recommendation system for the large amount of data available on the web in the form of ratings, reviews, opinions, complaints, remarks, feedback, and comments about any item (product, event, individual and services) using Hadoop Framework.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ... (Subhajit Sahu)
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
2. Hadoop
Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It makes it possible to run applications on systems with thousands of nodes involving thousands of terabytes. Its distributed file system facilitates rapid data transfer rates among nodes and allows the system to continue operating uninterrupted in case of a node failure. This approach lowers the risk of catastrophic system failure, even if a significant number of nodes become inoperative.
3. Why Hadoop?
• Scalability
Scales simply by adding nodes.
Local processing to avoid network bottlenecks.
• Flexibility
All kinds of data (blobs, documents, records, etc.).
In all forms (structured, semi-structured, unstructured).
Store anything and later analyze what you need.
• Efficiency
Cost efficiency (under $1K/TB) on commodity hardware.
Unified storage, metadata, and security (no duplication or synchronization).
4. Core parts of Hadoop
Hadoop Distributed File System(HDFS)
It is the primary storage system used by Hadoop applications.
HDFS is a distributed file system that provides high-performance access
to data across Hadoop clusters. Like other Hadoop-related technologies,
HDFS has become a key tool for managing pools of big data and
supporting big data analytics applications.
When HDFS takes in data, it breaks the information down into separate
pieces and distributes them to different nodes in a cluster, allowing
for parallel processing. The file system also copies each piece of data
multiple times and distributes the copies to individual nodes, placing at least
one copy on a different server rack than the others. As a result, the data on
nodes that crash can be found elsewhere within a cluster, which allows
processing to continue while the failure is resolved.
HDFS is built to support applications with large data sets, including
individual files that reach into the terabytes. It uses a master/slave
architecture, with each cluster consisting of a single NameNode that
manages file system operations and supporting DataNodes that manage data
storage on individual compute nodes.
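The splitting and rack-aware replication described above can be sketched in a few lines. This is only an illustration of the idea, not the actual HDFS placement policy; the block size, replication factor, and cluster layout below are invented for the example (real HDFS defaults to 128 MB blocks and 3 replicas).

```python
# Illustrative sketch of HDFS-style block splitting and rack-aware
# replication -- NOT the real HDFS placement policy. The block size
# and the cluster layout below are made-up values for demonstration.
BLOCK_SIZE = 4        # real HDFS defaults to 128 MB; tiny here for demo
REPLICATION = 3

# hypothetical cluster: node name -> rack id
CLUSTER = {"node1": "rackA", "node2": "rackA",
           "node3": "rackB", "node4": "rackB"}

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Break a file's bytes into fixed-size blocks, as HDFS does."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(cluster, replication=REPLICATION):
    """Pick nodes for one block so at least one copy sits on another rack."""
    nodes = list(cluster)
    chosen = [nodes[0]]                      # first replica: any node
    # second replica: a node on a *different* rack than the first
    chosen.append(next(n for n in nodes if cluster[n] != cluster[chosen[0]]))
    # remaining replicas: any nodes not already used
    chosen += [n for n in nodes if n not in chosen][: replication - 2]
    return chosen

blocks = split_into_blocks(b"0123456789abcdef")
placement = {i: place_replicas(CLUSTER) for i in range(len(blocks))}
print(len(blocks))                                   # 4 blocks of 4 bytes
print(len({CLUSTER[n] for n in placement[0]}) >= 2)  # replicas span racks
```

Because every block has a copy on a second rack, losing any single node (or even a whole rack) still leaves a readable replica, which is what lets processing continue through a failure.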
5. MapReduce
A MapReduce program is composed of a Map() procedure that performs
filtering and sorting (such as sorting students by first name into queues, one
queue for each name) and a Reduce() procedure that performs a summary
operation (such as counting the number of students in each queue, yielding
name frequencies).
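The students-and-queues example above can be shown as a minimal in-memory sketch of the Map/Shuffle/Reduce pattern; the student list is invented, and a real Hadoop job would run these phases across many nodes rather than in one process:

```python
# Minimal in-memory illustration of the Map/Reduce pattern described
# above: the map phase tags each student, the shuffle phase groups
# students into queues by name, and the reduce phase counts each queue.
# The student list is made up for the example.
from collections import defaultdict

students = ["Alice", "Bob", "Alice", "Carol", "Bob", "Alice"]

# Map phase: emit a (key, value) pair per input record
mapped = [(name, 1) for name in students]

# Shuffle phase: group values by key (one "queue" per name)
queues = defaultdict(list)
for name, one in mapped:
    queues[name].append(one)

# Reduce phase: summarize each queue -- here, count its members
frequencies = {name: sum(ones) for name, ones in queues.items()}
print(frequencies)   # {'Alice': 3, 'Bob': 2, 'Carol': 1}
```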
The "MapReduce System" (also called "infrastructure" or "framework")
orchestrates by marshalling the distributed servers, running the various tasks
in parallel, managing all communications and data transfers between the
various parts of the system, and providing for redundancy and fault tolerance.
HDFS and MapReduce are robust. Servers in a Hadoop cluster can fail and
not abort the computation process. HDFS ensures data is replicated with
redundancy across the cluster. On completion of a calculation, a node will
write its results back into HDFS.
6. MicroStrategy Integration
Cloudera and MicroStrategy have collaborated to develop a powerful and
easy-to-use BI framework for Apache Hadoop by creating a connection
between MicroStrategy 9 and CDH. This connection is established via an
Open Database Connectivity (ODBC) driver for Apache Hive and is available
as the Cloudera Connector for MicroStrategy.
The connector allows business users to perform sophisticated point-and-click
analytics on data stored in Hadoop directly from MicroStrategy applications –
just as they do on data stored in data warehouses, data marts and operational
databases. MicroStrategy has developed Very Large Database (VLDB)
drivers specifically for Cloudera that generate optimized queries for
Cloudera's Distribution Including Apache Hadoop (CDH).
7. The Cloudera Connector for MicroStrategy enables your enterprise users to
access Hadoop data through the Business Intelligence application
MicroStrategy 9.3.1. The driver achieves this by translating Open Database
Connectivity (ODBC) calls from MicroStrategy into SQL and passing the
SQL queries to the underlying Impala or Hive engines.
MicroStrategy and Cloudera together offer a connector that empowers organizations
to extract and deliver valuable insights from massive volumes of structured
and unstructured data. By providing sophisticated yet familiar reporting and
analysis tools on top of Apache Hadoop, business users can quickly and
easily unlock the potential of their data to make better business decisions.
8. What Is Impala?
Interactive SQL
Typically 100x faster than Hive.
Responses in sub-seconds.
Nearly ANSI-92 standard SQL queries, compatible with HiveQL.
Compatible SQL interfaces for existing Hadoop/CDH applications.
Based on industry-standard SQL.
Runs natively on Hadoop/HBase storage and metadata
Flexibility, scale, and cost advantages of Hadoop.
No duplication/synchronization of data and metadata.
Local processing to avoid network bottlenecks.
Separate runtime from MapReduce
MapReduce is designed for, and great at, batch processing.
Impala is purpose-built for low-latency SQL queries on Hadoop.
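Because Impala speaks near-ANSI-92 SQL and shares the Hive metastore, the same statement can typically run unchanged under either engine. A hypothetical example (the `sales` table and its columns are invented):

```sql
-- Hypothetical example: the sales table and its columns are invented.
-- The same statement can run under Hive (as a batch MapReduce job) or
-- under Impala (interactively, in seconds) against the same metastore.
SELECT region,
       COUNT(*)    AS orders,
       SUM(amount) AS revenue
FROM   sales
WHERE  order_date >= '2014-01-01'
GROUP  BY region
ORDER  BY revenue DESC;
```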
9. Benefits of Impala
More and faster value from “Big Data”
BI tools were impractical on Hadoop before Impala.
Move from tens of Hadoop users per cluster to hundreds of SQL users.
No delays from data migration.
Flexibility
Query across existing data.
Select best-fit file formats.
Run multiple frameworks on the same data at the same time.
Cost Advantages
Reduced data movement, duplicate storage, and compute.
1% to 10% of the cost of an analytic DBMS.
Full-Fidelity Analysis
No loss from aggregations or fixed schemas.
10. Project
Integrating Hadoop and Impala with MicroStrategy's reporting
capabilities, we developed Healthcare Management software.
We used data stored in HDFS, with Impala as the native MPP query
engine integrated into Hadoop via the connector.
Based on our requirements, we built Intelligent Cubes and
exported them directly to MicroStrategy.
Using its data-insight visualization capabilities, we are able to
display visually appealing dashboards and insightful reports.
We developed three dashboards displaying various ways of
visualizing Healthcare Management data.
12. The Key Performance Indicator section displays the total number of
issuers, employees, employers, brokers, and enrollments.
It also displays aggregated calculations of employee
income, premium per month, and percentage.
Service Area displays state-wise US totals
using an image-layout widget.
Enrollment displays a heat map of the total enrollment count
corresponding to each US state.
Employee Segmentation displays a grid-and-graph view of the
number of employees per segment.
14. In the Ticketing dashboard, the Overall Ticket Workload section
displays the total count of support persons, open
tickets, average response days, and backlog percentage.
The Open Tickets section uses a waterfall widget to show the total
open counts per issuer type.
It contains a heat map of average closure time by
ticket issuer type.
It contains gauge widgets of closure time in days by
year, quarter, month, and week.
It also displays microcharts showing counts of the current status
by issuer type. In the microcharts we used sparkline and bar modes
to analyze the data in different ways.
16. It is an interactive dashboard.
The Key Performance Indicator section displays the total
service area and enrollment count corresponding to each
issuer name.
Using issuer name as a selector, it targets the enrollment
heat map, which displays the total enrollments
corresponding to each state.
Using issuer name as a selector, it also targets the US map
image-layout widget, which displays the total service area count
corresponding to each state.
18. Here we took raw real-time stock data from NASDAQ and NYSE
for analysis as per our requirements.
In the screenshot above there are four selectors:
Sector, Industry, Symbol, and Year.
Industry is filtered by the Sector selector, and Symbol is
filtered by both Sector and Industry.
All four selectors filter the data shown in the panel below, which
displays stock volatility by year, quarter, month, and week.
The panel provides a grid-and-graph view limited to 50 records at a
time, as shown in the screenshot below.
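The slide does not say which volatility formula the dashboard used; one plausible interpretation is the standard deviation of returns grouped by period, sketched below with invented prices (the data, periods, and grouping are all assumptions for illustration):

```python
# Sketch of a volatility-by-period calculation like the dashboard
# panel's: the standard deviation of day-to-day returns, grouped by
# (year, month). The prices below are invented sample data, and the
# formula is an assumption -- the original slide does not specify one.
from collections import defaultdict
from statistics import pstdev

# (year, month, closing price) -- hypothetical sample data
prices = [
    (2014, 1, 100.0), (2014, 1, 102.0), (2014, 1, 101.0),
    (2014, 2, 105.0), (2014, 2, 103.0), (2014, 2, 108.0),
]

# simple returns between consecutive closes, tagged with the
# (year, month) of the later observation
returns = defaultdict(list)
for (_, _, p0), (y1, m1, p1) in zip(prices, prices[1:]):
    returns[(y1, m1)].append(p1 / p0 - 1.0)

# volatility per period = population std-dev of that period's returns
volatility = {period: pstdev(rs) for period, rs in returns.items()}
print(sorted(volatility))   # [(2014, 1), (2014, 2)]
```

Grouping by (year, quarter) or (year, week) instead of (year, month) gives the other period breakdowns the panel shows.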
19. Conclusion
User can run queries via MicroStrategy’s visual interface
without the need to write unfamiliar HiveQL or MapReduce
scripts. In essence, any user, without programming skill in
Hadoop, can ask questions against vast volumes of structured
and unstructured data to gain valuable business insights.
It is fast, scalable, cost-effective, and resilient to failure.
Hadoop is inefficient at handling small files, and it
lacks transparent compression. HDFS is optimized for large
streaming reads, so it does not handle random reads over
small files well.
It suits batch-oriented architectures, not real-time
data access.
It follows a shared-nothing architecture, so tasks requiring
global synchronization or shared mutable data do not fit well.