Case Study: Big Data Analytics
The client is a US-based product company that offers Enterprise Search & Discovery
Platforms based on Big Data technologies. The platform is used by many large enterprises
to transform their most valuable asset, information, into valuable insights.
The client’s product connects all the diverse information sources available in an
enterprise (email systems, enterprise resource planning systems, customer relationship
management systems, document management systems like SharePoint and many others) into a
single source. The client has added a new SQL interface which allows their customers to
search using familiar SQL.
Challenges
Benchmarking Different Data Sources and Search Platforms
for Today’s Digital Enterprise
The SQL interface to the product allows querying information in the system via
SQL queries. The client wanted to know how their product performs compared to
other alternatives, which may or may not be similar. The mandate was to evaluate
the performance of their product against the open source Hadoop+Hive ecosystem
using the industry-standard TPC-H benchmark for relational databases.
The objective was to identify areas of improvement in their product and make
recommendations to their customers.
Benchmarking proprietary Big Data based enterprise search and discovery
platforms against relational databases and open source Big Data systems
“TenXLabs enhanced our own ideas just like a consultant. They were very thoughtful and
enabled us to gain valuable outputs. They allowed us to grab the competitive advantage in
the digitally transformed world.”
“TenXLabs assisted us in performing high level testing for our different data sources.”
+1 267-507-6135 (US), +91 404-646-5532 (INDIA), sales@tenxlabs.com
www.tenxlabs.com
BNY Mellon Center, 1735 Market Street, Suite 3750, Philadelphia PA, USA
Block A, IIIT Campus, Gachibowli, Hyderabad - 500 032, INDIA
© TenXLabs Technologies. All rights reserved.
Solution
TenXLabs took a holistic approach to this benchmarking exercise, taking both relational
and non-relational sources into consideration, and:
• Researched the TPC-H benchmarking standard and generated test data for a specific scale factor
• Identified an appropriate hardware profile for all the data sources
• Set up MySQL Server and ran the TPC-H benchmark against it for baselining
• Built a Hadoop cluster, added Hive, imported the data from MySQL using Sqoop, and ran the TPC-H benchmark against this system
• Researched the client’s product, built a cluster, wrote custom connectors to import the data from MySQL, and ran the TPC-H benchmark against this system
• Built a JDBC/JUnit-based test harness to run the TPC-H benchmark against any system that supports JDBC
• Built a web-based tool to run ad hoc SQL queries against the client’s product, Hadoop+Hive, and MySQL and collect performance stats in real time
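The JDBC/JUnit harness itself is not shown in this case study. As a rough illustration of the idea behind it (run the same SQL against any system that sits behind a common database API, and record timings), here is a minimal Python sketch. It swaps JDBC for Python's DB-API and uses an in-memory SQLite database as a stand-in system. The names `time_query`, `make_demo_connection` and `DEMO_SQL`, the toy `lineitem` table and the query are all assumptions made for illustration; they are not the TPC-H schema or queries, and not the harness TenXLabs built.

```python
import sqlite3
import statistics
import time


def time_query(conn, sql, runs=3):
    """Run `sql` `runs` times on a DB-API connection; return row count and timings."""
    timings = []
    row_count = 0
    for _ in range(runs):
        cur = conn.cursor()
        start = time.perf_counter()
        cur.execute(sql)
        rows = cur.fetchall()  # fetch everything so result transfer is timed too
        timings.append(time.perf_counter() - start)
        row_count = len(rows)
        cur.close()
    return {
        "rows": row_count,
        "runs": runs,
        "best_s": min(timings),
        "median_s": statistics.median(timings),
    }


def make_demo_connection():
    """Hypothetical stand-in system: in-memory SQLite with a toy table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lineitem (l_returnflag TEXT, l_quantity INTEGER)")
    conn.executemany(
        "INSERT INTO lineitem VALUES (?, ?)",
        [("A", 10), ("A", 5), ("R", 7), ("N", 1)],
    )
    conn.commit()
    return conn


# Toy aggregate query for illustration only (not one of the TPC-H queries).
DEMO_SQL = (
    "SELECT l_returnflag, SUM(l_quantity) FROM lineitem "
    "GROUP BY l_returnflag ORDER BY l_returnflag"
)

if __name__ == "__main__":
    print(time_query(make_demo_connection(), DEMO_SQL))
```

Because `time_query` only relies on the standard cursor calls, the same function could in principle be pointed at another system by passing a different DB-API connection; which client libraries would be used for MySQL, Hive or the client's product is left open here.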
Benefits
TenXLabs’ approach offered the client a repeatable and reliable solution to benchmark
their product. The solution was both extensible, enabling more data sources to be added
in the future, and scalable, since TPC-H supports test data at multiple scale factors.
As a result, the client was able to identify areas of focus and was pleasantly surprised
to see their product performing exceptionally well, in some cases even compared to MySQL,
something they did not anticipate.
TenXLabs is currently engaged with the client to enhance and expand this solution.
