Apache Trafodion™ (incubating)
Push Hadoop Beyond Analytics
trafodion.apache.org
Speaker: Rao Kakarlamudi (rao.kakarlamudi@esgyn.com)
Use Case
Internet of Things
Business Needs
◦ Enormous vehicle fleet
◦ Real-time capture, monitoring, and analysis at scale
with high concurrency
Problem
◦ Optimize usage
◦ Understand scheduling
◦ Understand maintenance
◦ Real-time customer information
Challenge
◦ 559 million vehicle records per day
◦ Sub-second response time
◦ Sustained performance at >100 concurrent users
Solution
◦ Trafodion on standard x86 Linux cluster
◦ Data load, query, and extract in parallel
◦ Users can query both current and historical data
© 2015 Esgyn Corporation
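As a rough sizing aid, the daily volume in the challenge above can be turned into an average ingest rate. This minimal Python sketch assumes records arrive evenly over 24 hours, which real fleets do not, so actual peak rates will be higher than this average.

```python
# Average ingest rate implied by 559 million vehicle records per day.
# Assumption: arrivals are spread evenly over 24 hours (real peaks are higher).
RECORDS_PER_DAY = 559_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_records_per_second(records_per_day: int) -> float:
    """Convert a daily record count into an average per-second rate."""
    return records_per_day / SECONDS_PER_DAY


rate = average_records_per_second(RECORDS_PER_DAY)
print(f"~{rate:,.0f} records/second on average")  # ~6,470 records/second on average
```

That load has to be absorbed while the same cluster answers queries in under a second for more than 100 concurrent users.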
Use Case
Finance
Business Needs
◦ Customers need to query their recent balances and
their transactions from months or even years ago.
◦ They also want more information than can easily be
stored in a separated architecture (vendor name,
ATM location, transfer location, etc.)
Problem
◦ Retail businesses receive transactions at a high overall
volume from a wide variety of sources, such as credit card
transactions, tellers, electronic transfers, and ATMs.
◦ Customers make queries about individual transactions
in the last day, month, and year, but the storage and
query performance required to give full information
about all transactions is beyond the capacity of
traditional architectures
Challenge
◦ Query data from the current day’s transactions with
high reliability and low latency, without impacting the
performance of the primary transactional system
Solution
◦ EsgynDB initially provides an ODS for the
mission-critical transaction system, offloading
near-real-time queries to it so that the primary
transactional system can meet its SLAs.
◦ The same data lake also includes the historical data,
allowing for seamless connection of data over time,
with no extra data replication. And with EsgynDB’s
ability to integrate structured, semi-structured, and
unstructured data, customers and employees have
access to more information about each transaction.
Use Case
Telecommunications
Business Needs
◦ 24x7 ingest and analysis of voice, SMS, and data file
business transactions
◦ Build new solutions for 100s of millions of users
Problem
◦ Up-to-date information within a few minutes
◦ Support and upsell
◦ Trust your data
Challenge
◦ Load gigabytes of data in minutes on an ongoing basis
◦ Comprehensive queries against historical and recent
data
◦ Data quality and rapid analysis to engage customers
Solution
◦ Trafodion on standard x86 Linux cluster
◦ Ingest raw data at its arrival rate and load it into Trafodion
◦ Transactional inquiries
◦ Detail reports
Use Case
E-Commerce
Business Needs
◦ Ad-driven revenue model
◦ Need near-real-time decisions to optimize ad
placement
Problem
◦ Log files → Hive → Traditional Database
◦ Too slow to meet business requirements
Challenge
◦ 2 TB of data daily, 42 GB/hour peak
◦ Misses critical data + lots of redundant data
◦ High-volume transactions and concurrency
◦ Produce account summaries in hours
Solution
◦ Query Hive data directly; store in Trafodion
◦ Same data lake → no ETL needed
◦ Near-real-time data access using SQL
Trafodion Brings:
• Open Source Apache Trafodion (Incubating) project and license
• Hadoop HBase scalability up to petabytes
• Full ANSI SQL support
• ACID Transactions (Atomic, Consistent, Isolated, Durable) across rows, tables, and servers
• Cost-effective scale-out
• Enterprise-ready active-active replication across multiple data centers
• ODBC / JDBC / ADO.Net / Hibernate support
• Proven and hardened database engine with 20+ years of Tandem / Compaq / HP innovation
• Data federation (e.g. Kafka) and schema flexibility
• Optimized for real-time transaction processing, operational reporting, and operational data
store (ODS) workloads that demand sub-second response times with high concurrency
(C) Copyright 2015 Esgyn Corporation Esgyn Confidential
Trafodion Stack Overview – Running Queries
[Figure: client layer of user and ISV operational applications; database connectivity through JDBC/ODBC drivers; SQL layer with compiler and optimizer (CMP), relational schema, and SQL parallelism through Master and ESP processes; distributed transaction management (DTM); storage engine and data store integration over Trafodion tables, native HBase tables (KVS, columnar), and native Hive tables (multi-structured); HBase; HDFS.]
Why Apache Trafodion?
Ingredients for a world-class relational database
1. Time, Money, and Talent
◦ 20+ years of investment
◦ $300+ million invested
◦ Database developers who grew up on a shared-nothing,
massively parallel architecture with a single system image
across clusters
◦ 300+ years of database experience building OLTP and BI engines
ANSI and non-ANSI functionality supported; performance,
scalability, concurrency, throughput, stability, high
availability, transactions, UDFs, SPJs (stored procedures
in Java), OLAP, etc.
Why Apache Trafodion?
Ingredients for a World-class Relational Database
2. World Class Optimizer
◦ Rule-driven and cost-based optimizer
  ◦ Based on Cascades and large-scope rules
  ◦ Reduces the search space
  ◦ Recognizes patterns such as star joins
◦ Considers multiple join strategies
  ◦ Nested and nested-cache joins for operational queries
  ◦ Merge and hybrid hash joins for large, complex queries
◦ Optimizes inner, outer, and full outer joins
◦ Considers serial and parallel plans based on cardinality
◦ Uses equal-height histograms to detect skew
  ◦ Leverages SkewBuster to eliminate skew
◦ Un-nests subqueries
  ◦ Converts correlated subqueries to joins
◦ Pushes predicates down to the lowest operation
  ◦ Filters, e.g. row selection (start/stop key)
  ◦ Coprocessors, e.g. pre-aggregation
◦ Leverages the Multi-Dimensional Access Method (MDAM)
  ◦ Avoids full scans when no predicates are specified on
leading key columns
◦ Considers sort-avoidance strategies
  ◦ Uses hash group-by to avoid sorts
  ◦ Leverages key order
  ◦ Does in-memory sorts when possible
◦ Uses sophisticated plan-caching techniques
◦ And a lot more …
Built and tuned to handle the complexities and
differences inherent in varied enterprise-class
workloads
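MDAM is probably the least familiar item above. The Python sketch below illustrates only its core "sparse access" idea on an in-memory sorted list, assuming a hypothetical table keyed by (region, sensor_id) and a query that filters only on sensor_id. Instead of scanning every key, it jumps from one distinct leading-key value to the next and probes directly for the matching second-column value. It is not Trafodion's implementation, which is cost-based and considerably more elaborate.

```python
from bisect import bisect_left, bisect_right


def mdam_sparse_probe(keys, second_value):
    """Find composite keys (leading, second) with second == second_value.

    keys must be a sorted list of unique 2-tuples. Rather than scanning every
    key, the loop visits each distinct leading-key value once and probes
    directly for (leading, second_value) within that value's range.
    """
    matches = []
    probes = 0  # number of distinct leading-key values visited
    i = 0
    while i < len(keys):
        leading = keys[i][0]
        # Locate the (leading, second_value) entries by binary search.
        lo = bisect_left(keys, (leading, second_value), lo=i)
        hi = bisect_right(keys, (leading, second_value), lo=lo)
        matches.extend(keys[lo:hi])
        probes += 1
        # Jump past every key that shares this leading value.
        i = bisect_right(keys, (leading, float("inf")), lo=i)
    return matches, probes


# Hypothetical table keyed by (region, sensor_id): 4 regions x 1,000 sensors.
keys = sorted(
    (region, sensor_id)
    for region in ("east", "north", "south", "west")
    for sensor_id in range(1000)
)
# Comparable to filtering on sensor_id = 42 with no predicate on region.
matches, probes = mdam_sparse_probe(keys, 42)
print(matches)  # [('east', 42), ('north', 42), ('south', 42), ('west', 42)]
print(probes)   # 4
```

The number of probes grows with the number of distinct leading-key values (4 here) rather than with the number of keys (4,000), which is why this approach pays off when the leading column has low cardinality.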
Why Apache Trafodion?
Ingredients for a world-class relational database
3. World Class Parallel Data Flow Execution Engine
◦ Data-flow, pipelined parallel architecture
◦ Intermediate results materialized only for blocking
operations such as sorts
◦ Data overflows to disk only for large hash joins
◦ Adaptive segmentation to use only the resources needed
◦ Co-located joins, with repartitioning when necessary
◦ Inner and outer child broadcasts
◦ Parallel secondary index maintenance
◦ Salting of data across region servers
[Figure: a client application connects to a Master process, which executes a multi-fragment plan through layers of ESPs across Node 1, Node 2, …, Node n; each node runs HBase (with filters and coprocessors) over HDFS, and the nodes are connected by Ethernet.]
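Salting can be illustrated with a short Python sketch. It assumes a hypothetical table whose row keys increase monotonically (e.g. order numbers), which would otherwise send every insert to the last HBase region. A salt bucket is derived from a hash of the key and prefixed to the row key, so consecutive keys spread across buckets. SHA-256 and the 4-bucket count are stand-ins chosen for illustration, not Trafodion's internal hash or defaults.

```python
import hashlib
from collections import Counter

NUM_SALT_BUCKETS = 4  # hypothetical; real tables pick a count suited to the cluster


def salt_bucket(key: str, buckets: int = NUM_SALT_BUCKETS) -> int:
    """Map a row key to a salt bucket using a hash of the key.

    SHA-256 is a stand-in; it is not the hash function Trafodion uses.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % buckets


def salted_row_key(key: str) -> str:
    """Prefix the key with its bucket so neighboring keys land in different regions."""
    return f"{salt_bucket(key):02d}|{key}"


# Monotonically increasing keys would otherwise all append to the last region.
keys = [f"order-{n:08d}" for n in range(10_000)]
distribution = Counter(salt_bucket(k) for k in keys)
print(sorted(distribution.items()))  # counts should each be close to 2,500
print(salted_row_key("order-00000001"))
```

The trade-off is that a range scan on the unsalted key must now visit every salt bucket instead of a single contiguous range.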
Why Apache Trafodion?
Ingredients for a World-class Relational Database
4. World Class Distributed Transaction Management System
Performance
YCSB and Order Entry scale linearly!
[Charts: throughput for a transactional Order Entry workload, and throughput for YCSB with a 50/50 mix of selects and updates.]
Try and Contribute to Apache Trafodion
Download:
◦ trafodion.apache.org
Try Trafodion on AWS:
◦ https://aws.amazon.com/marketplace/pp/B018RBMFG0
Documentation:
◦ trafodion.apache.org
Become a contributor – add a new feature, fix a bug, translate documentation, and more
◦ Discuss your changes on the dev mailing list
◦ Create a JIRA issue
◦ Set up your development environment
◦ Prepare a patch containing your changes
◦ Submit the patch
Thank You
Rao Kakarlamudi (rao.kakarlamudi@esgyn.com)

Introduction EsgynDB, based on Apache Trafodion, by Rao Kakarlamudi, Esgyn


Editor's Notes

  1. A database transaction must be atomic, consistent, isolated, and durable; together these four properties are known as ACID.
     Atomic: A transaction is a logical unit of work; either all of its data modifications are performed, or none of them are.
     Consistent: At the end of the transaction, all data must be left in a consistent state.
     Isolated: Data modifications made by one transaction must be independent of those made by other concurrent transactions; otherwise, the outcome of a transaction may be erroneous.
     Durable: Once a transaction completes, the effects of its modifications must be permanent in the system.
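Atomicity, as described in the note above, can be demonstrated with Python's built-in `sqlite3` module. SQLite is only a stand-in for a SQL engine here, not Trafodion, and this single-node example does not show transactions spanning rows, tables, and servers. A hypothetical `transfer` function credits one account and then debits another; when the debit violates a `CHECK` constraint, the whole transaction is rolled back, so the earlier credit is undone as well.

```python
import sqlite3

# SQLite stands in for a SQL engine; this is not Trafodion.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Credit dst, then debit src, as one all-or-nothing unit of work."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            # Raises sqlite3.IntegrityError if src would go negative.
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn: sqlite3.Connection) -> dict:
    """Return current balances keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


print(transfer(conn, "alice", "bob", 30))   # True
print(balances(conn))                       # {'alice': 70, 'bob': 80}
print(transfer(conn, "alice", "bob", 500))  # False: the debit fails, so the credit is undone
print(balances(conn))                       # {'alice': 70, 'bob': 80}
```

Crediting before debiting is deliberate: it makes the rollback visible, because without atomicity the failed transfer would leave the credit in place.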