Low fare search ran on a cluster of 8 mainframes, using a heuristic that didn't always find a good solution. We built new algorithms and moved it all to a Linux cluster. This presentation describes the parts we put on MySQL, back when most mainstream mission-critical shops hadn't even heard of MySQL.
The open source precompiler let us take HP NonStop code and compile it, unchanged, to run against MySQL.
2. Agenda
• Sabre Holdings Overview
• Business drivers for MySQL & Open Source
• Shopping for fares
• Air Travel Shopping Engine (ATSE)
• Data replication strategy
• ESQL precompiler for MySQL
• Other MySQL users at Sabre
3. Who is Sabre Holdings?
A world leader in travel commerce,
retailing travel products, and
providing distribution and
technology solutions for the
travel industry
5. Sabre Holdings Fast Facts
• Industry leader in multiple travel channels
• Revenues of $2.06 billion in 2002
• S&P 500 company
• NYSE:TSG
• Headquarters in Dallas/Fort Worth, Texas
• 6,500 employees in 45 countries
6. Business drivers
Over 3 billion
fare combinations
for a single customer request
Multiple airlines, flights, fare types, dates,
prices, taxes, surcharges
7. Business drivers
• No direct revenue for shopping queries
• Revenue for booking, but not looking (searching)
• Look-to-book ratio increasing
• Competition requires staying on the “leading edge”
• Highly reliable and scalable database
• Fast processors
• Large real memory
• Smart algorithms
• Shopping is a good fit for horizontal scale
• Pricing requires higher precision
8. Business drivers
[Diagram: the computing stack, with layers labeled Application, DB / Middleware, Operating System and Hardware, and a marker labeled "Commodity Point".]
Hardware, operating system, database and middleware are
becoming commodities. This drives the cost down rapidly.
Open source software is a major driver of this effect.
9. Business Solution
• Linux servers alongside HP NonStop servers to create
“hybrid” Air Travel Shopping Engine (ATSE) platform
• HP NonStop delivers high availability and reliability
– Better than or equal to legacy, but at significantly lower cost
– Best fit for critical workloads and master database
management
• Linux / MySQL delivers 64-bit memory and faster CPUs
– Lower availability and reliability than HP NonStop but at
significantly lower cost
– Best fit for CPU-intensive shopping workloads
Most cost-effective platform for the shopping workload
10. Business drivers
• Sabre’s legacy
• World’s first commercial OLTP system in 1960
• Mainframe clusters running TPF
• Operating system customized to our needs
• True 7x24 application, with zero scheduled downtime
• Most application code in assembler
• Sabre’s future
• Higher-level languages
• Relational databases
• Internet
• Open systems
• Reduce specialized training
• Use off the shelf software
• HP NonStop with OSS is a key component (LINUX?)
11. Shopping
• Finding cheap air fares is hard!
• With 50+ connect points to consider, and >100 fares per
leg, we need to evaluate >3 billion combinations
• Up to a million fares can change every day
• Availability changes continuously
• Solve it >100 times per second
• Other functions
• Price 250 tickets per second
• Process 1000 flight routing requests per second
12. Pricing
• Shopping vs. Pricing
• Shopping is the problem of finding low fares
• Pricing is used to print the ticket
• Pricing has to be accurate, or we pay the difference to the
airline
• Many internet search engines still rely on mainframes to
actually print the ticket
• Pricing also requires additional functions, such as refunds,
exchanges and auditing
13. Algorithms
• Fare-led search
• Graph-based algorithm that searches all fare
combinations across 50+ connect points
• Can generate up to a 4-segment connection
• Search space of >3 billion fare combinations
• Match or exceed any competitor in finding lowest fare
• Only loses to competitors that have access to exclusive
private fares and/or other discounts
• Search actually checks Direct Connect Availability, so that
low fare options are actually bookable
14. Algorithms
• Dynamic schedules
• Connections are not generated overnight and stored
• Not limited to routes explicitly set up by airlines or other
marketing staff
• Availability Manager
• Flexible rules to access airline availability
• Current methods
– Direct Connect
– Host Availability
– Teletype (AVS)
• Can also use
– Cached DCA
– Inventory proxy
15. ATSE Hybrid
• Air shopping for desirable itineraries
• Must search through multiple airlines, flights, fare types,
dates, adjacent airports, etc.
• Must calculate prices, taxes, surcharges
• Complexity
• Single round-trip request can have over 3 billion fare
combinations
• Search is CPU and memory intensive
• Business driver
• No direct revenue for shopping transactions
• Increasing look-to-book ratio
16. ATSE Hybrid
• Combine Linux servers and HP NonStop servers
• HP NonStop delivers high availability and reliability
• Better than or equal to TPF at significantly lower cost
• Master database management
• Data replicated in real-time to Linux servers
• PNR pricing, schedules and availability
• Linux delivers 64-bit memory model and faster CPUs
• Lower availability and reliability than HP NonStop but at
significantly lower cost
• Horizontally scaled server farm with spare capacity
• Best fit for CPU-intensive shopping workloads
17. ATSE Hybrid
[Architecture diagram. Labels recoverable from the slide:
IBM systems (PSS, MVS); Fare and Rule Updates; Schedule and Availability Updates;
HP NonStop; Air Shopping Transactions; Shopping and Availability Transactions / Requests;
Naming Service and Load Balancing; DB Image Load and Updates; E/R Logging and Billing;
Load Information; Linux Server Farm (24 Linux servers drawn).]
18. ATSE Linux servers
• In production since July 2003
• Started with HP rp5405 servers (Unix PA-RISC)
– Migrated to Itanium in December 2003
• Using 45 HP rx5670 servers
– 4-way, 1.5 GHz, 6MB L2 cache, 32GB RAM, 4x72GB SCSI
• Software
• MySQL 4.0.15
• GNU compilers – g++ 3.2.3 and glibc 2.3.2
• TAO object request broker
• Redhat RHAS 2.1
• GoldenGate Extractor/Replicator
• Monitoring – Prognosis, CA Unicenter, scripts
19. ATSE Software
• Extensive use of open source software
• MySQL 4.0.15
• GNU compilers – g++ 3.2.3 and glibc 2.3.2
• TAO object request broker
• Redhat Linux AS 3.0
• Third party software
• GoldenGate Extractor/Replicator
• Monitoring – Prognosis, CA Unicenter, scripts
• Internally developed applications and scripts
20. Data replication
• HP NonStop (Tandem) is master database
• GoldenGate software used to replicate to MySQL
– Extracts data from undo/redo logs on the NonStop server
– Performs INSERT / UPDATE / DELETE on MySQL
– Software performs catch-up / resync in case of crashes or
other failures
• Each Linux server has an identical copy of the database
– 50GB database on each server, all InnoDB
• Replication volume
• 150 tables replicated (over 300 on NonStop server)
• Can replicate 1M fare changes / hour
• Data updates on 7x24 basis
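The catch-up / resync behaviour described above can be illustrated with a small sketch. This is not GoldenGate's actual mechanism, which the slides do not describe. It assumes each logged change carries a monotonically increasing sequence number and that the replica remembers the last one it applied. All names here (ChangeOp, ChangeRecord, Replica) are hypothetical, and an in-memory map stands in for a MySQL table.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

enum class ChangeOp { Insert, Update, Delete };

// One logged change; seq is its position in the master's log (starting at 1).
struct ChangeRecord {
    std::uint64_t seq;
    ChangeOp op;
    std::string key;
    std::string value;   // unused for Delete
};

// A replica that remembers the last change it applied.
struct Replica {
    std::map<std::string, std::string> rows;   // stand-in for a table
    std::uint64_t lastApplied = 0;

    // Applies every record newer than lastApplied, in log order.
    // Records that were already applied are skipped, so replaying
    // the log after a crash is safe.
    void apply(const std::vector<ChangeRecord>& log)
    {
        std::uint64_t prev = 0;
        for (const ChangeRecord& rec : log) {
            assert(rec.seq > prev);                // log must be in ascending order
            prev = rec.seq;
            if (rec.seq <= lastApplied) continue;  // already applied
            switch (rec.op) {
            case ChangeOp::Insert:
            case ChangeOp::Update:
                rows[rec.key] = rec.value;
                break;
            case ChangeOp::Delete:
                rows.erase(rec.key);
                break;
            }
            lastApplied = rec.seq;
        }
    }
};
```

Calling apply() a second time with the same log plus new records only applies the new ones, which is the property a resync after failure relies on.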
24. Hybrid
• Horizontal scalability
• Ability to throw inexpensive CPUs at the problem
• Tolerate failure of a single server
• How do we get there from here?
• Database and network functions remain on Himalaya
• C++ code readily ports to Linux
• Publish/subscribe metaphor for data in memory
• 64-bit addressing to avoid memory constraints
25. Connectivity
• CORBA
• Major functions use CORBA internally
• CORBA requests to TPF for availability
• CORBA to CTS for DCA this Summer (bypass TPF)
• Asynchronous messaging via MQ Series
• XML
• Currently uses XML requests from TPF (over RPPC) for
pricing functions
• Working on direct access from Travelocity to ATSE
– Will be used for BIP
– Already working over HTTP (development systems)
– Working on security & billing for production
26. Timeline
• 2000
• Proof Of Concept, April – August
• 5 core developers, partnership with Compaq
• 2001
• Development & training began in February
• Initial hardware delivered
• 2002
• Phase 1 in production since July
• Zero downtime since implementation
• Rapidly developing additional functionality
• Wow – this is from an ancient slide, huh?
27. Precompiler
• Challenge
• 500K lines of C/C++, 150+
files with embedded SQL
• We did not want to rewrite
ESQL / C code by hand
• Solution
• Wrote a precompiler that
converts ESQL to inline
MySQL calls
• About 1000 lines of awk
• We are willing to share this
code with others
EXEC SQL BEGIN DECLARE SECTION;
    int    host_a;
    double host_b;
    char   host_c;
EXEC SQL END DECLARE SECTION;

EXEC SQL DECLARE csr1 CURSOR FOR
    SELECT a, b, c
    FROM table1
    WHERE x = :hostvar1;

EXEC SQL OPEN csr1;
while (rc >= 0 && rc != 100) {
    EXEC SQL FETCH csr1 INTO
        :host_a, :host_b, :host_c;
    printf("Fetch %d, %lf, %s\n",
        host_a, host_b, host_c);
}
EXEC SQL CLOSE csr1;
28. Precompiler
• How it works
• Convert C / ESQL to C++ code
• Polymorphism matches data types in the declare section
• Can ignore the declare section
Before:
EXEC SQL BEGIN DECLARE SECTION;
    int    host_a;
    double host_b;
    char   host_c;
EXEC SQL END DECLARE SECTION;

After:
// EXEC SQL BEGIN DECLARE SECTION;
    int    host_a;
    double host_b;
    char   host_c;
// EXEC SQL END DECLARE SECTION;
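The declare-section step amounts to commenting out the two marker lines and passing the declarations through untouched. The actual precompiler is written in awk and is not shown in these slides. The sketch below is a C++ stand-in for that one rule, and the function names rewriteDeclareLine and startsWith are hypothetical.

```cpp
#include <cassert>
#include <string>

// Returns true if line, ignoring leading spaces and tabs, starts with prefix.
static bool startsWith(const std::string& line, const std::string& prefix)
{
    std::size_t pos = line.find_first_not_of(" \t");
    return pos != std::string::npos &&
           line.compare(pos, prefix.size(), prefix) == 0;
}

// Comments out the declare-section markers and leaves every other line alone,
// so the host variable declarations become ordinary C++ declarations.
std::string rewriteDeclareLine(const std::string& line)
{
    if (startsWith(line, "EXEC SQL BEGIN DECLARE SECTION") ||
        startsWith(line, "EXEC SQL END DECLARE SECTION")) {
        return "// " + line;
    }
    return line;
}
```

Run over each source line in turn, this produces output of the same shape as the converted block on this slide.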
29. Precompiler
Cursor declarations (SELECT statements) are converted to a static
struct. The struct has the text of the SQL, as well as statement
handles for doing prepare / execute (where applicable).
EXEC SQL DECLARE csr1 CURSOR FOR
SELECT a, b, c
FROM table1
WHERE x = :hostvar1;
// EXEC SQL DECLARE csr1
static e2mysql csr1 = {
" SELECT a,b,c FROM table1 WHERE x = :hostvar1"
, NULL , 0};
30. Precompiler
The OPEN, FETCH and CLOSE statements are converted into
function calls. The precompiler generates the code for these calls
and puts it at the end of the source module.
EXEC SQL FETCH csr1 INTO :host_a, :host_b, :host_c;
// EXEC SQL FETCH csr1
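static int16 fetch_csr1()
{
    // No result set: the cursor was never opened, or the query failed.
    if ( ! csr1.rslt )
        return SQL_ERROR;
    // Every row in the result set has already been returned.
    if ( csr1.row >= mysql_num_rows(csr1.rslt) )
        return SQL_NO_DATA;
    MYSQL_ROW row = mysql_fetch_row(csr1.rslt);
    // Convert each column's text into the matching host variable.
    SQLBindColPoly(row[0], host_a, sizeof(host_a));
    SQLBindColPoly(row[1], host_b, sizeof(host_b));
    SQLBindColPoly(row[2], host_c, sizeof(host_c));
    ++csr1.row;
    return SQL_SUCCESS;
}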
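The slides do not show the generated OPEN code. One plausible approach, since MySQL 4.0 predates the prepared-statement C API introduced in 4.1, is to substitute host variable values into the SQL text before sending the query. The sketch below shows only that substitution step. The function name expandHostVars is hypothetical, and the caller is assumed to supply values that are already rendered and escaped as SQL literals.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Replaces each :name placeholder in sql with the literal text in values.
// Placeholders with no entry are left untouched. This sketch is not aware
// of SQL string literals: a colon inside a quoted string is treated the
// same way as a placeholder.
std::string expandHostVars(const std::string& sql,
                           const std::map<std::string, std::string>& values)
{
    std::string out;
    std::size_t i = 0;
    while (i < sql.size()) {
        if (sql[i] != ':') {
            out += sql[i++];
            continue;
        }
        // Scan the identifier that follows the colon.
        std::size_t j = i + 1;
        while (j < sql.size() &&
               (std::isalnum(static_cast<unsigned char>(sql[j])) || sql[j] == '_')) {
            ++j;
        }
        const std::string name = sql.substr(i + 1, j - i - 1);
        std::map<std::string, std::string>::const_iterator it = values.find(name);
        if (it != values.end())
            out += it->second;              // substitute the literal
        else
            out += sql.substr(i, j - i);    // keep unknown placeholder as-is
        i = j;
    }
    return out;
}
```

With hostvar1 mapped to 42, the cursor text from the previous slide expands to a query ending in "WHERE x = 42". Real code would need to escape string values, for example with mysql_real_escape_string, before substitution.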
31. Precompiler
A lightweight wrapper around the database API lets us
use polymorphism to convert to the types specified in the
declare section. There is a wrapper function for each
simple C++ type that we handle.
inline int32
SQLBindColPoly(const char* value, int32& parm, uint16 size)
{
parm = atoi(value);
return SQL_SUCCESS;
}
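The "one wrapper per simple type" idea relies on C++ overload resolution: the generated call is identical for every column, and the compiler picks the conversion from the host variable's declared type. The sketch below adds double and character-buffer overloads next to the integer one. The slides show only the integer version, so the other two, the null checks, and the status code values are assumptions, and standard fixed-width types stand in for the int32 / uint16 typedefs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical status codes; the real values come from the wrapper's headers.
const std::int32_t SQL_SUCCESS = 0;
const std::int32_t SQL_ERROR = -1;

// Integer host variable: parse the column text as a base-10 integer.
inline std::int32_t SQLBindColPoly(const char* value, std::int32_t& parm, std::size_t)
{
    if (!value) return SQL_ERROR;   // a SQL NULL column arrives as a null pointer
    parm = static_cast<std::int32_t>(std::strtol(value, nullptr, 10));
    return SQL_SUCCESS;
}

// Floating-point host variable.
inline std::int32_t SQLBindColPoly(const char* value, double& parm, std::size_t)
{
    if (!value) return SQL_ERROR;
    parm = std::strtod(value, nullptr);
    return SQL_SUCCESS;
}

// Fixed-size character buffer: copy with truncation, always NUL-terminated.
inline std::int32_t SQLBindColPoly(const char* value, char* parm, std::size_t size)
{
    assert(parm != nullptr);
    if (!value || size == 0) return SQL_ERROR;
    std::strncpy(parm, value, size - 1);
    parm[size - 1] = '\0';
    return SQL_SUCCESS;
}
```

A call such as SQLBindColPoly(row[1], host_b, sizeof(host_b)) then resolves to the double overload simply because host_b was declared as a double, which is why the generated code can ignore the declare section.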
32. Precompiler
• Notes
• Light-weight C++ wrapper to MySQL API
• The precompiler understands some SQL syntax and does
some modifications of NonStop SQL/MP statements
• We have also used our precompiler to target other DBMS
– ODBC API
– Oracle
– PostgreSQL
• Since we convert C to C++, this may be problematic for
ESQL programs that used deprecated K&R syntax
– C++ compilers are stricter than C compilers
– However, we did not have this problem with our application
33. Other MySQL applications at Sabre
• ATSE is our largest and most mission-critical MySQL application
• We have other production systems that rely on MySQL
• Site59.com is the most visible
• MySQL also used for some internal databases
• More under development
• MySQL / Linux / SATA drives make cheap data marts
• Sometimes cheaper to replicate to a data mart than to
upgrade a central data warehouse
• Currently testing with a 1.5B row database
34. Site59
• Last minute travel packages
• Acquired by Travelocity in
March 2002
• Sales volume?
• Transaction rates?
• All dynamic content generated
using PHP & MySQL
35. Site59
Site59 implements a fairly “classic” dynamic website using MySQL.
Dynamic content is generated at about 30 Mbits / second. The site makes
extensive use of single- and dual-processor Linux machines (IA-32).
[Architecture diagram. Components: Internet; Presentation (Apache/PHP); Application Server; Reservations System Gateway; Frontend DB (MySQL, Linux); Backend DB (Oracle, Sun). Labeled links: HTTP; XML/HTTP; Replication.]