In memory OLAP engine


Transcript

  • 1. In-memory OLAP engine • Samuel Pelletier, Kaviju inc. • samuel@kaviju.com
  • 2. OLAP? • An acronym for OnLine Analytical Processing. • In simple words, a system to query a multidimensional data set and get answers fast enough for interactive reports. • A well-known implementation is the Excel PivotTable.
  • 3. Why build something new • I wanted something fast and memory efficient for simple queries over millions of facts. • SQL queries do not work well for millions of facts with multiple dimensions, especially with a large number of rows. • There are specialized OLAP tools from Microsoft, Oracle and others, but they are large and expensive, too much for my needs. • Generic cheap toolkits are not memory efficient; this is the cost of their simplicity. • I wanted a solution that is simple to deploy, with minimal dependencies.
  • 4. Memory usage and time to retrieve 1 000 000 invoice lines • Fetching EOs uses 1.2 GB of RAM in 13-19 s. • Fetching raw rows uses 750 MB of RAM in 5-8 s. • Fetching as POJOs with JDBC uses 130 MB in 4.0 s. • Reading from a file as POJOs uses 130 MB in 1.4 s. • For 7 M rows, EOs would require 8.4 GB for gazillions of small objects (bad for the GC).
  • 5. Time to compute sum of sales for 1 000 000 invoice lines • 2.1 s for "select sum(sales)..." in FrontBase with table in RAM. • 0.5 s for @sum.sales on EOs. • 0.12 s for @sum.sales on raw rows. • 0.5 s for @sum.sales on POJOs. • 0.009 s for a loop with direct attribute access on POJOs.
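The last figure on slide 5 comes from a plain loop that reads a public field on each POJO instead of going through key-value coding. A minimal sketch of that kind of loop, assuming a hypothetical `Line` class with a single `sales` field (the class and method names here are illustrative, not the engine's):

```java
import java.util.ArrayList;
import java.util.List;

class DirectSumSketch {
    /** Hypothetical fact POJO with a directly accessible measure. */
    static final class Line {
        final float sales;

        Line(float sales) {
            this.sales = sales;
        }
    }

    /** Sums the measure with direct field access, accumulating in a double. */
    static double sumSales(List<Line> lines) {
        double total = 0.0;
        for (Line line : lines) {
            total += line.sales;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Line> lines = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            lines.add(new Line(1.5f));
        }
        long start = System.nanoTime();
        double total = sumSales(lines);
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println("sum=" + total + " in " + micros + " us");
    }
}
```

Timings from such a loop depend on the JVM, warm-up and hardware, so they should not be expected to match the slide.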
  • 6. Some concepts • Facts are the elements being analyzed. An example is invoice lines. • Facts contain measures like quantities, prices or amounts. • Facts are linked to dimensions used to filter and aggregate them. For invoice lines, we have product, invoice, date, etc. • Dimensions are often part of a hierarchy; for example, products are in a product category, and dates are in a month and in a week.
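The concepts on slide 6 can be illustrated with a small roll-up: each fact carries a measure and a link to a product, each product belongs to a category, and the measure is aggregated one level up the hierarchy. This is a sketch with hypothetical classes (`Category`, `Product`, `Line`), using plain collections rather than the engine described in the deck:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RollupSketch {
    static final class Category {
        final String name;

        Category(String name) {
            this.name = name;
        }
    }

    static final class Product {
        final int code;
        final Category category; // parent level in the dimension hierarchy

        Product(int code, Category category) {
            this.code = code;
            this.category = category;
        }
    }

    /** A fact: one invoice line with a measure and a dimension link. */
    static final class Line {
        final Product product;
        final float sales;

        Line(Product product, float sales) {
            this.product = product;
            this.sales = sales;
        }
    }

    /** Aggregates the sales measure by product category. */
    static Map<String, Double> salesByCategory(List<Line> lines) {
        Map<String, Double> totals = new TreeMap<>();
        for (Line line : lines) {
            totals.merge(line.product.category.name, (double) line.sales, Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        Category tools = new Category("Tools");
        Category toys = new Category("Toys");
        List<Line> lines = Arrays.asList(
            new Line(new Product(1, tools), 10.0f),
            new Line(new Product(2, tools), 5.0f),
            new Line(new Product(3, toys), 2.5f));
        System.out.println(salesByCategory(lines));
    }
}
```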
  • 7. Sample invoice dimension hierarchy (diagram) • Nodes: Invoice Line, Invoice, Date, Month, Week, Ship to, Sold to, Client type, Product, Salesman, SalesManager. • Measures: Shipped Qty, Sales, Profits.
  • 8. Steps to implement an engine • Create the Engine class. • Create the classes required to model the dimension hierarchy. • Create the Value class for your facts. • Create the Group class that will compute summarized results. • Create the dimension definition classes.
  • 9. Engine class • The Engine class extends OlapEngine with the Group and Value types.
 public class SalesEngine extends OlapEngine<GroupEntry,Value> • Create the objects required for the data model and the lookup tables used to load the facts. • Load the facts into Value objects. • Create and register the dimensions.
  • 10. Create required model objects
      public class Product {
          public final int code;
          public final String name;
          public final ProductCategory category;

          public Product(int code, String name, ProductCategory category) {
              super();
              this.code = code;
              this.name = name;
              this.category = category;
          }
      }

      private void loadProducts() {
          productsByCode = new HashMap<Integer, Product>();

          WOResourceManager resourceManager = ERXApplication.application().resourceManager();
          String fileName = "olapData/products.txt";
          try (InputStream fileData = resourceManager.inputStreamForResourceNamed(fileName, null, null)) {
              InputStreamReader fileReader = new InputStreamReader(fileData, "utf-8");
              BufferedReader reader = new BufferedReader(fileReader);
              String line;
              while ((line = reader.readLine()) != null) {
                  // Tab-separated columns; the -1 limit keeps trailing empty fields.
                  String[] cols = line.split("\t", -1);
                  Product product = new Product(Integer.parseInt(cols[0]), cols[0], categoryWithID(cols[1]));
                  productsByCode.put(product.code, product);
              }
          }
          ...
      }
  • 11. Load the facts and create dimensions
      private void loadInvoiceLines() {
          ...
          loadProductCategories();
          loadProducts();

          InvoiceDimension invoiceDim = new InvoiceDimension(this);
          SalesmanDimension salesmanDim = new SalesmanDimension(this);
          while ((line = reader.readLine()) != null) {
              // Tab-separated columns; the -1 limit keeps trailing empty fields.
              String[] cols = line.split("\t", -1);

              InvoiceLine invoiceLine = new InvoiceLine(valueIndex++, Short.parseShort(cols[1]));
              invoiceLine.shippedQty = Integer.parseInt(cols[6]);
              invoiceLine.sales = Float.parseFloat(cols[7]);
              invoiceLine.profits = Float.parseFloat(cols[8]);
              lines.add(invoiceLine);
              invoiceDim.addLine(invoiceLine, cols[0], cols);

              invoiceLine.salesmanNumber = Integer.parseInt(cols[12]);
              salesmanDim.addIndexEntry(invoiceLine.salesmanNumber, invoiceLine);
              ...
          }
          }
          addDimension(productDimension);
          addDimension(productDimension.createProductCategoryDimension());
          ...
          lines.trimToSize();
          setValues(lines);
      }
  • 12. Value and GroupEntry classes • The Value class contains your basic facts (invoice lines, for example).
 public class InvoiceLine extends OlapValue<Sales> • GroupEntry is used to compute summarized results.
 public class Sales extends GroupEntry<InvoiceLine> • These are tightly coupled: a GroupEntry represents a computed result for an array of Values, and metrics are found in both classes.
  • 13. Value class
      public class InvoiceLine extends OlapValue<Sales> {
          public Invoice invoice;
          public final short lineNumber;
          public Product product;

          public int shippedQty;
          public float sales;
          public float profits;

          public int salesmanNumber;
          public int salesManagerNumber;

          public InvoiceLine(int valueIndex, short lineNumber) {
              super(valueIndex);
              this.lineNumber = lineNumber;
          }
      }
  • 14. GroupEntry class
      public class Sales extends GroupEntry<InvoiceLine> {
          private int shippedQty;
          private double sales = 0.0;
          private double profits = 0.0;

          public Sales(GroupEntryKey<Sales, InvoiceLine> key) {
              super(key);
          }

          @Override
          public void addEntry(InvoiceLine entry) {
              shippedQty += entry.shippedQty;
              sales += entry.sales;
              profits += entry.profits;
          }

          @Override
          public void optimizeMemoryUsage() {
          }

          // Accessor name assumed: the transcript keeps only its body ("return sales;").
          public double getSales() {
              return sales;
          }

          ...
      }
  • 15. Dimension classes • Dimensions implement the engine indexes and the key extraction for result aggregation. • Dimensions are usually linked to another class representing an entity like Invoice, Client, Product or ProductCategory. • Entities are value-object POJOs for optimal speed and memory usage. You may add a method to get the corresponding EO. • Dimensions are either leaf (a group of facts) or group (a group of leaf entries).
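The leaf/group distinction on slide 15 can be pictured as two levels of index: a leaf index maps a key to the facts that carry it, and a group index maps a parent key to leaf keys. The sketch below uses plain `java.util` maps and hypothetical names (`DimensionIndexSketch`, `addFact`, `addProduct`, `factsForCategory`); it is not the OlapLeafDimension or OlapGroupDimension implementation, whose internals the deck does not show.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class DimensionIndexSketch {
    // Leaf index: product code -> indexes of the facts sold for that product.
    private final Map<Integer, List<Integer>> factsByProduct = new HashMap<>();
    // Group index: category id -> product codes (keys of the leaf index).
    private final Map<Integer, Set<Integer>> productsByCategory = new HashMap<>();

    void addFact(int factIndex, int productCode) {
        factsByProduct.computeIfAbsent(productCode, k -> new ArrayList<>()).add(factIndex);
    }

    void addProduct(int categoryId, int productCode) {
        productsByCategory.computeIfAbsent(categoryId, k -> new TreeSet<>()).add(productCode);
    }

    /** Resolves a group key to facts by walking through the leaf index. */
    List<Integer> factsForCategory(int categoryId) {
        List<Integer> result = new ArrayList<>();
        Set<Integer> products =
            productsByCategory.getOrDefault(categoryId, Collections.<Integer>emptySet());
        for (int productCode : products) {
            result.addAll(
                factsByProduct.getOrDefault(productCode, Collections.<Integer>emptyList()));
        }
        return result;
    }

    public static void main(String[] args) {
        DimensionIndexSketch index = new DimensionIndexSketch();
        index.addProduct(10, 1);
        index.addProduct(10, 2);
        index.addProduct(20, 3);
        index.addFact(0, 1);
        index.addFact(1, 3);
        index.addFact(2, 2);
        index.addFact(3, 1);
        System.out.println(index.factsForCategory(10));
    }
}
```

Storing only leaf keys in the group index keeps the parent level small: the category index grows with the number of products, not with the number of invoice lines.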
  • 16. Product dimension class
      public class ProductDimension extends OlapLeafDimension<Sales,Integer,InvoiceLine> {

          public ProductDimension(OlapEngine<Sales, InvoiceLine> engine) {
              super(engine, "productCode");
          }

          @Override
          public Integer getKeyForEntry(InvoiceLine entry) {
              return entry.product.code;
          }

          @Override
          public Integer getKeyForString(String keyString) {
              return Integer.valueOf(keyString);
          }

          public ProductCategoryDimension createProductCategoryDimension() {
              long startTime = System.currentTimeMillis();
              ProductCategoryDimension dimension = new ProductCategoryDimension(engine, this);

              for (Product product : salesEngine().products()) {
                  dimension.addIndexEntry(product.category.categoryID, product.code);
              }
              long fetchTime = System.currentTimeMillis() - startTime;
              engine.logMessage("createProductCategoryDimension completed in " + fetchTime + "ms.");
              return dimension;
          }

          private SalesEngine salesEngine() {
              return (SalesEngine) engine;
          }
      }
  • 17. Product category dimension class
      public class ProductCategoryDimension
              extends OlapGroupDimension<Sales,Integer,InvoiceLine,ProductDimension,Integer> {

          public ProductCategoryDimension(OlapEngine<Sales, InvoiceLine> engine, ProductDimension childDimension) {
              super(engine, "productCategoryCode", childDimension);
          }

          @Override
          public Integer getKeyForEntry(InvoiceLine entry) {
              return entry.product.category.categoryID;
          }

          @Override
          public Integer getKeyForString(String keyString) {
              return Integer.valueOf(keyString);
          }
      }
  • 18. Initialize and use in an app • The engine can be used from multiple threads once loaded. • I usually create a singleton for the engine; it can also live in your app class.
  • 19. Use in an application
      public Application() {
          ...
          SalesEngine.createEngine();
      }

    In the component that uses the engine:

      public OlapNavigator(WOContext context) {
          super(context);
          ....
          engine = SalesEngine.sharedEngine();
          if (engine == null) {
              // The engine may be null if it has not completed its loading...
          }
      }

      public void someFetchMethod() {
          OlapResult<Sales, InvoiceLine> result = engine.resultForRequest(query);

          rows = new NSArray<Sales>(result.getGroups());
          // Sort the rows or put them inside an ERXDisplayGroup...
      }
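Slide 19 implies that `createEngine()` starts the load and that `sharedEngine()` returns null until the load has finished. The deck does not show how SalesEngine does this; one possible shape, sketched here with hypothetical names (`EngineHolderSketch`, `Engine`, `awaitLoaded`), is a volatile static reference published by a background thread only after the engine is fully built:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class EngineHolderSketch {
    /** Stand-in for a loaded engine; immutable once constructed. */
    static final class Engine {
        final int factCount;

        Engine(int factCount) {
            this.factCount = factCount;
        }
    }

    // volatile so that readers see either null or a fully built engine.
    private static volatile Engine instance;
    private static final CountDownLatch loaded = new CountDownLatch(1);

    /** Starts loading in the background and returns immediately. */
    static void createEngine() {
        Thread loader = new Thread(() -> {
            Engine engine = new Engine(1_000_000); // stand-in for reading the facts
            instance = engine;                     // publish only when complete
            loaded.countDown();
        }, "engine-loader");
        loader.setDaemon(true);
        loader.start();
    }

    /** Returns null while the engine is still loading. */
    static Engine sharedEngine() {
        return instance;
    }

    /** Waits for the load to finish; returns false on timeout or interruption. */
    static boolean awaitLoaded(long timeoutMillis) {
        try {
            return loaded.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        createEngine();
        if (awaitLoaded(5_000)) {
            System.out.println("facts loaded: " + sharedEngine().factCount);
        }
    }
}
```

A component would normally just check `sharedEngine()` for null and show a "loading" state, as on the slide; `awaitLoaded` is only a convenience for tests and command-line runs.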
  • 20. Demo app
  • 21. Java and memory • To keep the garbage collector happy, it is better to set the maximum heap to at least 2-3 times the real usage. • The GC can kill your app's performance if memory is starved. With the default settings, it may even kill your server by using multiple cores for long periods, at least in Java 1.5 and 1.6. • Java 1.7 contains a new collector, probably better.
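The 2-3 times rule of thumb on slide 21 can be checked at runtime with the standard `Runtime` memory methods. The sketch below compares the maximum heap (normally set with the JVM's `-Xmx` option) to the heap currently in use; the class and method names are illustrative, and the reading is only approximate because used memory includes garbage that has not been collected yet.

```java
class HeapHeadroomSketch {
    /** Ratio of maximum heap to used heap; the slide suggests keeping it at 2-3 or more. */
    static double headroomRatio(long maxHeapBytes, long usedBytes) {
        if (usedBytes <= 0) {
            throw new IllegalArgumentException("usedBytes must be positive");
        }
        return (double) maxHeapBytes / usedBytes;
    }

    public static void main(String[] args) {
        Runtime runtime = Runtime.getRuntime();
        long used = runtime.totalMemory() - runtime.freeMemory();
        long max = runtime.maxMemory(); // Long.MAX_VALUE if the JVM reports no limit
        double ratio = headroomRatio(max, used);
        System.out.println("used=" + (used >> 20) + " MB, max=" + (max >> 20)
            + " MB, ratio=" + ratio);
        if (ratio < 2.0) {
            System.out.println("Heap headroom is below 2x; consider a larger -Xmx.");
        }
    }
}
```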
  • 22. Q&A Samuel Pelletier samuel@kaviju.com