SenchaCon 2016: Using Ext JS to Turn Big Data into Intelligence - Olga Petrova

With the addition of D3 visualizations and a partner technology Ext Speeder that speeds up data access from back-end systems by up to 10x, we are now able to handle very large volumes of data in custom analytical applications. In this session, we’ll look at the core components of Ext JS that make it a perfect fit for customized big data applications, and showcase an app built using these components that handles millions of records in the browser.

  • Before I start, I would like to ask you one question: who of you has ever added BI or data analysis functionality to your applications? Who of you has ever thought about it or been asked to add it?

    OK, many of you. Great! Then you are in the right place.
    My name is Olga and I work as a Sales Engineer at Sencha. I am based in Munich, Germany.
    I will be talking about adding BI and data analysis functionality to your application using Ext JS 6.2.
  • I will talk about common and painful problems of developing a web front-end for BI applications, how easily you can solve these problems using Ext JS, and what benefits you get as a bonus. To illustrate all these points I have developed a sample BI application that I will also show you.
  • But before I move to the main part, I would like to tell you a personal story about why I decided to take this topic.
    Over the past 4 years I have worked at several companies that develop software for data scientists or perform different kinds of data analysis for different business areas, for example for financial traders.
    Then I also worked at the Big Data Lab of the Volkswagen Group, where we developed data analysis projects for different automotive brands inside the group: Volkswagen, Audi, Skoda, Scania, Seat, etc. So I have worked with a lot of data scientists. And during that time I understood that data analysis is one of the fastest-growing sub-industries in IT and extremely popular right now.
    People are realizing that without understanding their data they cannot make competitive business decisions in today’s fast-changing world.
  • Data science is really a top trend right now. All the main automotive companies in Germany have opened data analysis departments in the last 2 years: Volkswagen, BMW, Mercedes, Audi and others.
    We can easily see that this topic is hot not only from the amount of news related to data science in different media …
  • but also from the huge demand for data scientists. The job portal Glassdoor published a list of the best jobs in the USA, and Data Scientist took first place.
  • I guess it is not possible to get that kind of statistic on LinkedIn. But we can see that the number of Data Scientist positions published over the last month in the US is almost 30 thousand.
    Yes, the world needs data scientists.
  • And all related industries are trying to adapt to this demand. The very popular online learning platform Coursera has published a lot of new courses related to data science over the last year or so. It now contains more than 400 online data science courses.

    But what does it mean for us, software developers?
  • It is great news for us! There is, and there will be, a huge demand for BI and data analysis applications and tools. So more work for us. It is great, isn’t it?
  • You may wonder why, since there are already a lot of solutions and tools on the market. That is true. Over the past 4 years I have worked with a lot of different data analysis and visualization tools, and I would say that some of them are really great at solving particular tasks, and some are not so good.
    But I would say that almost all tools on the market have particular problems with front-ends, especially web front-ends. Or rather, serious limitations.
    The most important one, I guess, is performance problems when visualizing huge datasets.
    They can visualize small or mid-size datasets perfectly, but when a dataset becomes really huge it is always a pain…
    Another limitation is lack of flexibility: all of them have limited functionality and user interactions, and customization requires developing specific plugins.
    Integration with existing applications and systems is tricky because many solutions are closed.
    Many of these tools are mature, heavy and extremely complex. You need to spend a lot of time configuring them. And most tools of this kind are really expensive.
    On the other hand, most businesses would be completely satisfied with a simple, so-called BI Lite solution, or would need a really custom and flexible solution.
    That is why they may want to spend time and money on developing their own tool or application.
  • I would like to continue the story about my time at Volkswagen. By that time I had already been developing web applications with Ext JS for almost 9 years, and every time, really every single time at Volkswagen, I thought that Ext JS would be a better option as a front-end for almost all our use cases. And that was before Ext JS 6.2. Now I would say it is a perfect option.
    So if you ever need to develop a front-end for a BI application (whether you need a BI Lite solution or a really custom and flexible one), I would suggest using Ext JS. Here are the most important reasons:
    Ext JS has many robust, ready-to-use components for visualizing data and data analysis results. The components are already optimized for working with huge datasets and do this with very high performance. On the other hand, they are really flexible: it is very easy to customize Ext JS components, add any kind of user interactivity, and integrate them with any back-end system or database.
  • But these are just words, so to prove them I have developed a sample application that illustrates the problems of BI front-ends and their solutions.
    To be honest, the hardest part was finding a suitable dataset for my application. I had the following requirements: the dataset should be public, big enough, and let me tell an interesting data story.
    In reality there are not so many open, big and interesting datasets on the Internet. I tried several options and chose …
  • I chose the data from the American Community Survey for 2013, conducted by the United States Census Bureau.
    The American Community Survey provides critical economic, social, demographic, and housing information about US communities every year. This data is used to manage or evaluate government programs.
    And this data is public and available to everyone.
  • So in the database I have data about more than 2 million people and almost 1.5 million houses.
    All data is of course anonymized: it doesn’t include any real names or addresses.
    Personal data includes, for example, state, sex, income, education, the industry where the person works, occupation, and working hours per week.
    House data includes state, type of house (for example a stand-alone house or an apartment), property value, total household income, amount of rent and owner costs.
    For my use case, state, income, industry, occupation, property value, rent and owner costs are the most important fields.
  • And the central part of any data analysis or BI application: what question does it answer? What data story does it tell?
    With my app I have tried to answer the following question: Where should I live? What is the best state for me?
    So I calculate the best state to live in in the US for a particular industry and occupation.
    The obvious question is how I calculate the best state. Originally I tried to calculate it based on the highest amount of money left after housing: the amount of free money that you have after paying for an apartment or a house.
  • But later I was advised to use the purchasing power concept. Purchasing power is the amount of goods and services that you can buy with a unit of currency. For the US the reference amount is 100 dollars.
    Purchasing power shows the real value of $100 in each state. Prices for the same goods are often much lower in states like Missouri or Ohio than in states like New York or California. As a result, the same amount of cash can buy you comparatively more in a low-price state than in a high-price state.
    The Bureau of Economic Analysis measures this phenomenon and publishes purchasing power values for every state.
    Using this data I adjusted the average incomes in different states so that they can be compared to each other.
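    The adjustment described here can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: the state names "A" and "B" and all figures are hypothetical placeholders, not BEA or survey values, and the function names are made up for this sketch rather than taken from the sample application.

```javascript
// Hypothetical figures for illustration only; not BEA or survey data.
// realValueOf100: what $100 buys in each state, expressed in dollars.
const realValueOf100 = { A: 110, B: 90 };
const averageIncome = { A: 50000, B: 60000 };

// Scale a nominal income by the purchasing power of $100 in that state.
function adjustIncome(income, valueOf100) {
    return income * valueOf100 / 100;
}

// Return the states sorted by adjusted income, best first.
function rankStates(incomes, valuesOf100) {
    return Object.keys(incomes)
        .map(state => ({
            state: state,
            adjusted: adjustIncome(incomes[state], valuesOf100[state])
        }))
        .sort((a, b) => b.adjusted - a.adjusted);
}

const ranking = rankStates(averageIncome, realValueOf100);
console.log(ranking);
```

    With these made-up numbers, state A ranks first after the adjustment even though its nominal income is lower, which is exactly the effect the purchasing power concept is meant to capture.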
  • There are typical functional parts of every BI or data analysis application, and I have included them in my sample application as well.
    First, it is often necessary to display the original data sources to let the user play with them. So in my application I show the original data from the American Community Survey.
    The second part, I guess, is mandatory for every BI app: display of analytics, data analysis results or BI data. So it is included in my app as well.
    And the last part, which is very common, is data visualization in charts, maps, graphs, etc.
    I will go through every part and explain the common problems and how I solved them using Ext JS.
  • The first part: visualization of the original data sources. (Show in app)
  • If you need to show a small dataset, it is not a problem at all. Any solution will be fine.
    The problem comes when you need to visualize a huge dataset that contains, for example, more than 1 million rows. Then it becomes a painful question.
  • What comes to mind first:
    Load the whole dataset into the browser and draw a table… but even with the simplest HTML table that is at least 6 million DOM elements, even for only 5 columns. You can imagine that it will not work: no browser can handle this, no matter what hardware you are using.
    The second idea is to use pagination. Yes, this solution may work, but only in theory. Even if you have a page size of 1000 (and that is a lot), you would have 1000 pages. You can imagine that this option is extremely far from a good user interface: no user has enough time and patience to click through 1000 pages.
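    The two numbers quoted here follow from simple arithmetic, sketched below. The assumption is the simplest possible HTML table: one row element plus one cell element per column, with nothing else counted. The function names are made up for this sketch.

```javascript
// Simplest possible table: one <tr> per row plus one <td> per column.
function estimateDomElements(rows, columns) {
    return rows * (columns + 1);
}

// Number of pages a user would have to click through.
function pageCount(rows, pageSize) {
    return Math.ceil(rows / pageSize);
}

const rows = 1000000;
console.log(estimateDomElements(rows, 5)); // 6000000
console.log(pageCount(rows, 1000));        // 1000
```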
  • But what can Ext JS offer us? A combination of the Ext JS Grid with buffered rendering and the Ext Speeder back-end is perfectly suited to this problem.
    May I ask how many of you are using the Ext JS grid with buffered rendering?
    Who has been at Per Minborg & Jon Jarboe’s session about Ext Speeder? (OK, not so many, so I will explain it in more detail.)
    Ext Speeder is a Java-based back-end for the Ext JS Grid. It caches data from a database in memory and responds to client requests immediately without querying the database.
    The Ext JS infinite grid lets you render and show to the user only a part of the whole dataset at a time. When the user starts scrolling, the grid loads and renders new data.
    So only the visible rows are rendered at any time, plus a few more above and below to smooth the scrolling.
    And if you are using the Ext Speeder back-end, the user may not even notice this, because data loading is extremely fast.
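    The idea behind buffered rendering can be sketched independently of Ext JS. The function below is not the grid's actual implementation; it is a simplified model that assumes a fixed row height and a hypothetical bufferRows parameter for the extra rows rendered above and below the viewport.

```javascript
// Simplified model of buffered rendering (not the Ext JS implementation):
// compute which row indexes need to exist in the DOM for a scroll position.
function visibleRowRange(scrollTop, viewportHeight, rowHeight, totalRows, bufferRows) {
    const firstVisible = Math.floor(scrollTop / rowHeight);
    const lastVisible = Math.ceil((scrollTop + viewportHeight) / rowHeight) - 1;
    return {
        start: Math.max(0, firstVisible - bufferRows),
        end: Math.min(totalRows - 1, lastVisible + bufferRows)
    };
}

// 1 million rows, 25px each, 500px viewport, 10 buffer rows on each side.
const atTop = visibleRowRange(0, 500, 25, 1000000, 10);
const scrolled = visibleRowRange(250000, 500, 25, 1000000, 10);
console.log(atTop, scrolled);
```

    Whatever the scroll position, only a few dozen rows are rendered, so the DOM size stays constant while the dataset grows.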
  • This is what an Ext JS buffered store definition looks like. You just declare that it is a buffered store and add a URL pointing to the Ext Speeder back-end. The rest is done automatically by the Ext JS Grid without any explicit configuration.
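    The slide itself is not part of this transcript, so the following is only a sketch of what such a definition might look like, not the code from the talk. The class name App.store.Houses, the URL /speeder/houses, the reader property names and the buffer sizes are all assumptions. The config is kept as a plain object so that the snippet also runs outside an Ext JS application.

```javascript
// Assumed names: class name, URL and reader properties are placeholders.
const housesStoreConfig = {
    extend: 'Ext.data.BufferedStore', // store that loads pages on demand
    alias: 'store.houses',
    pageSize: 100,                    // rows requested per page
    leadingBufferZone: 300,           // rows to prefetch ahead of the viewport
    remoteSort: true,                 // sorting is done by the back-end
    remoteFilter: true,               // filtering is done by the back-end
    proxy: {
        type: 'ajax',
        url: '/speeder/houses',       // hypothetical Ext Speeder endpoint
        reader: {
            type: 'json',
            rootProperty: 'data',     // assumed response layout
            totalProperty: 'total'
        }
    }
};

// Register the class only when the Ext JS framework is actually loaded.
if (typeof Ext !== 'undefined') {
    Ext.define('App.store.Houses', housesStoreConfig);
}
```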
  • So what benefits do you get from a combination of the Ext JS Grid and the Ext Speeder back-end? You get an extremely performant solution for visualizing huge datasets, a real-time big data solution.
    Filtering and sorting can also be performed without any delay, because Ext Speeder generates caches for different filter and sort combinations.
  • The second part is the main part: the data analysis or BI functionality.
  • The question here is: how do you aggregate data to highlight hidden trends and insights? How do you show your business intelligence to a user in a meaningful and useful way?


  • The possible solutions may be:
    a third-party component that you find and need to integrate into your application,
    or a component that you create yourself: you manually aggregate the data and visualize it using tables or lists.

    I am sure you can imagine how much time and how many resources these options may require: first to implement them, and second to support them in your enterprise application for a long time.
  • Or you may just use the Sencha Pivot Grid. Who of you is using Pivot Grids?
    (OK, a lot, that is great.)
    The Sencha Pivot Grid is the same concept as the popular Excel pivot table.
    It lets you aggregate data across several dimensions and show the result in a grid.
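    What "aggregating across dimensions" means can be shown with a small hand-written sketch. This is not how the Pivot Grid is implemented; it is a plain JavaScript illustration with hypothetical field names and sample records, using one row dimension, one column dimension and an average as the aggregate function.

```javascript
// Plain JavaScript illustration of a pivot: average of valueField
// for every (rowField, columnField) combination.
function pivotAverage(records, rowField, columnField, valueField) {
    const cells = {};
    for (const record of records) {
        const rowKey = record[rowField];
        const colKey = record[columnField];
        cells[rowKey] = cells[rowKey] || {};
        const cell = cells[rowKey][colKey] ||
            (cells[rowKey][colKey] = { total: 0, count: 0 });
        cell.total += record[valueField];
        cell.count += 1;
    }
    const result = {};
    for (const rowKey of Object.keys(cells)) {
        result[rowKey] = {};
        for (const colKey of Object.keys(cells[rowKey])) {
            const cell = cells[rowKey][colKey];
            result[rowKey][colKey] = cell.total / cell.count;
        }
    }
    return result;
}

// Hypothetical sample records, not survey data.
const sample = [
    { state: 'A', industry: 'IT', income: 60000 },
    { state: 'A', industry: 'IT', income: 80000 },
    { state: 'A', industry: 'Retail', income: 30000 },
    { state: 'B', industry: 'IT', income: 50000 }
];
const pivot = pivotAverage(sample, 'state', 'industry', 'income');
console.log(pivot);
```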
  • To configure the Pivot Grid you need to define which dimensions you want to use for rows and columns. You can define several dimensions, not only one as in my example.
    Then you need to define which fields and which function you want to use for data aggregation. And the last thing is the matrix config.
    You can choose between a remote and a local matrix. When you use a local matrix, all calculations are performed locally on the client. It is a perfect option for small and mid-size datasets because it is fast and you don’t need to do anything else. Everything is done by the pivot grid.
    The other option is a remote matrix, where the calculations are performed on the server. In this case you need to develop a web service that aggregates your data. Sencha provides an example PHP script for server-side data aggregation, which I used in my app. This option requires more work from you, but it is necessary for huge datasets like mine, where the data simply cannot be aggregated on the client.
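    A configuration along these lines might look roughly like the sketch below. It is not the code from the sample application: the field names, the store alias "people" and the URL /pivot/aggregate.php are hypothetical, and the exact config names should be checked against the Ext JS 6.2 pivot documentation. The helper returns a plain config object, so it can be inspected without the framework.

```javascript
// Sketch of a pivot grid config; field names, store alias and URL are
// hypothetical placeholders.
function pivotConfig(useRemote) {
    const matrix = {
        // Row dimension(s)
        leftAxis: [{ dataIndex: 'state', header: 'State' }],
        // Column dimension(s)
        topAxis: [{ dataIndex: 'industry', header: 'Industry' }],
        // Which field to aggregate and with which function
        aggregate: [{
            dataIndex: 'adjustedIncome',
            header: 'Average adjusted income',
            aggregator: 'avg'
        }]
    };
    if (useRemote) {
        // Aggregation is done by a web service on the server.
        matrix.type = 'remote';
        matrix.url = '/pivot/aggregate.php';
    } else {
        // Aggregation is done in the browser on the records of a store.
        matrix.type = 'local';
        matrix.store = { type: 'people' };
    }
    return { xtype: 'pivotgrid', matrix: matrix };
}

console.log(pivotConfig(false).matrix.type); // local
console.log(pivotConfig(true).matrix.type);  // remote
```

    Keeping the axis and aggregate definitions identical for both matrix types makes it easy to start with a local matrix and switch to a remote one once the dataset outgrows the browser.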
  • With the Sencha Pivot Grid you get a ready-to-use solution for BI that lets you perform custom aggregation of your data to highlight hidden trends and insights.
    Calculations can be done locally in the browser, in which case we can talk about a BI Lite solution, or you can perform complex, sophisticated aggregations on a server and display the results in a Pivot Grid.
    The Pivot Grid also supports buffered rendering and other grid features.
  • And the last, but the most attractive, part is data visualization.
  • The problem here is how to create high-performance data visualizations with rich user interactions.


  • And I would say the possible solutions are the same as for the BI part, a self-made or a third-party solution, with the same problems: time and integration difficulties.

  • And Ext JS offers 2 options here: the D3.js adapter and Sencha Charts.
    Sencha Charts have been available for a long time and are very well known, so in my sample application I concentrated on the D3.js adapter. It gives you more flexibility.
    Who of you is using D3.js?
    I have used D3.js inside Ext JS applications a lot in the past. But it was always a bit tricky to transform the data from an Ext JS store into the format used by D3, or to integrate a D3.js chart with the Ext JS layout system. The D3.js adapter introduced in Ext JS 6.2 helps to solve these problems.
  • The D3.js adapter offers you two options. First, it offers several of the most popular D3.js visualizations out of the box as Ext JS components.
    To use them you just need to define the xtype, bind your store using data binding, and maybe configure some styles. And that’s it.
    You can show the visualization to your boss. You do not need to write a single line of D3.js code.
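    As a rough illustration of how little configuration this takes, here is a sketch of such a component config. It assumes the treemap component of the D3 adapter (xtype d3-treemap) and a view model store named "industries"; the store name is hypothetical, and the talk does not say which out-of-the-box component was actually shown.

```javascript
// Sketch only: the store name 'industries' is a hypothetical placeholder,
// and the treemap is just one of the ready-made D3 adapter components.
const treemapConfig = {
    xtype: 'd3-treemap',
    // Data binding: the component receives the store from the view model.
    bind: {
        store: '{industries}'
    }
};

console.log(treemapConfig.xtype);
```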
  • The second option is to create your own visualization component. You can take any visualization from the D3.js sample gallery.
    This option requires a general understanding of the D3.js API, but you do not need to know all the details.
    Here is an example that uses the box plot visualization from the D3.js gallery. You just need to copy the source code of the visualization into your project and add a reference to this file to your app.json config file.
    Then in the visualization file you need to implement handlers for the 2 main events fired by the D3.js adapter: scenesetup (fired when the visualization container is ready and you can draw your visualization) and sceneresize (fired when the size of the visualization has changed).
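    The two handlers can be sketched as follows. This is an assumption-laden outline rather than the code from the sample application: it assumes the generic SVG component of the adapter (xtype d3), assumes that the handlers receive the component, the scene (a D3 selection) and, for sceneresize, a size object with width and height, and uses a made-up boxplot group as a stand-in for the real drawing code.

```javascript
// Sketch of a custom D3 component config. The 'boxplot' group and the
// data-* attributes stand in for the real drawing code.
const boxPlotComponentConfig = {
    xtype: 'd3',
    listeners: {
        // Container is ready: create the elements that are needed once.
        scenesetup: function (component, scene) {
            scene.append('g').attr('class', 'boxplot');
        },
        // Size changed: lay out (or redraw) the visualization.
        sceneresize: function (component, scene, size) {
            scene.select('g.boxplot')
                .attr('data-width', size.width)
                .attr('data-height', size.height);
        }
    }
};
```

    Splitting one-time setup from size-dependent layout keeps the resize handler cheap, which matters when the component sits inside a resizable Ext JS layout.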
  • And this is an example of how you can draw your visualization. Basically, I copied this code from the D3.js example and adapted it a bit.
    First I call the d3.box method to calculate the box plots.
    Then I specify the min and max values for the domain, so that D3 knows how to scale my box plots, and render my visualization.
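    The two calculation steps can be illustrated without D3. The sketch below is not the d3.box code from the gallery example; it is a plain JavaScript stand-in that computes the five box plot numbers with linearly interpolated quartiles, and the overall min and max that would be used as the scale domain. The function names and sample values are made up.

```javascript
// Linear-interpolation quantile on an already sorted array.
function quantile(sorted, p) {
    const position = (sorted.length - 1) * p;
    const lower = Math.floor(position);
    const upper = Math.ceil(position);
    return sorted[lower] + (sorted[upper] - sorted[lower]) * (position - lower);
}

// The five numbers a box plot is drawn from.
function boxStats(values) {
    const sorted = values.slice().sort((a, b) => a - b);
    return {
        min: sorted[0],
        q1: quantile(sorted, 0.25),
        median: quantile(sorted, 0.5),
        q3: quantile(sorted, 0.75),
        max: sorted[sorted.length - 1]
    };
}

// Overall [min, max] across all groups, used as the scale domain.
function domainOf(groups) {
    let min = Infinity;
    let max = -Infinity;
    for (const group of groups) {
        for (const value of group) {
            if (value < min) min = value;
            if (value > max) max = value;
        }
    }
    return [min, max];
}

const stats = boxStats([5, 1, 3, 2, 4]);
const domain = domainOf([[5, 1, 3], [10, 7]]);
console.log(stats, domain);
```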
  • OK, benefits. With the D3.js adapter you can create any kind of custom D3.js visualization using one of two engines: canvas or SVG.
    SVG is perfect when you need to support rich user interactions, while for big datasets canvas may be a better option.
    You can easily bind an Ext JS store to a D3.js visualization.
    The D3.js adapter offers full flexibility: you can customize your visualization as you want and add any user interactions you need.
  • So I would like to summarize and list all the Ext JS components that I used in my sample application. For visualization of the original data sources I used Ext JS buffered grids and the Ext Speeder back-end. I used Pivot Grids for BI, and for data visualization I used the D3.js adapter.
  • As you have seen, Ext JS offers a robust and flexible front-end solution for enterprise data analysis applications and lets you visualize your big data in grids and charts with very high performance.
    With Ext JS you can easily add BI functionality to your app and use your data to make smart business decisions.

    1. Using Ext JS Components to Turn Big Data into Actionable Intelligence. Olga Petrova, Sales Engineer @ Sencha
    2. Agenda • Common problems and pitfalls • Ext JS solution • Benefits • Sample BI application
    3. “Without data you’re just another person with an opinion.” W. Edwards Deming
    4. Data Science is a Top Trend
    5. Best Job in USA
    6. Huge Demand for Data Scientists
    7. Huge Demand for Data Science Courses
    8. Huge Demand for Data Science Applications
    9. UI Disadvantages of Popular Solutions • Performance problems when visualizing huge datasets • Poor user interactions • Difficult customization • Difficult integration with current applications
    10. Advantages of Ext JS UI • Robust out-of-box components for visualization of data analysis results • Optimized for huge datasets, extreme performance • Easy customization and integration with current applications
    11. Sample Application
    12. Use Case Data: American Community Survey 2013 www.census.gov/programs-surveys/acs
    13. Use Case Data • 2,276,839 people and 1,476,313 houses • People: state, sex, income, education, industry, occupation, working hours, etc. • Houses: state, type of house, property value, total household income, rent, owner costs, etc.
    14. Main Question: “Where should I live?”
    15. “Purchasing Power” Concept: Purchasing power is the amount of goods or services that can be purchased with a unit of currency. http://www.bea.gov/newsreleases/regional/rpp/rpp_newsrelease.htm
    16. Application Functionality • Display of original data sources • Display of data analysis results: highlight hidden trends and insights • Data visualization
    17. Data Sources
    18. Data Sources Problem: How to show a grid with more than 1 million rows?
    19. Data Sources Possible solutions • Load dataset in a browser: at least 6 million DOM elements (for 5 columns) • Use pagination: 1000 pages if page size is 1000
    20. Data Sources Ext JS Solution • Ext JS Grid with buffered rendering: only part of the dataset is rendered at a time • Ext Speeder back-end: server-side smart cache
    21. Data Sources: Houses Grid
    22. Data Sources Benefits • Extreme performance • Real-time Big Data • Filtering and sorting are supported
    23. Data Analysis / BI
    24. Data Analysis / BI Problem: How to aggregate data to highlight hidden trends and insights?
    25. Data Analysis / BI Possible solutions • Third-party solution: problems with integration, poor interactivity • Self-made solution: manual data aggregation and visualization with tables or lists
    26. Data Analysis / BI Ext JS Solution: Sencha Pivot Grid: aggregate data across several dimensions and highlight hidden trends and insights
    27. “Purchasing Power” Pivot Grid
    28. Data Analysis / BI Benefits • Out-of-box solution for displaying BI • Easy highlighting of hidden trends and insights • Custom data aggregation • Local and remote calculations
    29. Data Visualization
    30. Data Visualization Problem: How to create a custom high-performance visualization with rich user interactions?
    31. Data Visualization Possible solutions • Develop a custom chart component manually: takes time • Use third-party solution: integration problems
    32. Data Visualization Ext JS Solution • D3.js adapter • Sencha Charts
    33. Data Visualization: Out-of-box Visualization
    34. Data Visualization: Custom Visualization
    35. Data Visualization: Custom Visualization
    36. Data Visualization Benefits • Any kind of custom visualization: huge examples gallery • Canvas and SVG engine support • Rich user interactions • Easy data binding to Ext JS store • Integration into the Ext JS layout system
    37. Used Ext JS Components • Ext JS Buffered Grid • Ext Speeder back-end • Ext JS Pivot Grid • D3.js Adapter
    38. GitHub https://github.com/olga-petrova/DataAnalysisApplication
    39. Conclusion: Ext JS offers a robust and flexible UI for enterprise data analysis applications and visualizes big datasets in grids and charts with high performance
