HPCC Systems vs SAS: The Final Countdown

As part of the 2018 HPCC Systems Community Day event:

Archway Health shares their experience with using HPCC Systems alongside SAS for supporting a bundled payments program solution in the health industry.

Luke Pezet is a solution and software architect with over 10 years of experience in pioneering web analytic tools and complex data management projects. His expertise includes designing and implementing big data solutions to process millions of data inputs on a daily basis to monitor, assess, and improve performance. Mr. Pezet is a successful technology entrepreneur who was an early employee of IgoUgo.com, which was sold to Travelocity, and co-founder of Tripfilms, one of the largest databases of travel videos on the web. He has also served as interim CTO for The Achievement Network (ANET), a non-profit education company that helps schools use real-time assessment data to improve student performance. At ANET, he implemented web tools for staff to help scale their operations and end-user web sites for teachers and principals to access reports and analysis. Within just a few years, this platform helped ANET grow from 13 schools in the Boston area to over 480 schools and 145,000 students across 10 states. ANET has been recognized as a pioneer in education innovation and was named “New Schools Ventures Organization of the Year” in 2011. Mr. Pezet has also worked on data management projects with USA Today, Rand McNally, Microsoft, Samsung, and many others. Mr. Pezet holds a master’s degree in computer science from Rennes University in France.

Published in: Data & Analytics

  1. Innovation and Reinvention Driving Transformation
     2018 HPCC Systems® Community Day, October 9, 2018
     Luke Pezet, Archway Health
     HPCC Systems vs SAS: The Final Countdown
  2. “Change is the only constant in life” (Heraclitus)
  3. Me, Me and Me...at Archway
     • Solution Architect with over 15 years of experience
     • Worked for Archway Health Advisors for about 5 years
     • Archway helps care providers manage bundled payment programs
     • Needed to process medical claims 5 years ago and chose HPCC Systems over SAS, Hadoop*, etc.
     • New employees brought other technologies, including SAS
  4. Introduction
     HPCC Systems
     • Open-source, data-intensive computing platform developed by LexisNexis Risk Solutions.
     • Development started before 2000.
     • Scalable data refinery called Thor and scalable rapid data delivery engine called ROXIE.
     SAS (“Statistical Analysis System”)
     • Proprietary software suite developed by SAS Institute that provides advanced analytics.
     • Development started in 1966.
  5. Use Case
     • Based on the web book Regression With SAS, Chapter 1 - Simple And Multiple Regression, from the Institute for Digital Research and Education at UCLA.
     • It is about data analysis and demonstrates how to use software for regression analysis. It is not about the statistical basis of multiple regression, which criteria are best for choosing models, etc.
     • The data was created by randomly sampling 400 elementary schools from the California Department of Education's API 2000 dataset.
     • It contains a measure of school academic performance as well as other attributes such as class size, enrollment, poverty, etc.
  6. Helper: SASsy ECL bundle

         ecl-bundle install https://github.com/lpezet/SASsy.git

     Usage:

         IMPORT SASsy;
         // OR
         IMPORT SASsy.PROC;
  7. Loading data

     SAS:

         DATA scores;
           INFILE datalines dsd;
           INPUT Name : $9. Score1-Score3 Team ~ $25. Div $;
           DATALINES;
         Smith,12,22,46,"Green Hornets, Atlanta",AAA
         Mitchel,23,19,25,"High Volts, Portland",AAA
         Jones,09,17,54,"Vulcans, Las Vegas",AA
         ;

     ECL:

         layout := {
           STRING Name;
           UNSIGNED Score1;
           UNSIGNED Score2;
           UNSIGNED Score3;
           STRING Team;
           STRING Div;
         };
         scores := DATASET( [
           { 'Smith', 12, 22, 46, 'Green Hornets, Atlanta', 'AAA' },
           { 'Mitchel', 23, 19, 25, 'High Volts, Portland', 'AAA' },
           { 'Jones', 09, 17, 54, 'Vulcans, Las Vegas', 'AA' }
         ], layout );
  8. Looking at the data (SAS)

         PROC PRINT data="elemapi" (obs=5);
         run;
  9. Looking at the data (ECL)

         IMPORT SASsy.PROC;
         PROC.PRINT( ElemAPIDS, 5 );
         // CHOOSEN( ElemAPIDS, 5 );
  10. Looking at the data (SAS)

          PROC CONTENTS data="elemapi";
          run;
  11. Looking at the data (ECL)

          IMPORT SASsy.PROC;
          PROC.CONTENTS( ElemAPIDS );
  12. Looking at the data (SAS)

          PROC MEANS data="elemapi";
            var api00 acs_k3 meals full;
          run;
  13. Looking at the data (ECL)

          IMPORT SASsy.PROC;
          PROC.MEANS( oMeans, ElemAPIDS, 'api00,acs_k3,meals,full' );
          OUTPUT( oMeans, NAMED('MEANS') );
  14. Looking at the data (ECL)

          IMPORT DataPatterns;
          DataPatterns.Profile( ElemAPIDS,
            features := 'fill_rate,best_ecl_types,cardinality,lengths,min_max,mean,std_dev,quartiles,correlations' );
  15. Looking at the data (SAS)

          PROC UNIVARIATE data="elemapi";
            var acs_k3;
          run;
  16. Looking at the data (ECL)

          IMPORT SASsy.PROC;
          PROC.UNIVARIATE( ElemAPIDS, 'acs_k3' );

      [Output panels shown on the slide: Extreme - Lowest, Extreme - Highest, Missing Values, Basics]
  17. Looking at the data (SAS)

          PROC FREQ data="elemapi";
            tables acs_k3;
          run;
  18. Looking at the data (ECL)

          IMPORT SASsy.PROC;
          PROC.FREQ( ACSK3Freq, ElemAPIDS, 'acs_k3' );
          OUTPUT( ACSK3Freq, NAMED('Frequency') );
  19. Looking at the data (SAS)

          PROC UNIVARIATE data="elemapi";
            var acs_k3;
            histogram / cfill=gray;
          run;
  20. Looking at the data (ECL)

          IMPORT Visualizer;
          // One row per distinct acs_k3 value, with its count
          PlotData := TABLE( SORT( ElemAPIDS, acs_k3 ), { STRING label := acs_k3; COUNT(GROUP); }, acs_k3 );
          // The named output is what the chart reads from
          OUTPUT( PlotData, NAMED('PlotData') );
          Visualizer.MultiD.Column( 'myChart',, 'PlotData' );
  21. MACROs

      SAS:

          %MACRO MISSINGCHECK(VAR, TYPE);
            PROC SQL;
              CREATE TABLE &VAR._&TYPE. AS
              SELECT DISTINCT CLM_TYPE_1, COUNT(SYSKEY) AS &VAR._MISSING
              FROM OUTPUT.&TYPE.
              WHERE &VAR. IS MISSING
              GROUP BY CLM_TYPE_1
              ORDER BY CLM_TYPE_1;
            QUIT;
          %MEND MISSINGCHECK;
          %MISSINGCHECK(MEMBER_ID, &EPI.GENERAL);
          %MISSINGCHECK(CLAIM_ID, &EPI.GENERAL);
          %MISSINGCHECK(MS_DRG, &EPI.GENERAL);
          %MISSINGCHECK(ADM_DGNS, &EPI.GENERAL);

      ECL:

          MissingCheck( pDS, pField, pMissingValue, pByField ) := FUNCTIONMACRO
            #UNIQUENAME(tabled)
            %tabled% := TABLE( pDS( pField = pMissingValue ), { pByField; COUNT(GROUP); }, pByField );
            #UNIQUENAME(sorted)
            %sorted% := SORT( %tabled%, pByField );
            RETURN %sorted%;
          ENDMACRO;
          MissingCheck( ElemAPIDS, meals, '', dnum );
          MissingCheck( ElemAPIDS, acs_k3, '', dnum );
          MissingCheck( ElemAPIDS, api00, '', dnum );
  22. Multiple Regression (SAS)

          PROC REG data="c:\sasreg\elemapi";
            model api00 = acs_k3 meals full;
          run;
  23. Multiple Regression (ECL)

          IMPORT ML_Core;
          IMPORT LinearRegression;
          IMPORT SASsy;
          IndVars := 'acs_k3,meals,full';
          DepVars := 'api00';
          /* … */
          ML_Core.ToField( inddata, inddataNF, __id__ );
          ML_Core.ToField( depdata, depdataNF, __id__ );
          MyOLS := LinearRegression.OLS( inddataNF, depdataNF );
          MyModel := MyOLS.GetModel;
          SASsy.Utils.reg_report_on_all( MyOLS, MyModel, inddataNF );
  24. More: ECL Machine Learning Library
      • Statistics (e.g. Means, Std Deviation, Modes, Medians, NTiles, etc.)
      • Regression
      • Clustering (e.g. K-Means)
      • Classification (e.g. Logistic Regression, Decision Trees, Perceptron, etc.)
      • Unstructured Data (Tokenize, Transform, CoLocation)
      • Association (e.g. AprioriN)
      • Matrix Manipulation
  25. Today
      HPCC Systems: used to process data at scale and on a more frequent basis
      • Process medical claims using Thor and deliver results using ROXIE
      • Run ETL/ELT processes to load, clean, and prepare data
      • Run more advanced processing to generate outputs (Bundle Engine)
      • Clusters of 8+ nodes
      SAS: used to run research, exploratory data analysis, and modeling
      • Uses HPCC outputs as input
      • Single instance
      • Restricted on CPU/RAM
  26. Tomorrow
      HPCC Systems
      • Still run ETL/ELT processes to load, clean, and prepare data
      • Run processes that need to happen more frequently
      • Port more advanced data analysis and modeling features to ECL
      • Make it easier to create clusters so that experimentation is effortless
      SAS
      • 1 server
      • R&D for now
      • Validate/compare results with HPCC Systems
  27. Thank you!

          OUTPUT(' ');
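
The summary-statistics and frequency slides (12-13 and 17-18) go through the SASsy bundle. As a rough sketch of what those calls report, the same kind of figures can be approximated with built-in ECL alone. This is not the bundle's implementation. The names `DemoLayout`, `DemoDS`, `DemoMeans` and `DemoFreq` are hypothetical, and the five rows are made-up values standing in for ElemAPIDS, not data from the API 2000 dataset.

```ecl
// Hypothetical stand-in for ElemAPIDS: made-up rows, two of the slide's columns.
DemoLayout := RECORD
  UNSIGNED2 api00;   // academic performance score
  UNSIGNED1 acs_k3;  // average class size, K-3
END;

DemoDS := DATASET([
  {693, 16},
  {570, 15},
  {546, 17},
  {571, 16},
  {478, 17}
], DemoLayout);

// Roughly what PROC MEANS shows for one variable: N, mean, minimum, maximum.
// With no grouping field, TABLE collapses the dataset to a single row.
DemoMeans := TABLE(DemoDS, {
  UNSIGNED  n          := COUNT(GROUP);
  REAL8     mean_api00 := AVE(GROUP, DemoDS.api00);
  UNSIGNED2 min_api00  := MIN(GROUP, DemoDS.api00);
  UNSIGNED2 max_api00  := MAX(GROUP, DemoDS.api00);
});
OUTPUT(DemoMeans, NAMED('DemoMeans'));

// Roughly what PROC FREQ shows for "tables acs_k3": one row per distinct value.
DemoFreq := TABLE(DemoDS, {
  DemoDS.acs_k3;
  UNSIGNED cnt := COUNT(GROUP);
}, acs_k3);
OUTPUT(SORT(DemoFreq, acs_k3), NAMED('DemoFreq'));
```

Standard deviation and percentages are left out to keep the sketch short. Running the same aggregates on both platforms is one way to approach the "validate/compare results" item on the Tomorrow slide.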
