SlideShare a Scribd company logo
1 of 51
Download to read offline
1
From Mining To Analytics
Making Sense of Medicare Data
2
Tripfilms.com
3
The Achievement Network
4
Archway Health Advisors
Medicare
• Centers for Medicare & Medicaid Services (CMS)
• Medicare is a national social insurance program since
1966 covering Americans aged 65 and older.
• “Fee For Service” Model
5
Medicare: Fee for Service
6
Hospital SNF
CMS
ClaimClaimClaimClaims
ClaimClaimClaimClaims
ClaimClaimClaimClaims
ClaimClaimClaimClaims
Home HealthHospital
(Readmit)
3 days 18 days 2 days 12 visits
$3,000 $12,000 $5,000 $4,000
= $24,000
Bundled Payments for Care
• Better care, smarter spending, and healthier people
• The Bundled Payments for Care Improvement (BPCI)
Payment arrangements
Based on financial and performance
• 4 Models:
Model 1: Retrospective Acute Care Hospital Stay Only
Model 2: Retrospective Acute Care Hospital Stay And Post-
acute care
Model 3: Retrospective Post-Acute Care Only
Model 4: Acute Care Hospital Stay Only
7
Medicare: BPCI
8
Hospital SNF
CMS
ClaimClaimClaimClaims
ClaimClaimClaimClaims
Home HealthHospital
(Readmit)
3 days 18 days
Episode
$20,000
$20,000 - ($3,000 + $12,000) = $5,000
$3,000 $12,000
Claims
• Statement of services and costs from a healthcare
provider
Patient information
Diagnosis information
Procedure(s) information
• Types:
Hospital (Inpatient) Claims
Skilled Nursing Facility (SNF) Claims
Home Health Agency (HHA) Claims
…
9
10
Episode of Care
SNF #1
SNF #2
IP #1
IP #2
SNF #2
IP #3
HHA #1
HHA #2
time
Mining Claims
11
IP SNF HHAIP IP IP IP SNF SNF
IP SNF HHAIP IP IP IP SNF SNF
IP SNF HHAIP IP IP IP SNF SNF
IP SNF HHAIP IP IP IP SNF SNF
IP SNF HHAIP IP IP IP SNF SNF
IP
IP
IP
IP
IP
Hospital
Stays
Post-Acute
Care
Anchor
Episode
Initiator
Technologies considered
• Pig (& Hadoop)
Data Processing Language
Procedural Language
Relational-oriented
• SAS
Business Analytics & BI Software
De-facto standard in Healthcare Industry
Proprietary
12
HPCC Systems
Quick Introduction
Rodrigo Pastrana -Consulting Software Engineer
WHT/082311
What is HPCC Systems
14
• Open Source distributed data-intensive computing platform
• Provides end-to-end Big Data workflow management , scheduler,
integration tools, etc
• Runs on commodity computing/storage nodes
• Binary packages available for the most common Linux distributions
• Originally designed circa 1999 (predates the original paper on
MapReduce from Dec. ‘04)
• Improved over a decade of real-world Big Data analytics
• In use across critical production environments throughout
LexisNexis for more than 10 years
WHT/082311
The HPCC Systems platform
15
WHT/082311
• Massively Parallel data processing engine
• Enables data integration on a scale not previously available
• Programmable using ECL
HPCC Systems Data Refinery (Thor)
HPCC Systems Data Delivery Engine (Roxie) • A massively parallel, high throughput, query engine
• Low latency, highly concurrent and highly available
• Several advanced strategies for efficient retrieval
• Programmable using ECL
Enterprise Control Language (ECL) • An easy to use, declarative data-centric programming
language optimized for large-scale data management
and query processing
• Highly efficient; automatically distributes workload
across all nodes; compiles to native machine code.
• Automatic parallelization and synchronization
1
2
3
The Three HPCC Systems components
Conclusion: End to End platform • No need for any third party tools
16
WHT/082311
• Declarative programming language: Describe what needs
to be done and not how to do it
• Powerful: High level data activities like JOIN, TRANSFORM,
PROJECT, SORT, DISTRIBUTE, MAP, etc. are available.
• Extensible: Modular and extensible, it can shape itself to
adapt to the type of problem at hand
• Implicitly parallel: Parallelism is built into the underlying
platform. The programmer needs not be concerned with
data partitioning and parallelism
• Maintainable: High level programming language, without
side effects and with efficient encapsulation; programs are
more succinct, reliable and easier to troubleshoot
• Complete: ECL provides a complete data programming
paradigm
• Homogeneous: One language to express data algorithms
across the entire HPCC Systems platform: data integration,
analytics and high speed delivery
• Polyglottic: ECL supports the embedding of other
languages such as Java, Python, R, SQL, and more
Enterprise Control Language (ECL)
17
WHT/082311
Current Status and Resources
• HPCCSystems.com – Tutorials, Docs, Platform distributions, and more
• Latest release 5.2.0 adds many new features and improvements
• Drastic GUI improvements
• Ganglia and Nagios plug-in for system monitoring and alerting
• Security Enhancements – tighter authentication measures, intra-
component communication encryption
• Embedded Languages – Cassandra support, memcache and redis
access
• JSON based data support
• Dynamic ESDL – Provides simple middleware/back-end interface
definition
• JAVA API project – facilitates interaction between Java based apps and
HPCC web services and c++ tools
• Available now – HPCCSystems.com
Data mining with HPCC Systems
• Thor
Responsible for processing vast amount of data
Optimized for Extraction, Transformation, Loading,
Sorting and Linking Data
• ECL
Declarative
More Data Centric
Fast & Implicitly Parallel
Inline data
Unit Tests in ECL
19
20
SQL vs ECL
SELECT
diag_group_cd,
COUNT(*) as volume
SUM(pmt_amt) as costs
FROM
inpatient_claims
GROUP BY
diag_group_cd;
TABLE(
inpatient_claims,
{
diag_group_cd;
INTEGER volume :=
COUNT(GROUP);
REAL costs := SUM(pmt_amt);
},
diag_group_cd
);
SQL ECL
SELECT
*
FROM
inpatient_claims LEFT
JOIN ip_value_codes
RIGHT
ON LEFT.id = RIGHT.id
JOIN(
inpatient_claims,
ip_value_codes,
LEFT.id = RIGHT.id
);
21
SQL vs ECL
DECLARE my_cursor CURSOR FOR
SELECT * FROM inpatient_claims;
OPEN my_cursor
FETCH NEXT FROM my_cursor
INTO @…, @…
WHILE @@FETCH_STATUS = 0
BEGIN
…
END
CLOSE my_cursor;
DEALLOCATE my_cursor;
ITERATE(
inpatient_claims,
TRANSFORM(inpatient_claim_layout,
SELF.is_dropped :=
is_one_year_or_greater(
RIGHT.admsn_dt,
RIGHT.dschrgdt);
SELF := RIGHT;
)
);
SQL ECL
Tx
22
ECL ROLLUP
R1 R2 R3 R4 R5 R6
LEFT RIGHT
RA
TxLEFT RIGHT
RB R4 R6R5
ROLLUP( dataset, condition(LEFT, RIGHT), transformation(LEFT, RIGHT) )
Processing Claims
1. The intent here is to make the series of interim claims look like a single claim for
most purposes, where the admission date of the first claim becomes the
admission date of the whole claim and the discharge date of the last claim in the
series becomes the discharge date of the whole claim.
2. 􏰂􏰂The admission date from the first series in the claim and the discharge date
from the last series in the claim define the length of the stay.
3. 􏰂􏰂The MS-DRG from the last claim in the single stay (the discharge MS-DRG)
determines whether the hospital stay becomes an anchor record, or whether the
stay is included/excluded as a readmission for an existing episode.
4. 􏰂􏰂Costs across all IP claims included in the single stay are aggregated to the stay
level.
5. Claims where the last in the series of claims has patient (…) [as “still a patient”,
not discharged], flag these and drop all of the claims in the series from the IP
hospital stay file.
23
Processing Claims With ECL
H_1 := SORT( A , bene_sk, provider, admsn_dt, dschrgdt, thru_dt);
H_2 := ROLLUP(H_1,
is_interim(LEFT, RIGHT),
merge_interim_claims(LEFT, RIGHT));
H_3 := JOIN(H_2, H_1, LEFT.bene_sk = RIGHT.bene_sk […], RIGHT
ONLY);
H_4 := PROJECT(H_3, TRANSFORM(BPCI.Layouts.ip_claim_etl_layout,
SELF.is_dropped := TRUE;
SELF.dropped_reason_code :=
BPCI.Layouts.DROPPED_REASON_CODES.InterimClaim;
SELF := LEFT;
));
H := H_2 + H_4;
24
25
Template Language
EXPORT load_all_client_files(pId, pFileSet, pBaseDataDirectory) := MACRO
LOADXML(pFileSet);
baseDataDirectory := pBaseDataDirectory + pId + '/';
#FOR(folder)
#UNIQUENAME(subId)
%subId% := %''%;
#UNIQUENAME(subDS)
%subDS% := Client.Datasets(%subId%);
[...]
#UNIQUENAME(id)
%id% := pId + '::' + %''%;
#UNIQUENAME(dataDir)
%dataDir% := %baseDataDirectory% + %''% + '/';
#UNIQUENAME(etl)
%etl% := Client.ETL(%dataDir%, %id%);
%etl%.run();
#END
ENDMACRO;
26
Template Language
file_set := ’<folders>’ +
'<folder>M201409</folder>' +
'<folder>M201410</folder>' +
'<folder>M201411</folder>' +
'<folder>M201412</folder>' +
'<folder>M201501</folder>' +
'<folder>M201502</folder>' +
'<folder>M201503</folder>' +
‘</folders>’;
load_all_client_files(1234, file_set, ‘/volume1/data/‘);
Beyond Processing Data
• Security & Authentication
• Collaboration
• Unit Tests
• Visualizations
27
Beyond Processing: Security
• HTTPS
• Htpasswd
• LDAP support
• File level security when using LDAP
28
Beyond Processing: Workunits
• Workunit Identifier
• Attribution
• Query
• Timings
• Results
29
30
Beyond Processing: Collaboration
31
Beyond Processing: Collaboration (2)
32
Beyond Processing: Collaboration
33
Beyond Processing: Collaboration
34
Beyond Processing: Unit Tests
interim_claims := MODULE
// Test Data
test_set :=
BPCI.Test.Samples.ip_claim(
bene_id := 1, claim_id := 1, pmt_amt := 3042.0, ...)
+ BPCI.Test.Samples.ip_claim(
bene_id := 1, claim_id := 2, pmt_amt := 11409.0, ...)
+ ....
;
...
EXPORT Actual := Step2.ip_stays;
SHARED TestSuite := MODULE
EXPORT Test01 := ASSERT(oActual(NOT is_dropped), claimno IN [1,2], 'Did not filter ou
EXPORT Test02 := ASSERT(oActual(is_dropped), claimno IN [3,5]);
END;
EXPORT AllTests := TestSuite.Test01 + TestSuite.Test02;
END;
Beyond Processing: Unit Tests (2)
35
// Using inline dataset
simple_ip_claims := DATASET([
{1,1,'0','010001',20000201,'',20000120,200
00201,'61'},
], simplified_ip_layout);
ip_claims :=
Samples.ip_claims(simple_ip_claims);
// OR passing NAMED parameters
ip_claims := Samples.ip_claims2(
bene_id := 1,
claim_id := 1,
claim_type := '00'
)
simplified_ip_layout := RECORD
UNSIGNED bene_id;
UNSIGNED claim_id;
STRING claim_type;
STRING provider_number;
INTEGER4 through_date;
STRING status_code;
INTEGER4 admission_date;
INTEGER4 discharge_date;
STRING ms_drg_code;
END;
36
Beyond Processing: Visualization
37
Custom Visualization
38
(No) Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episodes
Costs in $1,000
39
Insights
0
20
40
60
80
1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
Episodes
Costs in $1,000
No readmit 1 Readmit 2 Readmits
Data Delivery: Roxie
• Data Delivery Engine
• Indexed, compressed and in-memory
• Data Warehouse Capabilities
• Data Services
40
Data Services
• Web Services over Data Warehouse
• XML/SOAP but also JSON
• Web Services defined in ECL
• Solution to add/remove data from the cluster
41
Data Service: Query
• Define service using ECL
42
INTEGER4 oOffset := 1 : STORED('Offset');
INTEGER4 oResults := 100 : STORED('Results');
INTEGER4 oStartDate := 20130201 : STORED('Begin_Date');
INTEGER4 oEndDate := 20140201 : STORED('End_Date');
...
oParams := DATASET([{ oOffset, oResults, oStartDate, oEndDate, … }],
Layouts.service_parameters_layout);
T(DATASET(RECORDOF(Datasets.dsFactEpisodeCostsIndex)) pData) := FUNCTION
RETURN TABLE(pData,
{
STRING bpid := pData.bpid;
UNSIGNED INTEGER1 model := pData.model;
UNSIGNED INTEGER1 post_dsch_prd_length := pData.post_dsch_prd_length;
INTEGER8 total_episodes := COUNT(GROUP);
DECIMAL15_2 total_costs := SUM(GROUP, sum_post_dsch_prd_pay);
DECIMAL15_2 average_costs := AVE(GROUP, sum_post_dsch_prd_pay);
DECIMAL15_2 std_dev_costs := SQRT(VARIANCE(GROUP, sum_post_dsch_prd_pay));
}, bpid, model, post_dsch_prd_length);
END;
ReportServices.BaseService.run_it('Summary', oParameters, T, bpid);
43
Data (Web) Services
{
"summary": {
"offset": 1,
"results": 10,
"begin_date": 20130101,
"end_date": 20130201,
…
}
}
{
"summaryResponse": {
…
"Results": {
…
"Summary": {
"Row": [
{
"bpid": "9999",
"model": 2,
"post_dsch_prd_length": 90,
"total_episodes": 987,
"total_costs": 9876543.21,
"average_costs": 12358.13
…
}
]
…
Request Response
https://.../WsEcl/forms/json/query/roxie/summary
44
Data Services: WsECL
45
Data Services: WsECL
Loading up data
• Logical vs Physical
~abc::subfolder::subsubfolder::myfile
/abc/subfolder/subsubfolder/myfile
• ECL to load data into cluster:
46
oDS := DATASET(
std.File.ExternalLogicalFilename('172.0.0.1','/var/lib/.../myfile.csv'),
Layouts.ip_claim_layout,
CSV(HEADING(0)) );
oDSDistributed := DISTRIBUTE(oDS, bene_id);
OUTPUT(oDSDistributed,, ‘~somewhere::over::here::myfile’, OVERWRITE);
oDS := DATASET(‘~somewhere::over::here::myfile’, Layouts.ip_claim_layout);
• ECL to use data loaded into cluster:
SuperFiles
• Super File = Symbolic link, list of sub-files
• Each sub-file must have the same layout
47
WEBLOGS_FILE := ‘~somewhere::logs::web’
Std.File.CreateSuperFile(WEBLOGS_FILE);
…
run_report() := FUNCTION
oDS := DATASET(WEBLOGS_FILE, Layouts.weblogs_layout, CSV);
RETURN TABLE(oDS, { ip_address; COUNT(GROUP); }, ip_address );
END;
SEQUENTIAL(
Std.File.StartSuperFileTransaction(),
Std.File.AddSuperFile(WEBLOGS_FILE, ‘~somewhere::logs::web::2015::04::01’),
Std.File.FinishSuperFileTransaction()
);
• Including (more) data:
48
Data Services: Reusability
EXPORT run_it( pServiceName, pParams, pReportFunction, pSortByField) := MACRO
// Filtering data based on parameters
#UNIQUENAME(DS);
%DS% := WS.Datasets.dsFactEpisodeCostsIndex;
[…]
#UNIQUENAME(B)
%B% := IF(COUNT(pParams[1].providers) = 0, %A%, %A%(provider_id IN pParams[1].providers));
#UNIQUENAME(C)
%C% := IF(COUNT(pParams[1].npis) = 0, %B%, %B%(at_npi IN pParams[1].npis OR op_npi IN pParams[
[…]
#UNIQUENAME(report)
%report% := pReportFunction(%K%);
#UNIQUENAME(sorted)
%sorted% := SORT(%report%, pSortByField);
#UNIQUENAME(O1)
%O1% := OUTPUT(pParameters, NAMED('Request'));
oSummary := DATASET([{ COUNT(%sorted%) }], WS.Layouts.service_summary_layout);
#UNIQUENAME(O2)
%O2% := OUTPUT(oSummary, NAMED(‘Metadata'));
#UNIQUENAME(O3)
%O3% := OUTPUT(
CHOOSEN(%sorted%, pParams[1].results, pParams[1].offset),
NAMED(pServiceName), ALL);
PARALLEL(%O1%, %O2%, %O3%);
ENDMACRO;
49
AHA System Architecture
50
Archway Analytics
51
lpezet@archwayha.com
www.linkedin.com/in/lucpezet
mezzetin.blogspot.com
HPCC Systems open source portal:
http://hpccsystems.com
Thank you
Questions? Feedback?
Questions ? Feedback ?
www.linkedin.com/in/lucpezet

More Related Content

Viewers also liked

Khóa học lập trình ios
Khóa học lập trình iosKhóa học lập trình ios
Khóa học lập trình iosĐào tạo Seo
 
Cisco and SUSE Linux: The perfect platform for SAP
Cisco and SUSE Linux: The perfect platform for SAPCisco and SUSE Linux: The perfect platform for SAP
Cisco and SUSE Linux: The perfect platform for SAPDirk Oppenkowski
 
Supply7 overview
Supply7   overviewSupply7   overview
Supply7 overviewSUPPLY7
 
Moz public invst-agriculture
Moz public invst-agricultureMoz public invst-agriculture
Moz public invst-agricultureIFPRI-Maputo
 
Challenges and opportunities of the Mexican Space Agency
Challenges and opportunities of the Mexican Space Agency Challenges and opportunities of the Mexican Space Agency
Challenges and opportunities of the Mexican Space Agency Carlos Duarte
 
E2 d3 detailed description
E2 d3 detailed descriptionE2 d3 detailed description
E2 d3 detailed descriptionhipsrinoky
 
Html workshop 1
Html workshop 1Html workshop 1
Html workshop 1Lee Scott
 
Character Profile
Character ProfileCharacter Profile
Character Profilesmdoyle
 
Dokumen standard bahasa tamil sk tahap 1
Dokumen standard bahasa tamil sk tahap 1Dokumen standard bahasa tamil sk tahap 1
Dokumen standard bahasa tamil sk tahap 1Kartikaa Jeyanthan
 
Advocates and Activities
Advocates and ActivitiesAdvocates and Activities
Advocates and ActivitiesJeanneMacAdams
 
Traumatic brain injury
Traumatic brain injuryTraumatic brain injury
Traumatic brain injurycaitjoh
 
Teaser Trailer Analysis
Teaser Trailer AnalysisTeaser Trailer Analysis
Teaser Trailer Analysissmdoyle
 

Viewers also liked (17)

Khóa học lập trình ios
Khóa học lập trình iosKhóa học lập trình ios
Khóa học lập trình ios
 
Cisco and SUSE Linux: The perfect platform for SAP
Cisco and SUSE Linux: The perfect platform for SAPCisco and SUSE Linux: The perfect platform for SAP
Cisco and SUSE Linux: The perfect platform for SAP
 
Supply7 overview
Supply7   overviewSupply7   overview
Supply7 overview
 
Portafolio clau (1)
Portafolio clau (1)Portafolio clau (1)
Portafolio clau (1)
 
Moz public invst-agriculture
Moz public invst-agricultureMoz public invst-agriculture
Moz public invst-agriculture
 
Challenges and opportunities of the Mexican Space Agency
Challenges and opportunities of the Mexican Space Agency Challenges and opportunities of the Mexican Space Agency
Challenges and opportunities of the Mexican Space Agency
 
By CMS COMPUTER
By CMS COMPUTERBy CMS COMPUTER
By CMS COMPUTER
 
E2 d3 detailed description
E2 d3 detailed descriptionE2 d3 detailed description
E2 d3 detailed description
 
Storyboards
StoryboardsStoryboards
Storyboards
 
Isus
IsusIsus
Isus
 
Html workshop 1
Html workshop 1Html workshop 1
Html workshop 1
 
Character Profile
Character ProfileCharacter Profile
Character Profile
 
Dokumen standard bahasa tamil sk tahap 1
Dokumen standard bahasa tamil sk tahap 1Dokumen standard bahasa tamil sk tahap 1
Dokumen standard bahasa tamil sk tahap 1
 
Advocates and Activities
Advocates and ActivitiesAdvocates and Activities
Advocates and Activities
 
Traumatic brain injury
Traumatic brain injuryTraumatic brain injury
Traumatic brain injury
 
User guide
User guideUser guide
User guide
 
Teaser Trailer Analysis
Teaser Trailer AnalysisTeaser Trailer Analysis
Teaser Trailer Analysis
 

Similar to Medicare Claims Data Mining

Presentation at Wright State University
Presentation at Wright State UniversityPresentation at Wright State University
Presentation at Wright State UniversityHPCC Systems
 
Meetup: Big Data NLP with HPCC Systems® - A Development Ride from Spray to TH...
Meetup: Big Data NLP with HPCC Systems® - A Development Ride from Spray to TH...Meetup: Big Data NLP with HPCC Systems® - A Development Ride from Spray to TH...
Meetup: Big Data NLP with HPCC Systems® - A Development Ride from Spray to TH...HPCC Systems
 
Flexible EDI Solutions for the SMB Market
Flexible EDI Solutions for the SMB MarketFlexible EDI Solutions for the SMB Market
Flexible EDI Solutions for the SMB MarketZach Gardner
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizonThejas Nair
 
Dell High-Performance Computing solutions: Enable innovations, outperform exp...
Dell High-Performance Computing solutions: Enable innovations, outperform exp...Dell High-Performance Computing solutions: Enable innovations, outperform exp...
Dell High-Performance Computing solutions: Enable innovations, outperform exp...Dell World
 
HL7 Survival Guide - Chapter 3 - The Heart of the Matter: Data Formats, Workf...
HL7 Survival Guide - Chapter 3 - The Heart of the Matter: Data Formats, Workf...HL7 Survival Guide - Chapter 3 - The Heart of the Matter: Data Formats, Workf...
HL7 Survival Guide - Chapter 3 - The Heart of the Matter: Data Formats, Workf...Caristix
 
HPCC Systems Presentation to TDWI Chicago Chapter
HPCC Systems Presentation to TDWI Chicago ChapterHPCC Systems Presentation to TDWI Chicago Chapter
HPCC Systems Presentation to TDWI Chicago ChapterHPCC Systems
 
Naman_Abinitio_7757021406
Naman_Abinitio_7757021406Naman_Abinitio_7757021406
Naman_Abinitio_7757021406Naman Gupta
 
: HL7 Survival Guide - Chapter 7 – Gap Analysis
: HL7 Survival Guide - Chapter 7 – Gap Analysis: HL7 Survival Guide - Chapter 7 – Gap Analysis
: HL7 Survival Guide - Chapter 7 – Gap AnalysisCaristix
 
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio VillanustreBig Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio VillanustreHPCC Systems
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?DataWorks Summit
 
Towards the Implementation of an openEHR-based Open Source EHR Platform (a vi...
Towards the Implementation of an openEHR-based Open Source EHR Platform (a vi...Towards the Implementation of an openEHR-based Open Source EHR Platform (a vi...
Towards the Implementation of an openEHR-based Open Source EHR Platform (a vi...Pablo Pazos
 
HPC Advisory Council – Stanford Conference 2018
HPC Advisory Council – Stanford Conference 2018HPC Advisory Council – Stanford Conference 2018
HPC Advisory Council – Stanford Conference 2018Thomas Francis
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016MLconf
 

Similar to Medicare Claims Data Mining (20)

Presentation at Wright State University
Presentation at Wright State UniversityPresentation at Wright State University
Presentation at Wright State University
 
Meetup: Big Data NLP with HPCC Systems® - A Development Ride from Spray to TH...
Meetup: Big Data NLP with HPCC Systems® - A Development Ride from Spray to TH...Meetup: Big Data NLP with HPCC Systems® - A Development Ride from Spray to TH...
Meetup: Big Data NLP with HPCC Systems® - A Development Ride from Spray to TH...
 
Richard p fhir
Richard p fhirRichard p fhir
Richard p fhir
 
Richard p fhir
Richard p fhirRichard p fhir
Richard p fhir
 
Flexible EDI Solutions for the SMB Market
Flexible EDI Solutions for the SMB MarketFlexible EDI Solutions for the SMB Market
Flexible EDI Solutions for the SMB Market
 
Hive 3 a new horizon
Hive 3  a new horizonHive 3  a new horizon
Hive 3 a new horizon
 
MICHAEL SHEFFER ETL CA
MICHAEL SHEFFER ETL CAMICHAEL SHEFFER ETL CA
MICHAEL SHEFFER ETL CA
 
richard_p_Integration
richard_p_Integrationrichard_p_Integration
richard_p_Integration
 
Hive 3 - a new horizon
Hive 3 - a new horizonHive 3 - a new horizon
Hive 3 - a new horizon
 
Dell High-Performance Computing solutions: Enable innovations, outperform exp...
Dell High-Performance Computing solutions: Enable innovations, outperform exp...Dell High-Performance Computing solutions: Enable innovations, outperform exp...
Dell High-Performance Computing solutions: Enable innovations, outperform exp...
 
HL7 Survival Guide - Chapter 3 - The Heart of the Matter: Data Formats, Workf...
HL7 Survival Guide - Chapter 3 - The Heart of the Matter: Data Formats, Workf...HL7 Survival Guide - Chapter 3 - The Heart of the Matter: Data Formats, Workf...
HL7 Survival Guide - Chapter 3 - The Heart of the Matter: Data Formats, Workf...
 
HPCC Systems Presentation to TDWI Chicago Chapter
HPCC Systems Presentation to TDWI Chicago ChapterHPCC Systems Presentation to TDWI Chicago Chapter
HPCC Systems Presentation to TDWI Chicago Chapter
 
Naman_Abinitio_7757021406
Naman_Abinitio_7757021406Naman_Abinitio_7757021406
Naman_Abinitio_7757021406
 
: HL7 Survival Guide - Chapter 7 – Gap Analysis
: HL7 Survival Guide - Chapter 7 – Gap Analysis: HL7 Survival Guide - Chapter 7 – Gap Analysis
: HL7 Survival Guide - Chapter 7 – Gap Analysis
 
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio VillanustreBig Data Processing Beyond MapReduce by Dr. Flavio Villanustre
Big Data Processing Beyond MapReduce by Dr. Flavio Villanustre
 
Naresh
NareshNaresh
Naresh
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?
 
Towards the Implementation of an openEHR-based Open Source EHR Platform (a vi...
Towards the Implementation of an openEHR-based Open Source EHR Platform (a vi...Towards the Implementation of an openEHR-based Open Source EHR Platform (a vi...
Towards the Implementation of an openEHR-based Open Source EHR Platform (a vi...
 
HPC Advisory Council – Stanford Conference 2018
HPC Advisory Council – Stanford Conference 2018HPC Advisory Council – Stanford Conference 2018
HPC Advisory Council – Stanford Conference 2018
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
 

More from HPCC Systems

Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...HPCC Systems
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsHPCC Systems
 
Towards Trustable AI for Complex Systems
Towards Trustable AI for Complex SystemsTowards Trustable AI for Complex Systems
Towards Trustable AI for Complex SystemsHPCC Systems
 
Closing / Adjourn
Closing / Adjourn Closing / Adjourn
Closing / Adjourn HPCC Systems
 
Community Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon CuttingCommunity Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon CuttingHPCC Systems
 
Release Cycle Changes
Release Cycle ChangesRelease Cycle Changes
Release Cycle ChangesHPCC Systems
 
Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index HPCC Systems
 
Advancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine LearningAdvancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine LearningHPCC Systems
 
Expanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network CapabilitiesExpanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network CapabilitiesHPCC Systems
 
Leveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsLeveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsHPCC Systems
 
DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch HPCC Systems
 
Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem HPCC Systems
 
Work Unit Analysis Tool
Work Unit Analysis ToolWork Unit Analysis Tool
Work Unit Analysis ToolHPCC Systems
 
Community Award Ceremony
Community Award Ceremony Community Award Ceremony
Community Award Ceremony HPCC Systems
 
Dapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL NeaterDapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL NeaterHPCC Systems
 
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...HPCC Systems
 
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...HPCC Systems
 

More from HPCC Systems (20)

Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...Natural Language to SQL Query conversion using Machine Learning Techniques on...
Natural Language to SQL Query conversion using Machine Learning Techniques on...
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
 
Towards Trustable AI for Complex Systems
Towards Trustable AI for Complex SystemsTowards Trustable AI for Complex Systems
Towards Trustable AI for Complex Systems
 
Welcome
WelcomeWelcome
Welcome
 
Closing / Adjourn
Closing / Adjourn Closing / Adjourn
Closing / Adjourn
 
Community Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon CuttingCommunity Website: Virtual Ribbon Cutting
Community Website: Virtual Ribbon Cutting
 
Path to 8.0
Path to 8.0 Path to 8.0
Path to 8.0
 
Release Cycle Changes
Release Cycle ChangesRelease Cycle Changes
Release Cycle Changes
 
Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index Geohashing with Uber’s H3 Geospatial Index
Geohashing with Uber’s H3 Geospatial Index
 
Advancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine LearningAdvancements in HPCC Systems Machine Learning
Advancements in HPCC Systems Machine Learning
 
Docker Support
Docker Support Docker Support
Docker Support
 
Expanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network CapabilitiesExpanding HPCC Systems Deep Neural Network Capabilities
Expanding HPCC Systems Deep Neural Network Capabilities
 
Leveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC SystemsLeveraging Intra-Node Parallelization in HPCC Systems
Leveraging Intra-Node Parallelization in HPCC Systems
 
DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch DataPatterns - Profiling in ECL Watch
DataPatterns - Profiling in ECL Watch
 
Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem Leveraging the Spark-HPCC Ecosystem
Leveraging the Spark-HPCC Ecosystem
 
Work Unit Analysis Tool
Work Unit Analysis ToolWork Unit Analysis Tool
Work Unit Analysis Tool
 
Community Award Ceremony
Community Award Ceremony Community Award Ceremony
Community Award Ceremony
 
Dapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL NeaterDapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL Neater
 
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
A Success Story of Challenging the Status Quo: Gadget Girls and the Inclusion...
 
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
Beyond the Spectrum – Creating an Environment of Diversity and Empowerment wi...
 

Recently uploaded

convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfSubhamKumar3239
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingINTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingsocarem879
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxHimangsuNath
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxHaritikaChhatwal1
 

Recently uploaded (20)

convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdf
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingINTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processing
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
 

Medicare Claims Data Mining

  • 1. 1 From Mining To Analytics Making Sense of Medicare Data
  • 5. Medicare • Centers for Medicare & Medicaid Services (CMS) • Medicare is a national social insurance program since 1966 covering Americans aged 65 and older. • “Fee For Service” Model 5
  • 6. Medicare: Fee for Service 6 Hospital SNF CMS ClaimClaimClaimClaims ClaimClaimClaimClaims ClaimClaimClaimClaims ClaimClaimClaimClaims Home HealthHospital (Readmit) 3 days 18 days 2 days 12 visits $3,000 $12,000 $5,000 $4,000 = $24,000
  • 7. Bundled Payments for Care • Better care, smarter spending, and healthier people • The Bundled Payments for Care Improvement (BPCI) Payment arrangements Based on financial and performance • 4 Models: Model 1: Retrospective Acute Care Hospital Stay Only Model 2: Retrospective Acute Care Hospital Stay And Post- acute care Model 3: Retrospective Post-Acute Care Only Model 4: Acute Care Hospital Stay Only 7
  • 8. Medicare: BPCI 8 Hospital SNF CMS ClaimClaimClaimClaims ClaimClaimClaimClaims Home HealthHospital (Readmit) 3 days 18 days Episode $20,000 $20,000 - ($3,000 + $12,000) = $5,000 $3,000 $12,000
  • 9. Claims • Statement of services and costs from a healthcare provider Patient information Diagnosis information Procedure(s) information • Types: Hospital (Inpatient) Claims Skilled Nursing Facility (SNF) Claims Home Health Agency (HHA) Claims … 9
  • 10. 10 Episode of Care SNF #1 SNF #2 IP #1 IP #2 SNF #2 IP #3 HHA #1 HHA #2 time
  • 11. Mining Claims 11 IP SNF HHAIP IP IP IP SNF SNF IP SNF HHAIP IP IP IP SNF SNF IP SNF HHAIP IP IP IP SNF SNF IP SNF HHAIP IP IP IP SNF SNF IP SNF HHAIP IP IP IP SNF SNF IP IP IP IP IP Hospital Stays Post-Acute Care Anchor Episode Initiator
  • 12. Technologies considered • Pig (& Hadoop) Data Processing Language Procedural Language Relational-oriented • SAS Business Analytics & BI Software De-facto standard in Healthcare Industry Proprietary 12
  • 13. HPCC Systems Quick Introduction Rodrigo Pastrana -Consulting Software Engineer
  • 14. WHT/082311 What is HPCC Systems 14 • Open Source distributed data-intensive computing platform • Provides end-to-end Big Data workflow management , scheduler, integration tools, etc • Runs on commodity computing/storage nodes • Binary packages available for the most common Linux distributions • Originally designed circa 1999 (predates the original paper on MapReduce from Dec. ‘04) • Improved over a decade of real-world Big Data analytics • In use across critical production environments throughout LexisNexis for more than 10 years
  • 16. WHT/082311 • Massively Parallel data processing engine • Enables data integration on a scale not previously available • Programmable using ECL HPCC Systems Data Refinery (Thor) HPCC Systems Data Delivery Engine (Roxie) • A massively parallel, high throughput, query engine • Low latency, highly concurrent and highly available • Several advanced strategies for efficient retrieval • Programmable using ECL Enterprise Control Language (ECL) • An easy to use, declarative data-centric programming language optimized for large-scale data management and query processing • Highly efficient; automatically distributes workload across all nodes; compiles to native machine code. • Automatic parallelization and synchronization 1 2 3 The Three HPCC Systems components Conclusion: End to End platform • No need for any third party tools 16
  • 17. WHT/082311 • Declarative programming language: Describe what needs to be done and not how to do it • Powerful: High level data activities like JOIN, TRANSFORM, PROJECT, SORT, DISTRIBUTE, MAP, etc. are available. • Extensible: Modular and extensible, it can shape itself to adapt to the type of problem at hand • Implicitly parallel: Parallelism is built into the underlying platform. The programmer needs not be concerned with data partitioning and parallelism • Maintainable: High level programming language, without side effects and with efficient encapsulation; programs are more succinct, reliable and easier to troubleshoot • Complete: ECL provides a complete data programming paradigm • Homogeneous: One language to express data algorithms across the entire HPCC Systems platform: data integration, analytics and high speed delivery • Polyglottic: ECL supports the embedding of other languages such as Java, Python, R, SQL, and more Enterprise Control Language (ECL) 17
  • 18. WHT/082311 Current Status and Resources • HPCCSystems.com – Tutorials, Docs, Platform distributions, and more • Latest release 5.2.0 adds many new features and improvements • Drastic GUI improvements • Ganglia and Nagios plug-in for system monitoring and alerting • Security Enhancements – tighter authentication measures, intra- component communication encryption • Embedded Languages – Cassandra support, memcache and redis access • JSON based data support • Dynamic ESDL – Provides simple middleware/back-end interface definition • JAVA API project – facilitates interaction between Java based apps and HPCC web services and c++ tools • Available now – HPCCSystems.com
  • 19. Data mining with HPCC Systems • Thor Responsible for processing vast amount of data Optimized for Extraction, Transformation, Loading, Sorting and Linking Data • ECL Declarative More Data Centric Fast & Implicitly Parallel Inline data Unit Tests in ECL 19
  • 20. 20 SQL vs ECL SELECT diag_group_cd, COUNT(*) as volume SUM(pmt_amt) as costs FROM inpatient_claims GROUP BY diag_group_cd; TABLE( inpatient_claims, { diag_group_cd; INTEGER volume := COUNT(GROUP); REAL costs := SUM(pmt_amt); }, diag_group_cd ); SQL ECL SELECT * FROM inpatient_claims LEFT JOIN ip_value_codes RIGHT ON LEFT.id = RIGHT.id JOIN( inpatient_claims, ip_value_codes, LEFT.id = RIGHT.id );
  • 21. 21 SQL vs ECL DECLARE my_cursor CURSOR FOR SELECT * FROM inpatient_claims; OPEN my_cursor FETCH NEXT FROM my_cursor INTO @…, @… WHILE @@FETCH_STATUS = 0 BEGIN … END CLOSE my_cursor; DEALLOCATE my_cursor; ITERATE( inpatient_claims, TRANSFORM(inpatient_claim_layout, SELF.is_dropped := is_one_year_or_greater( RIGHT.admsn_dt, RIGHT.dschrgdt); SELF := RIGHT; ) ); SQL ECL
  • 22. Tx 22 ECL ROLLUP R1 R2 R3 R4 R5 R6 LEFT RIGHT RA TxLEFT RIGHT RB R4 R6R5 ROLLUP( dataset, condition(LEFT, RIGHT), transformation(LEFT, RIGHT) )
  • 23. Processing Claims 1. The intent here is to make the series of interim claims look like a single claim for most purposes, where the admission date of the first claim becomes the admission date of the whole claim and the discharge date of the last claim in the series becomes the discharge date of the whole claim. 2. 􏰂􏰂The admission date from the first series in the claim and the discharge date from the last series in the claim define the length of the stay. 3. 􏰂􏰂The MS-DRG from the last claim in the single stay (the discharge MS-DRG) determines whether the hospital stay becomes an anchor record, or whether the stay is included/excluded as a readmission for an existing episode. 4. 􏰂􏰂Costs across all IP claims included in the single stay are aggregated to the stay level. 5. Claims where the last in the series of claims has patient (…) [as “still a patient”, not discharged], flag these and drop all of the claims in the series from the IP hospital stay file. 23
  • 24. Processing Claims With ECL H_1 := SORT( A , bene_sk, provider, admsn_dt, dschrgdt, thru_dt); H_2 := ROLLUP(H_1, is_interim(LEFT, RIGHT), merge_interim_claims(LEFT, RIGHT)); H_3 := JOIN(H_2, H_1, LEFT.bene_sk = RIGHT.bene_sk […], RIGHT ONLY); H_4 := PROJECT(H_3, TRANSFORM(BPCI.Layouts.ip_claim_etl_layout, SELF.is_dropped := TRUE; SELF.dropped_reason_code := BPCI.Layouts.DROPPED_REASON_CODES.InterimClaim; SELF := LEFT; )); H := H_2 + H_4; 24
  • 25. 25 Template Language EXPORT load_all_client_files(pId, pFileSet, pBaseDataDirectory) := MACRO LOADXML(pFileSet); baseDataDirectory := pBaseDataDirectory + pId + '/'; #FOR(folder) #UNIQUENAME(subId) %subId% := %''%; #UNIQUENAME(subDS) %subDS% := Client.Datasets(%subId%); [...] #UNIQUENAME(id) %id% := pId + '::' + %''%; #UNIQUENAME(dataDir) %dataDir% := %baseDataDirectory% + %''% + '/'; #UNIQUENAME(etl) %etl% := Client.ETL(%dataDir%, %id%); %etl%.run(); #END ENDMACRO;
  • 26. 26 Template Language file_set := ’<folders>’ + '<folder>M201409</folder>' + '<folder>M201410</folder>' + '<folder>M201411</folder>' + '<folder>M201412</folder>' + '<folder>M201501</folder>' + '<folder>M201502</folder>' + '<folder>M201503</folder>' + ‘</folders>’; load_all_client_files(1234, file_set, ‘/volume1/data/‘);
  • 27. Beyond Processing Data • Security & Authentication • Collaboration • Unit Tests • Visualizations 27
  • 28. Beyond Processing: Security • HTTPS • Htpasswd • LDAP support • File level security when using LDAP 28
  • 29. Beyond Processing: Workunits • Workunit Identifier • Attribution • Query • Timings • Results 29
  • 34. 34 Beyond Processing: Unit Tests interim_claims := MODULE // Test Data test_set := BPCI.Test.Samples.ip_claim( bene_id := 1, claim_id := 1, pmt_amt := 3042.0, ...) + BPCI.Test.Samples.ip_claim( bene_id := 1, claim_id := 2, pmt_amt := 11409.0, ...) + .... ; ... EXPORT Actual := Step2.ip_stays; SHARED TestSuite := MODULE EXPORT Test01 := ASSERT(oActual(NOT is_dropped), claimno IN [1,2], 'Did not filter ou EXPORT Test02 := ASSERT(oActual(is_dropped), claimno IN [3,5]); END; EXPORT AllTests := TestSuite.Test01 + TestSuite.Test02; END;
  • 35. Beyond Processing: Unit Tests (2) 35 // Using inline dataset simple_ip_claims := DATASET([ {1,1,'0','010001',20000201,'',20000120,200 00201,'61'}, ], simplified_ip_layout); ip_claims := Samples.ip_claims(simple_ip_claims); // OR passing NAMED parameters ip_claims := Samples.ip_claims2( bene_id := 1, claim_id := 1, claim_type := '00' ) simplified_ip_layout := RECORD UNSIGNED bene_id; UNSIGNED claim_id; STRING claim_type; STRING provider_number; INTEGER4 through_date; STRING status_code; INTEGER4 admission_date; INTEGER4 discharge_date; STRING ms_drg_code; END;
  • 38. 38 (No) Insights 0 20 40 60 80 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 Episodes Costs in $1,000
  • 39. 39 Insights 0 20 40 60 80 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 Episodes Costs in $1,000 No readmit 1 Readmit 2 Readmits
  • 40. Data Delivery: Roxie • Data Delivery Engine • Indexed, compressed and in-memory • Data Warehouse Capabilities • Data Services 40
  • 41. Data Services • Web Services over Data Warehouse • XML/SOAP but also JSON • Web Services defined in ECL • Solution to add/remove data from the cluster 41
  • 42. Data Service: Query • Define service using ECL 42 INTEGER4 oOffset := 1 : STORED('Offset'); INTEGER4 oResults := 100 : STORED('Results'); INTEGER4 oStartDate := 20130201 : STORED('Begin_Date'); INTEGER4 oEndDate := 20140201 : STORED('End_Date'); ... oParams := DATASET([{ oOffset, oResults, oStartDate, oEndDate, … }], Layouts.service_parameters_layout); T(DATASET(RECORDOF(Datasets.dsFactEpisodeCostsIndex)) pData) := FUNCTION RETURN TABLE(pData, { STRING bpid := pData.bpid; UNSIGNED INTEGER1 model := pData.model; UNSIGNED INTEGER1 post_dsch_prd_length := pData.post_dsch_prd_length; INTEGER8 total_episodes := COUNT(GROUP); DECIMAL15_2 total_costs := SUM(GROUP, sum_post_dsch_prd_pay); DECIMAL15_2 average_costs := AVE(GROUP, sum_post_dsch_prd_pay); DECIMAL15_2 std_dev_costs := SQRT(VARIANCE(GROUP, sum_post_dsch_prd_pay)); }, bpid, model, post_dsch_prd_length); END; ReportServices.BaseService.run_it('Summary', oParameters, T, bpid);
  • 43. 43 Data (Web) Services { "summary": { "offset": 1, "results": 10, "begin_date": 20130101, "end_date": 20130201, … } } { "summaryResponse": { … "Results": { … "Summary": { "Row": [ { "bpid": "9999", "model": 2, "post_dsch_prd_length": 90, "total_episodes": 987, "total_costs": 9876543.21, "average_costs": 12358.13 … } ] … Request Response https://.../WsEcl/forms/json/query/roxie/summary
  • 46. Loading up data • Logical vs Physical ~abc::subfolder::subsubfolder::myfile /abc/subfolder/subsubfolder/myfile • ECL to load data into cluster: 46 oDS := DATASET( std.File.ExternalLogicalFilename('172.0.0.1','/var/lib/.../myfile.csv'), Layouts.ip_claim_layout, CSV(HEADING(0)) ); oDSDistributed := DISTRIBUTE(oDS, bene_id); OUTPUT(oDSDistributed,, ‘~somewhere::over::here::myfile’, OVERWRITE); oDS := DATASET(‘~somewhere::over::here::myfile’, Layouts.ip_claim_layout); • ECL to use data loaded into cluster:
  • 47. SuperFiles • Super File = Symbolic link, list of sub-files • Each sub-file must have the same layout 47 WEBLOGS_FILE := ‘~somewhere::logs::web’ Std.File.CreateSuperFile(WEBLOGS_FILE); … run_report() := FUNCTION oDS := DATASET(WEBLOGS_FILE, Layouts.weblogs_layout, CSV); RETURN TABLE(oDS, { ip_address; COUNT(GROUP); }, ip_address ); END; SEQUENTIAL( Std.File.StartSuperFileTransaction(), Std.File.AddSuperFile(WEBLOGS_FILE, ‘~somewhere::logs::web::2015::04::01’), Std.File.FinishSuperFileTransaction() ); • Including (more) data:
  • 48. 48 Data Services: Reusability EXPORT run_it( pServiceName, pParams, pReportFunction, pSortByField) := MACRO // Filtering data based on parameters #UNIQUENAME(DS); %DS% := WS.Datasets.dsFactEpisodeCostsIndex; […] #UNIQUENAME(B) %B% := IF(COUNT(pParams[1].providers) = 0, %A%, %A%(provider_id IN pParams[1].providers)); #UNIQUENAME(C) %C% := IF(COUNT(pParams[1].npis) = 0, %B%, %B%(at_npi IN pParams[1].npis OR op_npi IN pParams[ […] #UNIQUENAME(report) %report% := pReportFunction(%K%); #UNIQUENAME(sorted) %sorted% := SORT(%report%, pSortByField); #UNIQUENAME(O1) %O1% := OUTPUT(pParameters, NAMED('Request')); oSummary := DATASET([{ COUNT(%sorted%) }], WS.Layouts.service_summary_layout); #UNIQUENAME(O2) %O2% := OUTPUT(oSummary, NAMED(‘Metadata')); #UNIQUENAME(O3) %O3% := OUTPUT( CHOOSEN(%sorted%, pParams[1].results, pParams[1].offset), NAMED(pServiceName), ALL); PARALLEL(%O1%, %O2%, %O3%); ENDMACRO;
  • 51. 51 lpezet@archwayha.com www.linkedin.com/in/lucpezet mezzetin.blogspot.com HPCC Systems open source portal: http://hpccsystems.com Thank you Questions? Feedback? Questions ? Feedback ? www.linkedin.com/in/lucpezet