Ensuring compliance of patient data with big data and bi [bdii 301-m] - (4078)

Ayad Shammout and Denny Lee's PASS BA Conference session on our end-to-end Big Data to BI auditing project.

  • Centralizing Logs: allows you to have one system process all audit logs from your servers, which makes them easier to manage. Set files to 250MB in size (fewer files, but not too large to process). Optimized for Hadoop; general rule of thumb: 250MB-1GB file sizes. Can also centralize processing … and centralize reporting. The Compliance SDK contains the full project, organized by Server, Database, DDL, and DML actions.
  • Ensuring compliance of patient data with big data and bi [bdii 301-m] - (4078)

    1. Ensuring Compliance of Patient Data with Big Data and BI. Ayad Shammout & Denny Lee. April 10-12, Chicago, IL.
    2. Please silence cell phones
    3. Agenda
       • A Quick Big Data Primer
       • Healthcare and Big Data
       • Compliance and Auditing
       • SQL Compliance Project
       • Compliance and Auditing with Big Data and BI
       • Big Data: Unstructured Volumes of Data
       • Analytics: PowerPivot, Power View
    4. What is Big Data?
       • Volume: exceeds physical limits of vertical scalability
       • Velocity: decision window small compared to data change rate
       • Variety: many different formats make integration expensive
       • Variability: many options or variable interpretations confound analysis
    5. Data explosion
       • 10x increase every five years
       • 85% from new data types
       • Volume, Velocity, Variety
       • Hadoop, Cloud
       • "By 2015, organizations that build a modern information management system will outperform their peers financially by 20 percent." (Gartner, Mark Beyer, "Information Management in the 21st Century")
    6. Big Data Business Value (figures shown on slide): 140,000-190,000; 1.5 million; $300 billion; 15 out of 17; €250 billion; 50-60%
    7. Data
    8. Hadoop: The most visible face of Big Data
    9. HDInsight: Visit HadoopOnAzure.com
    10. Healthcare and Big Data
    11. Healthcare and IT
       • Often the laggard in technology
       • Yet application of IT to healthcare can radically change what we can do
       • Genomic sequencing
       • Proteomic sequencing
       • Incidence prediction
    12. Healthcare Big Data Example Scenarios
       • Clinical Trial Deviations: Viagra was originally developed to lower blood pressure and treat angina. Now it's used to help newborn pulmonary hypertension and altitude sickness.
       • Incidence Prediction: Missed 4 or more visits, twice as likely to have an asthmatic incident. A particular cardiac monitor sine wave points to a high likelihood of heart attack.
       • Campaigns: Social media and advertising campaigns to understand user behavior and sentiment
       • Patient Satisfaction: Social media and advertising campaigns to understand user behavior and sentiment
    13. BIDMC Auditing Scenario
       • Auditing is a critical component of HIPAA in ensuring patient privacy
       • 1 billion+ rows of audit data
       • 146 mission-critical clinical applications
       • Comprehensive audits yield 300-500k transactions/day
       • HIPAA requires audit system with 20 years of data
       • Auditing Project: available to community as part of Compliance SDK; updating for SQL Server 2012, HDInsight, Power View, and Mobile BI*
       • "Creating an enterprise tool for consolidated storage, reporting and alerting of all application audit data - that's cool!" John Halamka's Cool Technology of the Week (Wellsphere Top Health Blogger, Health Impact Award)
    14. BIDMC Compliance Project (architecture diagram labels)
       • SQL Server 2008/2012 Audit Logs
       • SSIS (shown three times)
       • ETL Logs to HDFS
       • HDInsight Windows
       • HDInsight Azure
       • SSAS (tabular)
       • Use Excel 2013 PowerPivot and Power View
    15. Auditing Sensitive Information
       • Querying Audit Information: Use PowerPivot / Power View / Analysis Services to query the data.
       • Security Information
       • Policy Information
       • Process Audit Information: Use SSIS to process SQL2008 All-Actions Audit Information and other CG application audit log data; potentially can use Management Performance DW framework.
       • Caregroup Environment (diagram labels): File Server; SQL Audit; Connect/Logic; SSIS; CG Application Data; Intersystems Cache; SQL2005; Oracle; SQL2008 All-Actions Audit Data; SQL 2008 / 2012 R2; SSRS 2008 / Power View
       • Outputs (diagram labels): Policy Analysis; Policy Reports; Policy Best Practices; Security Analysis; Security Reports; Compliance Reports
       • Feedback Action Loop: Update systems to keep them compliant and secure
    16. Storage Infrastructure
       • Audit Logs
       • Transfer files to ASV via AzCopy, CloudExplorer, etc.
    17. Storage Infrastructure
       • Hadoop on Azure
       • Compute Nodes (Medium VMs)
       • Azure Storage Vault (ASV)
       • Azure Blob Storage
       • Azure Flat Network Storage
    18. Storage Infrastructure
       • Hadoop on Azure
       • Compute Nodes (Medium VMs)
       • Azure Storage Vault (ASV)
       • Azure Blob Storage
       • Azure Flat Network Storage
       • Stream data to compute
       • Push data back to storage
       • map, sort, shuffle, reduce
       • http://dennyglee.com/2013/03/18/why-use-blob-storage-with-hdinsight-on-azure/
    19. SSIS to HDInsight
    20. SSIS Processing
    21. SSAS Tabular of HoA Audit Data
    22. Hadoop / Auditing: File sizes
       • Currently testing gz vs. raw
       • E.g. 12MB raw text file vs. 633KB gz file (~20x compression)
       • 20x smaller size, ~same query time
       • Approx. same map / reduce task utilization
       • File size is 250MB-1GB; SSIS package takes care of the size
       • Future testing: avro, protobuf
       • Query timings:
           Query                                     Duration (s)
           select count(*) from sql_audit_asv_raw    56.066
           select count(*) from sql_audit_asv_gz     58.994
    23. Hadoop / Auditing: Formats
       • For ease of processing, replace carriage returns within embedded SQL statements, e.g.
           select col1, col2
           from tableA
         to
           select col1, col2 from tableA
       • This allows you to create a Hive table using CR as row delimiter (i.e. does not have things like SQL quoted identifiers)
    25. BI Connectivity: SQOOP, HiveODBC, Templeton, CSV, etc.
    26. Big Data … Excel-lerated! (diagram labels, in slide order)
       • 2 servers, 3 months
       • 110 GB binary files
       • SSIS extraction
       • 1.2GB of text
       • 120MB gz
       • Hadoop to PowerPivot
       • 6MB
    27. PowerPivot workbook of HoA Audit data
    28. Power View of HoA Audit Data
    29. Win a Microsoft Surface Pro!
       • Complete an online SESSION EVALUATION to be entered into the draw.
       • Draw closes April 12, 11:59pm CT
       • Winners will be announced on the PASS BA Conference website and on Twitter.
       • Go to passbaconference.com/evals or follow the QR code link displayed on session signage throughout the conference venue.
       • Your feedback is important and valuable. All feedback will be used to improve and select sessions for future events.
    30. Thank you! Diamond Sponsor, Platinum Sponsor
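
Slide 23 describes replacing carriage returns inside embedded SQL statements so that each audit record sits on one line and a Hive table can use the line break as its row delimiter. The deck does not show how the project does this (the slides suggest it happens in the SSIS packages), so the following is only a minimal Python sketch of the idea. The function name `flatten_statement` is hypothetical, and the sketch assumes it is acceptable to collapse every line break in the statement, including any that appear inside string literals.

```python
import re


def flatten_statement(statement: str) -> str:
    """Collapse line breaks inside one SQL statement into single spaces.

    Hypothetical helper for illustration only: it treats every CR/LF run
    (plus the whitespace around it) as a separator, so line breaks inside
    string literals would be collapsed as well.
    """
    return re.sub(r"\s*[\r\n]+\s*", " ", statement).strip()


# The example from the slide: a two-line statement becomes one line.
multi_line = "select col1, col2\r\nfrom tableA"
print(flatten_statement(multi_line))  # select col1, col2 from tableA
```

Once every statement is a single line, each line of the exported text file corresponds to exactly one audit record.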

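
Slide 22 reports a 12MB raw text file shrinking to a 633KB gz file and calls this roughly 20x compression. As a rough check of that arithmetic, and as a way to measure the ratio on your own logs, here is a small Python sketch. The helper name `compression_ratio` and the sample record layout are assumptions made for illustration; they are not taken from the Compliance SDK.

```python
import gzip


def compression_ratio(raw: bytes) -> float:
    """Return raw size divided by gzip-compressed size (hypothetical helper)."""
    compressed = gzip.compress(raw)
    return len(raw) / len(compressed)


# Figures from the slide, assuming 1 MB = 1024 KB.
slide_ratio = (12 * 1024) / 633
print(f"slide ratio: {slide_ratio:.1f}x")  # slide ratio: 19.4x

# Made-up, highly repetitive audit-style lines; real logs will compress differently.
sample = b"2013-04-10 12:00:00\tSELECT\tdbo.Patients\tuser42\n" * 10_000
print(f"sample ratio: {compression_ratio(sample):.1f}x")
```

The slide's figures work out to about 19.4x, consistent with the "~20x" stated there. The ratio on the synthetic sample will be much higher than on real audit data because the sample repeats one line.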