Ensuring compliance of patient data with big data and bi [bdii 301-m] - (4078)

Ayad Shammout and Denny Lee's PASS BA Conference session on our end-to-end Big Data to BI auditing project.

Published in: Technology

  • Centralizing logs: allows you to have one system process all audit logs from your servers; easier manageability. Set files to 250MB in size (fewer files, but not too large to process); optimized for Hadoop (general rule of thumb: 250MB-1GB file sizes). Can also centralize processing … and centralize reporting. The Compliance SDK contains the full project, organized by Server, Database, DDL, and DML actions.
  • Transcript

    • 1. April 10-12, Chicago, IL. Ensuring Compliance of Patient Data with Big Data and BI. Ayad Shammout & Denny Lee
    • 2. Please silence cell phones
    • 3. Agenda: A Quick Big Data Primer; Healthcare and Big Data; Compliance and Auditing; SQL Compliance Project; Compliance and Auditing with Big Data and BI; Big Data: Unstructured Volumes of Data; Analytics: PowerPivot, Power View
    • 4. What is Big Data? Volume: exceeds physical limits of vertical scalability. Velocity: decision window small compared to data change rate. Variety: many different formats make integration expensive. Variability: many options or variable interpretations confound analysis.
    • 5. Data explosion: 10x increase every five years; 85% from new data types. Volume, Velocity, Variety. Hadoop, Cloud. "By 2015, organizations that build a modern information management system will outperform their peers financially by 20 percent." – Gartner, Mark Beyer, “Information Management in the 21st Century”
    • 6. Big Data Business Value: 140,000-190,000; 1.5 million; $300 billion; 15 out of 17; €250 billion; 50-60%
    • 7. Data
    • 8. Hadoop: The most visible face of Big Data
    • 9. HDInsight: Visit HadoopOnAzure.com
    • 10. Healthcare and Big Data
    • 11. Healthcare and IT: Often the laggard in technology, yet application of IT to healthcare can radically change what we can do: genomic sequencing, proteomic sequencing, incidence prediction.
    • 12. Healthcare Big Data Example Scenarios. Clinical Trial Deviations: Viagra was originally developed to lower blood pressure and treat angina; now it's used to help newborn pulmonary hypertension and altitude sickness. Incidence Prediction: missed 4 or more visits, twice as likely to have an asthmatic incident; a particular cardiac monitor sine wave points to a high likelihood of heart attack. Campaigns: social media and advertising campaigns to understand user behavior and sentiment. Patient Satisfaction: social media and advertising campaigns to understand user behavior and sentiment.
    • 13. BIDMC Auditing Scenario. Auditing is a critical component of HIPAA in ensuring patient privacy: 1 billion+ rows of audit data; 146 mission-critical clinical applications; comprehensive audits yield 300-500k transactions/day; HIPAA requires an audit system with 20 years of data. Auditing Project: available to the community as part of the Compliance SDK; updating for SQL Server 2012, HDInsight, Power View, and Mobile BI. "Creating an enterprise tool for consolidated storage, reporting and alerting of all application audit data - that's cool!" John Halamka’s Cool Technology of the Week (Wellsphere Top Health Blogger, Health Impact Award)
    • 14. BIDMC Compliance Project (diagram labels): SSIS (x3); HDInsight Windows; HDInsight Azure; SQL Server 2008/2012 Audit Logs; ETL Logs to HDFS; Use Excel 2013 PowerPivot and Power View; SSAS (tabular)
    • 15. Auditing Sensitive Information. Querying Audit Information: use PowerPivot / Power View / Analysis Services to query the data; Security Information; Policy Information. Process Audit Information: use SSIS to process SQL2008 All-Actions Audit Information and other CG application audit log data; potentially can use Management Performance DW framework. Caregroup Environment (diagram labels): File Server; SQL Audit; Connect/Logic; SSIS; CG Application Data; Intersystems Cache; SQL2005; Oracle; SQL2008 All-Actions Audit Data; SQL 2008 / 2012 R2; SSRS 2008 / Power View; Policy Analysis; Policy Reports; Policy Best Practices; Security Analysis; Security Reports; Compliance Reports. Feedback Action Loop: update systems to keep them compliant and secure.
    • 16. Storage Infrastructure: Audit Logs. Transfer files to ASV via AzCopy, CloudExplorer, etc.
    • 17. Storage Infrastructure: Hadoop on Azure; Compute Nodes (Medium VMs); Azure Storage Vault (ASV); Azure Blob Storage; Azure Flat Network Storage
    • 18. Storage Infrastructure: Hadoop on Azure; Compute Nodes (Medium VMs); Azure Storage Vault (ASV); Azure Blob Storage; Azure Flat Network Storage. Stream data to compute; push data back to storage. Map, sort, shuffle, reduce.
    • 19. SSIS to HDInsight
    • 20. SSIS Processing
    • 21. SSAS Tabular of HoA Audit Data
    • 22. Hadoop / Auditing: File sizes. Currently testing gz vs. raw, e.g. 12MB raw text file vs. 633KB gz file (~20x compression): 20x smaller size, ~same query time; approximately the same map / reduce task utilization. File size is 250MB-1GB; the SSIS package takes care of the size. Future testing: avro, protobuf. Query duration (s): select count(*) from sql_audit_asv_raw: 56.066; select count(*) from sql_audit_asv_gz: 58.994
    • 23. Hadoop / Auditing: Formats. For ease of processing, replace carriage returns within embedded SQL statements, e.g. a statement whose text is split across lines (col1, col2 / from tableA) becomes: select col1, col2 from tableA. This allows you to create a Hive table using CR as the row delimiter (i.e. does not have things like SQL quoted identifiers)
    • 25. BI Connectivity: SQOOP, HiveODBC, Templeton, CSV, etc.
    • 26. Big Data … Excel-lerated! 2 Server, 3mo: 110 GB binary files; SSIS extraction: 1.2GB of text, 120MB gz; Hadoop to PowerPivot: 6MB
    • 27. PowerPivot workbook of HoA Audit data
    • 28. Power View of HoA Audit Data
    • 29. Win a Microsoft Surface Pro! Complete an online SESSION EVALUATION to be entered into the draw. Draw closes April 12, 11:59pm CT. Winners will be announced on the PASS BA Conference website and on Twitter. Go to or follow the QR code link displayed on session signage throughout the conference venue. Your feedback is important and valuable. All feedback will be used to improve and select sessions for future events.
    • 30. Thank you! Diamond Sponsor, Platinum Sponsor
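
Slides 22 and 23 describe two preparation steps for the audit text before Hive reads it: flattening line breaks inside embedded SQL statements so that each record occupies one row, and optionally gzip-compressing the resulting files. The following is a minimal Python sketch of both steps, offered only as an illustration. The session's pipeline uses SSIS packages for extraction, so the function names, the sample statement, and the use of "\n" as the row terminator are assumptions made here and are not taken from the Compliance SDK.

```python
import gzip
import re


def flatten_statement(sql: str) -> str:
    """Replace line breaks inside one SQL statement with single spaces."""
    # Swallow whitespace around each run of CR/LF so that indentation
    # does not leave double spaces behind.
    return re.sub(r"\s*[\r\n]+\s*", " ", sql).strip()


def gzip_rows(rows: list[str]) -> tuple[bytes, bytes]:
    """Terminate each row with a newline; return (raw, gzipped) bytes."""
    raw = "".join(row + "\n" for row in rows).encode("utf-8")
    return raw, gzip.compress(raw)


if __name__ == "__main__":
    # Hypothetical audit statement split across two lines.
    statement = "select col1, col2\r\nfrom tableA"
    row = flatten_statement(statement)
    print(row)

    # Highly repetitive sample data; real audit logs will compress differently.
    raw, packed = gzip_rows([row] * 10_000)
    print(len(raw), len(packed))
```

Hive can read gzip-compressed text files directly, which fits the near-identical query times reported on slide 22. Note that a gzip file cannot be split across map tasks, so it matters more that each file stays within the 250MB-1GB range mentioned in the notes.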