BigDataCloud Sept 8 2011 meetup - Big Data Analytics for Health by Charles Kaminski of LexisNexis

Big Data Analytics for Health - Insights from the Healthcare Industry.

- Charles Kaminski, LexisNexis




Usage Rights

© All Rights Reserved


    Presentation Transcript

    • Big Data Cloud Meetup, September 8th, 2011. HPCC Platform: Big Data Analytics and Delivery. LexisNexis’ massively parallel-processing, open-source computing platform. http://hpccsystems.com
    • Who’s been using the HPCC Platform and why?
        • Very large businesses
        • Federal Agencies
        • National research labs
        • It’s 4 to 10 times faster
        • Products and solutions are built much faster
        • Very complex problems can be modeled and solved
        • It’s proven
    • What’s changed? We just open-sourced it! The HPCC Platform is now available to you. http://hpccsystems.com
    • Big Data… It’s our business. Big Data open source components and industry solutions for Insurance, Financial Services, Cyber Security, Government, Health Care, Retail, Telecommunications, Transportation & Logistics, Weblog Analysis, and Online Reservations. http://hpccsystems.com
      • Customer Data Integration
      • Data Fusion
      • Fraud Detection and Prevention
      • Know Your Customer
      • Master Data Management
      • Weblog Analysis
    • The Platform’s Major Parts
        • Thor – Data ingestion, hygiene, refining, transformation, linking, fusion
        • Roxie – Data Delivery Engine
          • Supports complex queries and distributed indexes
          • Low latency: latencies grow logarithmically
        • ECL – One language
          • Highly expressive and efficient declarative language
            • Solve complex problems
            • Encourage code reuse
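The slides do not describe how Roxie lays out its distributed indexes. As a rough illustration of why a sorted, partitioned index keeps lookup latency growing only logarithmically, here is a toy Python sketch; the class name, the range-partitioning scheme, and the parameters are assumptions for illustration, not Roxie’s actual design. Routing a key to a partition and then searching inside that partition are both binary searches, so each lookup costs O(log n) comparisons.

```python
import bisect


class PartitionedIndex:
    """Toy range-partitioned sorted index (hypothetical; not Roxie's real format)."""

    def __init__(self, records, num_parts):
        items = sorted(records.items())
        # Ceiling division so every record lands in some partition.
        size = max(1, -(-len(items) // num_parts))
        self.parts = [items[i:i + size] for i in range(0, len(items), size)]
        # First key of each partition, used to route a lookup.
        self.bounds = [part[0][0] for part in self.parts]
        self.part_keys = [[k for k, _ in part] for part in self.parts]

    def lookup(self, key):
        # Binary search over partition boundaries: O(log num_parts).
        p = bisect.bisect_right(self.bounds, key) - 1
        if p < 0:
            return None
        keys = self.part_keys[p]
        # Binary search inside the chosen partition: O(log partition size).
        i = bisect.bisect_left(keys, key)
        if i < len(keys) and keys[i] == key:
            return self.parts[p][i][1]
        return None
```

For example, `PartitionedIndex({f"id{i:05d}": i for i in range(1000)}, 4).lookup("id00420")` returns 420. Doubling the number of keys adds roughly one comparison per binary search rather than doubling the work.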
    • How we’re different
        • It’s not a group of disparate technologies or competing visions bolted together.
        • It’s one platform with a clear proven vision.
        • This by itself is powerful.
    • How we’re different
        • You can transcend MapReduce
          • Build transformative data graphs and applications using ECL
          • Solve very complex Big Data problems
          • Don’t struggle to fit your Big Data problem into groups of MapReduce jobs
    • How we’re different
        • No need to munge the data before ingestion
        • No complex block file system
        • No need to tune number of tasks for different jobs
        • Data Delivery Engine is included
        • Use a single language for data cleansing, transformation, linking, fusion, and delivery
        • ECL promotes language extension and code reuse
        • Data graphs are built and optimized by the system
        • The system-generated C++ is highly optimized
        • Code execution is optimized
        • Low and predictable latencies
        • Modeling data problems as data problems leads to richer solutions
    • Challenges facing health care enterprises and the health insurance industry
      • Disparate data spread across separate physical locations
      • Scale of data. BIG Data is getting BIGGER.
      • Adding relationships exponentially expands the size of the BIG Data analytics challenge.
      • LexisNexis has leveraged parallel-processing computing platforms and large-scale graph analytics for over a decade.
    • Potential fraud: a proof of concept (POC) for the State of New York
      • Applied social network analytics to information provided by the State of New York, and to public data supplied by LexisNexis, to identify relationships among a group of New York Medicaid recipients living in high-end condominiums within the same complex, along with any links those individuals might have to medical facilities or other providers of care to New York Medicaid recipients.
    • What’s entailed (high level)
      • Mix first-party data with public and third-party data sources
      • Adds fidelity to existing entities
      • Adds new linkages into the analysis
      • Adds new entities into the analysis
      • Exposes ring leaders and brokers that don’t directly participate
      Addition of External Data http://hpccsystems.com
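The slide says that adding external data exposes ring leaders and brokers who never appear directly in the first-party data. One simple way to illustrate that idea (not necessarily the technique used in this POC) is to merge both edge sets into one graph, find a shortest path from each recipient to each medical business, and count how often a node that is neither a recipient nor a business sits in the middle. The function names and all node names below are hypothetical.

```python
from collections import Counter, deque


def shortest_path(adj, src, dst):
    """Breadth-first search; returns one shortest path from src to dst, or None."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def broker_scores(edges, recipients, businesses):
    """Count how often an intermediary sits on a recipient-to-business shortest path."""
    # Merge first-party and public-data edges into one undirected graph.
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    scores = Counter()
    for r in recipients:
        for m in businesses:
            path = shortest_path(adj, r, m)
            if path:
                # Skip endpoints and any node that is itself a recipient or business.
                scores.update(n for n in path[1:-1]
                              if n not in recipients and n not in businesses)
    return scores
```

With a first-party edge `("R3", "ClinicB")` and public-data edges linking R1 and R2 to a hypothetical "Broker" who is a contact of "ClinicA", `broker_scores` gives "Broker" a score of 2, even though Broker is neither a recipient nor a provider. A production system would more likely use a full betweenness-centrality measure and weight edges by relationship type.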
      • Graph network: 3 billion derived public-data relationships between people, merged with risk indicators.
      • Graph analytics examine up to 20 billion data points to create variables that allow for predictive analysis incorporating relationship context and associated risk.
      • Targets fraud across all sectors including Healthcare, Financial Services and Government.
      How we did it http://hpccsystems.com
    • Cluster Visualization Introduction
        • How many of them live in expensive residences, own expensive property, or drive expensive cars?
        • How many recipients are contacts of medical businesses?
        • How many medical businesses are associated with any of the people in the cluster?
        • How many are currently receiving benefits?
      [Figure: Cluster visualization introduction. Legend labels: Medicaid Recipient; Expensive Residence; Owns Expensive Property; Owns Expensive Vehicles; Business Contact of Medical Business Entity.] http://hpccsystems.com
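The four questions above are simple aggregates over one cluster. A minimal Python sketch, assuming each person in the cluster already carries boolean flags derived from the linked data; the `Person` fields and the `person_to_businesses` mapping are hypothetical names, not the POC’s schema.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical per-person flags derived from linked data.
    name: str
    expensive_residence: bool = False
    expensive_property: bool = False
    expensive_vehicle: bool = False
    contact_of_medical_business: bool = False
    receiving_benefits: bool = False


def cluster_summary(people, person_to_businesses):
    """Answer the four cluster questions for one cluster of people."""
    return {
        # Lives in an expensive residence, owns expensive property, or owns an expensive vehicle.
        "luxury_indicators": sum(
            p.expensive_residence or p.expensive_property or p.expensive_vehicle
            for p in people
        ),
        "contacts_of_medical_businesses": sum(
            p.contact_of_medical_business for p in people
        ),
        # Distinct medical businesses linked to anyone in the cluster.
        "medical_businesses": len(
            {b for p in people for b in person_to_businesses.get(p.name, ())}
        ),
        "receiving_benefits": sum(p.receiving_benefits for p in people),
    }
```

Counting distinct businesses with a set means a clinic linked to several people in the cluster is counted once, which matches the wording "how many medical businesses are associated with any of the people".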
    • Cluster Visualization [figure] http://hpccsystems.com
    • City Walk Sample: Vehicle Statistics. What is the list of preferred expensive vehicles? http://hpccsystems.com
      • Make: number owned
      • Mercedes-Benz: 46
      • Lexus: 41
      • BMW: 27
      • Infiniti: 13
      • Acura: 9
      • Lincoln: 8
      • Audi: 7
      • Land Rover: 7
      • Porsche: 6
      • Jaguar: 5
      • Mercedes Benz: 3
      • Saab: 3
      • Chevrolet: 2
      • Hummer: 2
      • Jeep: 2
      • Nissan: 2
      • Toyota: 2
      • Aston Martin: 1
      • Bentley: 1
      • Cadillac: 1
      • GMC: 1
      • Honda: 1
      • Volkswagen: 1
      • Volvo: 1
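The vehicle statistics list “Mercedes-Benz” (46) and “Mercedes Benz” (3) as separate makes, which is the kind of spelling variant a data-hygiene pass (Thor’s role, per the platform slide) would typically reconcile. A minimal Python sketch, assuming that ignoring case, spaces, and punctuation is an acceptable matching rule; the function names are hypothetical.

```python
import re
from collections import Counter


def normalize_make(make):
    """Lowercase and drop everything except letters and digits (hypothetical rule)."""
    return re.sub(r"[^a-z0-9]", "", make.lower())


def merge_counts(counts):
    """Merge counts whose make names normalize to the same key."""
    merged = Counter()
    display = {}
    for make, n in counts.items():
        key = normalize_make(make)
        merged[key] += n
        display.setdefault(key, make)  # keep the first spelling seen for display
    return {display[key]: n for key, n in merged.items()}
```

`merge_counts({"Mercedes-Benz": 46, "Mercedes Benz": 3, "BMW": 27})` gives `{"Mercedes-Benz": 49, "BMW": 27}`. The rule is deliberately crude: it would also merge genuinely different names that differ only in punctuation, so real entity resolution usually adds reference tables or fuzzy matching.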
    • Dominant buyers and sellers at City Walk: property deed reference counts http://hpccsystems.com
      • Name: deeds held
      • Hudson Eight: 78
      • Hudson Five: 74
      • Hudson First: 73
      • Hudson Nine: 65
      • Harry Anderson: 45
      • Hudson Ten: 41
      • Hudson Seven: 39
      • Home Nationwide: 33
      • Hudson Three: 33
      • Brian Smith: 28
      • Alan Stevens: 25
      • Chris Doe: 24
      • Sophie Davis: 23
      • Washington Mutual: 23
      • Fleet Mortgage Co.: 21
      • Mike Greem: 21
      • Scott Hill: 21
      • Betty Donaway: 21
      • Al Clark: 19
      • Dave Miller: 17
      • Mark Walker: 16
      • Mike Smith: 16
      • Val Edwards: 15
      • Eric Garcia: 14
      • Dane Young: 14
      • Bill Moore: 14
      • Karen Carter: 14
      • Casey Baker: 14
      • Art Nelson: 14
      • Cathy Parker: 13
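A deed reference count like the one above can be produced by counting how many deed records each party appears on. A minimal Python sketch, assuming each deed is reduced to a hypothetical `(grantor, grantee)` pair; the slide does not say how the counts were actually computed.

```python
from collections import Counter


def deed_reference_counts(deeds):
    """Count the deed records each party appears on.

    Each deed is a hypothetical (grantor, grantee) pair; a party named on
    both sides of the same deed is counted once for that deed.
    """
    counts = Counter()
    for grantor, grantee in deeds:
        counts.update({grantor, grantee})
    return counts
```

Calling `deed_reference_counts(deeds).most_common(30)` would then give a ranked list in the same shape as the table, with buyers and sellers mixed together because both sides of each deed are counted.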
    • The engineering story: one engineer (Joe Prichard), three weeks, less than part time. The platform lets him focus on the data. Joe’s a lot of fun to work with. http://hpccsystems.com
    • Do you build other POCs? Yes. http://hpccsystems.com
    • What next?
      • Try us out!
      • Virtual Machine
      • Binaries
        • EC2 Data Script
        • Ensemble Recipe… Juan from Canonical
    • Contact information: Charles Kaminski, Senior Architect, Academic Development Lead, HPCC Systems. [email_address], 402-619-9413, http://hpccsystems.com