DataCyte - The Future of Data Storage & Retrieval

The vision for the creation of DataCyte was to create a data storage and retrieval structure which would enable the development of applications in an organic manner and where the performance of the applications would be largely independent of the amount of data and the relationships built between the data elements.


  1. Converting Data to Information
      29/04/2011, DataCyte (Pty) Ltd
  2. DataCyte Group of Companies
      • Founded in 1998
      • Previously known as World Wide Objects
      • Privately owned and funded
      • Development done in Pretoria, South Africa
      • Expanding to create distribution and partner network
      • Building relationships with ISVs
  3. DataCyte Timeline
      1998 - Product conceptualized; first version developed by late 1999
      2000 - Patent application lodged
      2001 - Rated 5-10 years ahead of the IBM grid computing initiative by DARPA/CSC/Lockheed Martin
           - Awarded a United States Department of Defense contract
      2002 - Defense contract suspended due to the war on terror
      2003 - Return to South Africa due to the declaration of war against "terror"
           - Start delivering healthcare systems to the South African market
      2005 - Return to the US market with a healthcare and hi-tech value proposition
      2006 - Benchmark of data analysis capabilities with Zirmed shows a 50% size reduction and a 10x speed-up
           - Entered into a business relationship with Dr Patrick Soon-Shiong of Abraxis Biosciences and American Pharmaceutical Partners Inc.
      2008 - A conflict over product direction with Dr Patrick Soon-Shiong resulted in termination of the relationship. All intellectual property rights reverted to DataCyte.
  4. DataCyte Timeline cont.
      2008 - Cedars Sinai Cancer and Proteomic Research Unit (UCLA) benchmark
           - DXS Health Care Systems Technology Partnership (www.dxs-systems.com)
           - Trash Can Kids Technology Partnership (www.trashcankidz.com)
           - Electronic Price Labeling Technology Partnership
           - Interactive Television/Phumelela Technology Partnership (www.phumelela.com)
      2009 - Establish strategic partnership with Health One Global (www.healthoneglobal.com)
           - IR Global Partnership to deliver international roaming at dramatically discounted rates, enabling prepaid customers to roam as well
           - Barlow World Logistics Product Development
           - Re-engage with the United States Department of Defense through US presence
           - Granted US Patent #7571442
  5. 29/04/2011, e-Merchandising (Pty) Ltd t/a Revelation Systems
  6. DataCyte Timeline cont.
      2010
      April - Booz Allen Hamilton (www.boozallen.com) presents DataCyte as a future data solution at the American Association for the Advancement of Science (AAAS). AAAS (www.aaas.org), founded in 1848, publishes Science, which has the largest paid circulation of any peer-reviewed general science journal in the world, and is considered one of the global authorities on the direction of science, engineering and innovation.
      May - Launch of Interactive Television, with 400 units rolled out in TABS. Prime Media is currently finalizing the purchase of advertising slots for a 12-month period.
      June - Launch of the DXS (dxssynergy.com) web-based system to the USA market as part of its global rollout. DXS is a global vendor of healthcare-related systems.
      June - Launch of Trash Can Kids (www.trashcankidz.com)
      June - Launch of the Process Discovery product, with 2 customers going live this month. The system has already been adopted by a large defense manufacturer.
      June - E-Discovery product launched with EM (the largest non-life actuarial consulting firm in the UK)
  7. DataCyte Timeline cont.
      2010
      June - Negotiations started with Bytes Technology and its Med-e-mass (www.medemass.com) subsidiary to underpin their current suite of management systems with a comprehensive EHR solution for the South African market.
      July - Health One Global (www.healthoneglobal.com.au) launches its Personal Electronic Health Record and Medical Management record in Australia. The launch coincides with the introduction of the Australian Government's unique personal health identifier and has the support of the Australian Automobile Association and the Royal Academy of Physicians, as a first step towards providing Australians with a health record management service. The Australian government has legislated that all citizens must have these records in place by 2013.
  8. Why DataCyte?
  9. Computing Challenge
      • The "Global Village" has "Global Data"
        • Boundaries removed
        • Information flow is more pervasive
      • Physical Storage
        • Users store more data than ever before
        • Little new development in Data Retrieval Systems
      • Processing
        • More processing required to retrieve similar data
        • Little development in Computing Processing Systems
      • Present Business Tendencies
        • Swing back to centralized systems
        • Swing back to thin client
  10. DataCyte Patented Solution
      • Performance not dependent on number of records
      • No single point of vulnerability
        • No central registry
        • Information redundantly distributed
        • RAIS: Redundant Array of Inexpensive Servers
      • Dynamic, Intelligent Information
        • Contextual 'named' links between data entities
        • Dynamic data structure
        • Pervasive Associations
      • Self-managing, Distributed Information Structures
        • Any Entity must have 'independence of existence'
        • Entities 'self-aware' of environment
      • Web-enabled with open interface - Apache
  11. Performance Features
      • Access by association
      • Fully distributed storage system
      • DataCyte data storage is 10% of the size of traditional systems
      • Sustainable data creation at 400 000 cytes per second on a standard desktop computer
      • Random data access speed of over 250 000 cytes per second on a standard desktop computer
      • Caches up to 25 000 000 cytes in 2 GB of memory
      • Can access any cyte among 250 000 000 000 cytes in under a millisecond
      • Runs on Linux and Windows
  12. DataCyte Technology
      Cyte
      • Parent Registry
      • Child Registry
      • BLOB content
        • Any data form
        • Code
          • Lua
          • Others possible
      • Flags
        • Security/Access control
        • Content type, etc
      • Native methods
        • Provided by service
  13. Access Models
      • Multiple Logical Models: Data and application layer
        [Diagram panels: Network, Structured, Containment]
  14. Case Studies
  15. Case Studies: DataCyte Pilot/Test Process
      [Diagram comparing two processes; the steady source is not part of the test.
       Current state process: BAH DW (Oracle, 750 GB), DataStage ETL, PME Data Mart (Oracle), MS SSIS, MS SSAS, MS SSRS, MDX queries and reports. Process timing: 4.5 hrs daily, 6.5 hrs for closings (2 times a month).
       DataCyte test: BAH DW (Oracle), DataCyte extraction and translation, DataCyte, web service, Crystal Reports, Fact Maps.]
  16. Case Studies
      • BAH: Implementation of BI Reporting

                                   Oracle                        DataCyte
        Database size              750 GB                        51 GB (6.8%)
        Retrieval speeds:
          Indexed random access    0.152 secs                    0.008 secs (5.2%)
          Indexed step-through     0.963 secs                    0.016 secs (1.66%)
          Unindexed step-through   2.515 secs                    10.78 secs (425%)
        Hardware platform          ±US$2 000 000                 US$1 000
                                   Sun™ Grid Rack 400, SAN       Low-end desktop
                                   VM, 1 x staging areas,        2.33 GHz processor
                                   1 x cube storage              200 GB HDD
        Software                   ±US$3 000 000                 ±US$500 000
                                   ELT Toolkit                   DataCyte
                                   2 x Oracle 11g
                                   MS HyperCube
  17. Other Case Studies
      • Proteomic Research Unit

                                   Oracle                        DataCyte
        Database size              1.3 TB                        60 GB
        Retrieval speeds           1½ minutes                    < 1 sec
                                   1-2 days                      < 11-66 mins
        Hardware platform          Sun™ Grid Rack of 400         Toshiba laptop
                                   Sun Fire™ x64 servers         1.86 GHz processor
                                                                 7 200 rpm drive

      • UCS SAP database
        • 860 GB in DB2 database, 100 GB in DataCyte
        • Queries up to 1000 times faster
  18. Applications Developed
      • Knowledge Management Systems
        o e-Learning Systems
        o Interactive TV Management Systems
        o Medical Information Systems
          • Health Management: "Single Patient Record"
          • Practice Management
          • Clinic Management System
          • Pathology Laboratory Management
          • Clinical Trials System
          • Hospital Management System
  19. Applications Developed
      • Data Warehousing
        o ETL
        o "Data Cube"
        o Lawgistics
        o Fraud Detection
      • SME Payroll System
      • Process Management Server
        o Document Tracking Systems
        o Business Process Modeling
        o Supply Chain Management System
      • Computational Performance Systems
        o Biometrics
        o Proteomic and Genomic Analysis
        o Shortest Path Routing
  20. DataCyte Benefits
      • 90% reduction in hardware requirements
      • 10 to 1000 times speed improvement
      • Ability to populate archive/warehouse in real time
      • Ability to access archived data faster than the existing online live system
      • Extension of life of live systems
      • Greater security due to ALL history being online
  21. Contact Details
      • DataCyte (Pty) Ltd
        • 489 Clarence Street, Waterkloof Glen, Pretoria
        • Tel: +27 12 993 1256
        • Fax: +27 12 993 2412
      • Michael F Salomon, CEO
        • Cell: +27 82 552 5411
      • Peter Salemink, COO
        • Cell: +27 83 677 2783
      • Daniel Opland, Technology Evangelist
        • Cell: +27 83 312 5947
  22. Customers
      • Booz Allen Hamilton Inc
      • South African Fraud Prevention Service
      • TrashCanKidz Limited
      • Broadband Interactive TV System
      • PayStaffOnline (Pty) Ltd
      • 360 Link-up Limited / EMC Limited
  23. Back-up Slides
  24. Technology Overview
      Database Management System
      • Access
        • Object
        • SQL
        • Cyte
      Etymology: "Cyte"
      • Ancient Greek word κύτος (kýtos)
        • Container or receptacle
        • Human body → part of cell that keeps everything together
      Developed in C++
      • Runs on Windows and Linux
      ODBC, XML and Web Service access
      Apache module: mod_dsa
      • HTTP(S), FTP, WSDL, SOAP, …
  25. Technology Overview
      • Store: any form of data → 'Cytes'
        • Serialized and persisted on creation (more later)
        • Accessed by association in a contextual / stateful manner
        • Collectively form multiple intersecting hierarchies
        • Each Cyte has the potential to form part of a distributed cloud
        • Virtualize disparate data → single federated view
        • Contain application business logic
          • Lua (www.lua.org)
      • Lua
        • Powerful, fast, lightweight scripting language
        • Embeddable
        • Lua is widely used:
          • Industrial applications (Adobe: Photoshop Lightroom)
          • Games (Blizzard: World of Warcraft)
          • Embedded systems (Ginga, Digital TV in Brazil)
        • Lua Server Pages
          • Tag-based Web applications that dynamically generate Web pages
  26. Technology Overview
      Execution Layers
      • Application Layer
      • Data / Engine Layer
  27. Technology Overview
      Basic Performance (1.6 GHz dual core, 3 GB RAM, 7 200 rpm drive)
      Sustained creation speed
      • 400 000 cytes per second
      Sequential access speed
      • 400 000 cytes per second
      Random access speed
      • 250 000 cytes per second
      Cache
      • 25 000 000 cytes in 2 GB of memory
      Access
      • Any element from 250 billion elements in under a millisecond
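
The figures on this slide imply some simple back-of-the-envelope numbers. The sketch below only does arithmetic on the quoted figures; reading "2Gb" as 2 GiB of memory is an assumption, and the function names are hypothetical.

```python
# Back-of-the-envelope arithmetic on the slide's quoted figures.
# Assumption: "2Gb" means 2 GiB (2 * 2**30 bytes) of cache memory.

def bytes_per_cyte(cache_bytes: int, cached_cytes: int) -> float:
    """Average memory available per cached cyte."""
    return cache_bytes / cached_cytes


def seconds_to_create(total_cytes: int, cytes_per_second: int) -> float:
    """Time to create a data set at the sustained creation speed."""
    return total_cytes / cytes_per_second


CACHE_BYTES = 2 * 2**30          # assumed meaning of "2Gb"
CACHED_CYTES = 25_000_000        # from the slide
CREATION_RATE = 400_000          # cytes per second, from the slide
TOTAL_CYTES = 250_000_000_000    # "250 billion elements", from the slide

if __name__ == "__main__":
    print(f"{bytes_per_cyte(CACHE_BYTES, CACHED_CYTES):.1f} bytes per cached cyte")
    days = seconds_to_create(TOTAL_CYTES, CREATION_RATE) / 86_400
    print(f"{days:.1f} days to create {TOTAL_CYTES:,} cytes at the sustained rate")
```

Under these assumptions, each cached cyte averages well under 100 bytes, which suggests that individual cytes are small, fine-grained records.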
  28. Data Structure
      Cyte
      • Parent Registry
      • Child Registry
      • BLOB content
        • Any data form
        • Code
          • Lua
          • Others possible
      • Flags
        • Security/Access control
        • Content type, etc
      • Native methods
        • Provided by service
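
The Cyte layout listed on this slide can be pictured as a small record type. The following is a minimal in-memory sketch, not DataCyte's actual format: the class, field and method names (`Cyte`, `attach`, `content_type`) are hypothetical, and the native methods provided by the service are left out.

```python
from dataclasses import dataclass, field


@dataclass(eq=False, repr=False)  # identity semantics: links form cycles
class Cyte:
    """Hypothetical in-memory picture of a Cyte; all names are illustrative."""
    content: bytes = b""                          # BLOB content: any data form, including code
    flags: dict = field(default_factory=dict)     # e.g. content type, access control
    parents: list = field(default_factory=list)   # parent registry: (link name, parent Cyte)
    children: dict = field(default_factory=dict)  # child registry: link name -> child Cyte

    def attach(self, name: str, child: "Cyte") -> None:
        """Create a named link and record it in both registries."""
        self.children[name] = child
        child.parents.append((name, self))


patient = Cyte(content=b"Jane Doe", flags={"content_type": "text/plain"})
birth_date = Cyte(content=b"1980-01-01", flags={"content_type": "date"})
patient.attach("date_of_birth", birth_date)
```

Keeping both a parent and a child registry means an association can be followed in either direction without consulting a separate lookup structure.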
  29. Data Structure
  30. Data Structure
      Demonstration: IDE
  31. Data Structure
      Logical representations: data & application layer
      • Network representation
      • Structured representation
      • Containment representation
      [Diagram panels: Network, Structured, Containment]
  32. Data Structure
      Complexity vs Simplicity
      • Simpler → faster learning curve
      • Translation layer
        • RDBMS
          • Programmed
          • Maintained
            • Adding features, fixing bugs, improvements
          • Collectively comprise 80% of lifetime cost
        • DataCyte
          • No translation layer
          • Saving: Development (Time and Cost)
          • Integrated into database layer (a la EJB)
            • e.g. Cytes with application logic
  33. Data Structure
      Impedance of Mismatch (Translation Layer)
      • Maintenance and Development (RDBMS)
        • Different mapping → mismatch and integrity violation
        • Subtle issues
          • Difficult to locate (time + money)
      • Lower impedance of mismatch in DataCyte
        • No translation layer → natural modelling of data
      Architecture: Simple
      • Option: logically structure and constrain → RDBMS + ODBC
      • Multiple logical views of the same data
        • Facilitates conformance to multiple standards
  34. Discovery
      [Diagram: Logical Model (External, 1-4), Conceptual Model (Conceptual), Physical Model (Physical, 1-2)]
      Cytes
      → Logical representation of physical storage
      → Navigational construct: each navigation → physical disk read
      → Brokered by DataCyte service
  35. Discovery
      Each Cyte is addressed as:
      IP address + Disc + File + Position in file
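
The four-part address on this slide can be sketched as a small value type. This is an illustration only: the `|` separated text form, the field names, and the `read_at` helper are assumptions, not DataCyte's wire or disk format.

```python
import io
from typing import NamedTuple


class CyteAddress(NamedTuple):
    """The four address components named on the slide (field names are illustrative)."""
    ip: str        # IP address of the server holding the cyte
    disc: str      # disc on that server
    file: str      # file on that disc
    position: int  # byte position of the cyte within the file

    def __str__(self) -> str:
        # Hypothetical text form; the real encoding is not described in the deck.
        return f"{self.ip}|{self.disc}|{self.file}|{self.position}"


def parse_address(text: str) -> CyteAddress:
    """Inverse of CyteAddress.__str__ for the hypothetical text form."""
    ip, disc, file, position = text.split("|")
    return CyteAddress(ip, disc, file, int(position))


def read_at(handle, address: CyteAddress, length: int) -> bytes:
    """Local part of a lookup: seek straight to the position and read."""
    handle.seek(address.position)
    return handle.read(length)


address = CyteAddress("10.0.0.5", "disc0", "patients.dat", 4096)
fake_file = io.BytesIO(b"\x00" * 4096 + b"cyte")  # stands in for the file on disc
```

Because the address already names the file and the position, the local read is a single seek, with no registry lookup in between.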
  36. Discovery
      Cytes contain application logic
      • Variables → pointers
        • Access stored data
        • Execute application code
      [Diagram label: Variable =]
  37. Discovery
      Contextual Execution
      Demonstration: IDE …
  38. Discovery
      • Cyte Discovery
        • Relative paths
          • in reference to executing Cyte
          • getAge() example
        • Absolute paths
          • in reference to root Cyte
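
Relative and absolute discovery can be sketched with a small tree. The deck does not show DataCyte's path syntax or its getAge() code, so the `/`-separated paths, the `Node` class, and `get_age` below are hypothetical stand-ins (and written in Python rather than the Lua that Cytes actually embed).

```python
class Node:
    """Minimal stand-in for a Cyte with one parent and named children."""

    def __init__(self, name, value=None, parent=None):
        self.name, self.value, self.parent = name, value, parent
        self.children = {}
        if parent is not None:
            parent.children[name] = self


def root_of(node):
    """Walk parent links up to the root."""
    while node.parent is not None:
        node = node.parent
    return node


def resolve(executing, path):
    """Paths starting with '/' are absolute (from the root);
    all others are relative to the executing node. '..' steps to the parent."""
    node = root_of(executing) if path.startswith("/") else executing
    for step in path.strip("/").split("/"):
        if step == "..":
            node = node.parent
        elif step:
            node = node.children[step]
    return node


def get_age(executing, current_year):
    """Relative lookup: 'birth_year' is found in reference to the executing node."""
    return current_year - resolve(executing, "birth_year").value


root = Node("root")
people = Node("people", parent=root)
jane = Node("jane", parent=people)
Node("birth_year", 1980, parent=jane)
```

The same `get_age` logic works on any person node because the path is resolved from wherever the code executes, whereas `/people/jane/birth_year` always names one specific node.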
  39. Query Approach
  40. Query Approach
      • Types of Queries
        • Without indexes
          • each record is checked in turn
        • Indexed
          • filtered records
      • Query approach (same as RDBMS)
        Know what you are looking for AND where you want to look for it
      • Query Steps:
        STEP 1: IDENTIFY RESULT SET
        STEP 2: POPULATE RESULT SET
  41. Query Approach
      • At time of query
        RESULT SET IDENTIFICATION
        • Improved indexes
        RESULT SET POPULATION (Compound)
        • Traverse logical layer (minimal reads)
        • Context = Stateful Results
        vs
        • Additional external lookups
      [Diagram: additional external lookups vs navigating through the logical structure, O(n) → O(1)]
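
The O(n) → O(1) contrast on this slide can be illustrated with a toy data set. This sketch assumes the comparison is between checking every record in turn and following a link already held in context; the customer/order data and function names are invented for the example.

```python
def orders_by_scan(orders, customer_id):
    """Unindexed lookup: each record is checked in turn, so cost grows with n."""
    checked, matches = 0, []
    for order in orders:
        checked += 1
        if order["customer_id"] == customer_id:
            matches.append(order)
    return matches, checked


def orders_by_navigation(customer):
    """Contextual lookup: follow the named link from the record already in hand."""
    return customer["orders"]


# Build 100 customers and 1000 orders; each order is linked from its customer.
customers = {cid: {"id": cid, "orders": []} for cid in range(100)}
orders = []
for n in range(1000):
    order = {"order_id": n, "customer_id": n % 100}
    orders.append(order)
    customers[n % 100]["orders"].append(order)

matches, checked = orders_by_scan(orders, 7)
```

Both functions return the same orders, but the scan touches all 1000 records while navigation touches only the customer's own links, however large the order set becomes.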
  42. Query Approach
      Define: Contextual / Stateful Results
  43. Query Approach
      Define: Contextual / Stateful Results
  44. Query Approach
      Addressing schema
      • Defines context of access
      • Cyte → Unique ID within local file system
        • Offset within file
      • Cytes simply exist within file system
        • No global registry
      Multiple Contexts = Multiple Addresses
      Address = Context = Chain of IDs (named)
  45. Query Approach
      • Query Language
        • Show of hands: SQL users
        • Similar to XPath
          • Parent or child Cytes (multiple criteria)
        • SQL Interface
          • Cytes that conform to relational model
      • Lower complexity of architecture
        • More natural language
        • Steeper learning curve (learn more faster)
  46. Performance & Scalability
      • Sub-linear Performance Degradation
        • Logical Layer → Directed Searches
        • Example: Geo-spatial modelling
      • Instantiation
        • Full control over level
        • No class hierarchy
      • Multiple Logical Structures
        • Same data, different context
        • Multi-dimensional searches → single dimension
  47. Performance & Scalability
      • OLTP
        • Architecture marries Structured + Networking paradigms
        • Container Topology
          • Allows extensible heterogeneous data structuring
        • 3-state versioning protocol
          • Balance: performance and integrity
      • Data Footprint
        • Encoding and compressing on storage
      • No intermediate link tables
        [Diagram: Products, Intermediate Table, Ingredients]
  48. Performance & Scalability
      • Proof of Concept: 2008
      • Cancer research hospital (Los Angeles)
      • Considerable funding
      • A proteomic analysis problem: blood analysis study
        • Data mining to search for cancer markers
        • 50 data samples
        • 250 billion data elements
        • 1.3 TB in Oracle
      • Results are from the same data set and same queries

                                   Cancer Center                 DataCyte
        Single criteria queries    1½ minutes                    < 1 sec
        Complex queries            ± 1-2 days                    < 11-66 mins
        Hardware                   Sun™ Grid Rack 400            Laptop
                                   Sun Fire™ x64 servers
        Data footprint             1.3 TB                        60 GB
  49. Performance & Scalability
      • No intermediate link tables
        [Diagram: Products, Intermediate Table, Ingredients]
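
The Products/Ingredients diagram contrasts a relational many-to-many link table with direct associations. The sketch below shows both shapes side by side; the `Item` class and the sample data are hypothetical and only illustrate the idea of holding links on the entities themselves.

```python
# Relational shape: a many-to-many relationship needs an intermediate table.
product_ingredient = [
    ("bread", "flour"), ("bread", "yeast"),
    ("cake", "flour"), ("cake", "sugar"),
]


def ingredients_relational(product):
    """Scan (or join through) the intermediate table."""
    return sorted(i for p, i in product_ingredient if p == product)


# Association shape: each entity keeps its own parent and child links.
class Item:
    def __init__(self, name):
        self.name = name
        self.children = {}  # name -> Item
        self.parents = {}   # name -> Item

    def link(self, child):
        """Record the association on both ends; no third structure is involved."""
        self.children[child.name] = child
        child.parents[self.name] = self


flour, yeast, sugar = Item("flour"), Item("yeast"), Item("sugar")
bread, cake = Item("bread"), Item("cake")
for product, parts in ((bread, (flour, yeast)), (cake, (flour, sugar))):
    for part in parts:
        product.link(part)
```

With links held on both ends, "which ingredients does bread use" and "which products use flour" are each answered from a single entity.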
  50. Query Approach
      Define: Contextual / Stateful Results
  51. Query Approach
      Define: Contextual / Stateful Results
  52. Data Storage
      o Serialized
      o Encoded
      o Compressed
      o Pages
      o Caching Stack
      o Data Distance
      [Diagram: Open ⇄ Encoded (Encode / Decode) ⇄ Encoded & Compressed (Compress / Decompress)]
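
The Open ⇄ Encoded ⇄ Encoded & Compressed cycle on this slide can be sketched as a round trip. The deck does not name DataCyte's encoding or compression scheme, so JSON and zlib here are stand-ins, and the function names are hypothetical.

```python
import json
import zlib


def encode(record: dict) -> bytes:
    """Open -> Encoded: serialize the record to bytes (JSON is a stand-in)."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


def decode(encoded: bytes) -> dict:
    """Encoded -> Open."""
    return json.loads(encoded.decode("utf-8"))


def store(record: dict) -> bytes:
    """Open -> Encoded -> Encoded & Compressed (zlib is a stand-in)."""
    return zlib.compress(encode(record))


def load(stored: bytes) -> dict:
    """Encoded & Compressed -> Encoded -> Open."""
    return decode(zlib.decompress(stored))


sample = {"sample_id": 17, "intensities": [0] * 1000}
stored = store(sample)
```

Keeping the two stages separate means a reader can stop at the encoded form when it does not need the fully open record.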
  53. Data Storage
      o Leaf nodes
      o Stack management
      o Partial Decoding
      o Data Management
  54. Performance
      Four Spheres of Influence during Design
  55. Data Storage
      • Enterprise Cloud Storage
      • Soft RAIS using commodity hardware
      • RAIS provides soft parallelized, grid computing
      • Soft RAIS enables redundant distribution of cytes
      • Granular scalability and full sharability of resources
      • Elastic auto provision of service and resources
      • Unified access to data through multiple data models
      • New programming approaches, unconfined by old designs and existing programming languages, to tackle the new data flood
      • Green
        • Footprint
        • Power usage: running, cooling and start-up
  56. Data Storage
      • External Data Sources
        • Lua add-on libraries: LuaCOM and ADOLua
      • Access: Data Services
        • MSSQL: NCLI (Native Client Interface)
        • DB2: OLEDB (Object Linking and Embedding, Database)
        • Oracle: OLEDB (Object Linking and Embedding, Database)
  57. Security
      • Security implemented by the service on the cyte level
      • Domain-based, inclusion, exclusion
      • Cyte-to-cyte communication is encrypted
      • Redundant distribution of cytes provides additional security
      • Contextual access provides further flexibility for security
        • Child and parent presentation
      • Hardware encryption of storage is preferable
      • Cyte granularity enables
        • Blind security information retaining associations
        • Cleansed health data with relationships
      • DataCyte can integrate with existing authentication / authorization systems (LDAP, Active Directory)
  58. Disaster Recovery
      • Transaction-based with roll-back
      • All transactions are Atomic, Consistent, Isolated and Durable (ACID)
      • 3-state versioning protocol
        • My old
        • My new
        • Yours
      • Provides fine-grained control
      • Balance between performance and integrity mitigation
      • Each service can partially recover from physical loss
      • Redundancy could provide complete recovery
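
The deck names the three states (my old, my new, yours) but does not describe how they interact. One plausible reading, shown purely as an assumption, is optimistic concurrency: a transaction remembers the value it first read ("my old"), stages its change ("my new"), and at commit compares "my old" with the currently committed value ("yours"), rolling back on a mismatch. All class and method names below are hypothetical.

```python
class Conflict(Exception):
    """Raised when another transaction committed a key this one depends on."""


class Store:
    def __init__(self):
        self.committed = {}  # "yours": the values visible to everyone


class Transaction:
    def __init__(self, store):
        self.store = store
        self.my_old = {}  # values as first seen by this transaction
        self.my_new = {}  # staged, uncommitted changes

    def read(self, key):
        if key in self.my_new:
            return self.my_new[key]
        if key not in self.my_old:
            self.my_old[key] = self.store.committed.get(key)
        return self.my_old[key]

    def write(self, key, value):
        self.read(key)  # capture "my old" before staging "my new"
        self.my_new[key] = value

    def rollback(self):
        self.my_old.clear()
        self.my_new.clear()

    def commit(self):
        # Assumed rule: commit only if "yours" still equals "my old" for every key.
        conflicts = [k for k, old in self.my_old.items()
                     if self.store.committed.get(k) != old]
        if conflicts:
            self.rollback()
            raise Conflict(conflicts)
        self.store.committed.update(self.my_new)  # all staged changes apply together
        self.rollback()


store = Store()
store.committed["balance"] = 100
first, second = Transaction(store), Transaction(store)
first.write("balance", 90)
second.write("balance", 80)
first.commit()  # succeeds: nobody changed "balance" since first read it
```

In this reading, committing `second` afterwards would raise `Conflict` and discard its staged change, because the committed value no longer matches what `second` originally read.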
