Seven best practices for revolutionizing your data


Cloud computing and major changes in computing technology and software development call for a complete revisiting of corporate data architecture and its best practices. In some cases, a complete inversion has occurred (in the best way) to solve a particular problem. To be competitive, organizations need to take advantage of these new ways of doing things. A massive amount of data and information is out there if we can just grasp it. Below are some principles and practices for dealing with data better going forward.

Published in: Technology


Store First, Analyze Later: Disk is cheap. We can’t always predict what data will be important later, so store first and ask questions later. With scalable infrastructure and today’s hardware economics, it’s okay if a piece of data turns out never to be used. The schema flexibility of NoSQL technology facilitates this. For example, with a customer document, adding fields of information at a later date is easy even if they were not envisioned initially.

Default to Real-time: Historically, data processing and analysis have been done via batch processing. We defaulted to batch processing because it is computationally efficient. However, given Moore’s law and the passage of time, we now have much more power at our disposal. We can afford to do more work to get real-time answers instead of answers tomorrow. NoSQL and fast storage technologies (such as solid state disk) make real-time possible. Your organization should deliver recommendations, personalization and business metrics immediately. Default to real-time and go to batch only when necessary.

Structure Shouldn’t Hold You Back: It’s easy to store basic stock information, for example (ticker, high, low, close), in any database. What about a complete derivative security? How do we store that in the database, especially given that new securities are invented all the time? A legal contract’s terms? How do we store polymorphic information or data we weren’t aware of a priori?
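As a rough illustration of the questions above, and of the customer-document example under Store First, Analyze Later, the sketch below uses plain Python dictionaries serialized as JSON to stand in for documents in a document-oriented store. The field names, the sample securities and the `describe` helper are all hypothetical, and no particular database product is assumed.

```python
import json

# Hypothetical documents; every field name here is an illustrative assumption.
customer = {"name": "Ada Example", "email": "ada@example.com"}

# A field nobody planned for can be added later without a schema migration.
customer["loyalty_tier"] = "gold"

# Records of different shapes can sit side by side in one collection.
securities = [
    {"type": "stock", "ticker": "XYZ", "high": 10.5, "low": 9.8, "close": 10.1},
    {
        "type": "option",
        "underlying": "XYZ",
        "strike": 11.0,
        "expiry": "2030-01-18",
        "terms": {"style": "european", "settlement": "cash"},
    },
]


def describe(security):
    """Return a short label, branching on whichever fields are present."""
    if "ticker" in security:
        return f"{security['type']} {security['ticker']} close={security['close']}"
    return f"{security['type']} on {security['underlying']} strike={security['strike']}"


# Round-trip through JSON to show the shapes survive serialization unchanged.
restored = [json.loads(json.dumps(s)) for s in securities]
labels = [describe(s) for s in restored]
```

The simple quote and the nested derivative live in the same list, and the code that reads them decides what to do based on the fields it finds, not on a fixed table layout.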
Historically a few methods have been most common: the relational database for data with very precise structuring; completely unstructured data (“BLOBs”); and things in the middle, such as spreadsheets. The latter two formats are mostly useless for integration into your applications, yet the volume of such data is massive. With the rise of dynamic document-oriented data models (using JSON), semi-structured, complex structured, and polymorphic data can be stored, accessed and organized just as efficiently as the more rigidly structured data that has traditionally been in databases.

Agility Is Key: The software development world has moved from classic “waterfall” software development lifecycles to more agile, or iterative, methodologies (for example, Scrum). These methods’ rapid iteration allows organizations to deliver features and enhancements to end users quickly and effectively. To work this way, we need new tools that are agile-compatible; version control, continuous integration and programming languages have adapted already. We need similar adaptation by the database if we want to make software development nimble and productive. NoSQL technologies facilitate iteration in the data model much the same way as you iterate with your code.

One Size Doesn’t Fit All: One-size-fits-all is over. Use multiple database technologies as part of your standard enterprise technology platform. You won’t want dozens (that would be far too complex), but more than one is optimal. A good model for the future is to have three primary tools: a DBMS, a relational data warehouse and a NoSQL database. For each project or sub-problem, use whichever tool is best. Augment with niche tools (e.g., a time series database) for special cases. The above approach is highly compatible with service-oriented architectures, which you should be using. Monolithic hub-and-spoke architectures lead to late projects and unchangeable systems. Instead, build web services, with each one potentially having its own database or data mart behind it.

Go Commodity: The rise of commodity hardware as a viable production platform has made it possible to deploy multi-node systems quickly. Newer database technologies are designed with commodity servers in mind. Companies are moving away from “big iron” servers and embracing this approach. By adopting a commodity server deployment model, there is less dependency on proprietary mechanisms, and vendor lock-in is often avoided. Find the sweet spot on the price-performance curve and buy servers of that size. Don’t buy $1k servers; you’ll have too many to manage (or even plug in!). But don’t go too big either. Many organizations are standardizing on $10k commodity Xeon (or AMD) based servers with gigabit Ethernet.

Use Solid State Drives – a Lot: Traditional spinning disks have increased in capacity and data transfer rates by a factor of one thousand, yet random I/O times have barely budged over a decade. If you are doing any random I/O at all, you should use SSDs instead. Commodity SATA-style SSDs can work surprisingly well. Be sure to mirror them: they still fail even though there are no moving parts (except electrons!). Reserve 20%+ of the disk’s space as un-partitioned to give the drive room to optimize random writes and avoid excess “write amplification”.

For sequential I/O, stick to spinning disks. Thus, use spinning disks for Hadoop batch processing and for backups. Some have predicted that eventually 99% of data may be stored on spinning disks, yet 99% of accesses will happen on SSDs. With spinning disks being the main place for backups, that is conceivable.
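The 20% over-provisioning guideline and the random-versus-sequential rule from the SSD section come down to simple arithmetic and a lookup. The sketch below works them through; the function names, the 480 GB drive size and the `random`/`sequential` labels are assumptions made for illustration, not part of the original article.

```python
def usable_partition_gb(drive_gb, reserve_percent=20):
    """Return the capacity to partition, leaving reserve_percent un-partitioned."""
    if not 0 <= reserve_percent < 100:
        raise ValueError("reserve_percent must be in [0, 100)")
    return drive_gb * (100 - reserve_percent) / 100


def pick_storage(access_pattern):
    """Map an access pattern to the storage type the guideline suggests."""
    choices = {
        "random": "ssd",                # any random I/O at all: use SSDs
        "sequential": "spinning disk",  # batch processing and backups
    }
    if access_pattern not in choices:
        raise ValueError(f"unknown access pattern: {access_pattern!r}")
    return choices[access_pattern]


# Hypothetical 480 GB drive: partition 384 GB, leave 96 GB un-partitioned.
partition_gb = usable_partition_gb(480)
reserved_gb = 480 - partition_gb
```

A larger reserve can be passed explicitly, for example `usable_partition_gb(480, reserve_percent=25)`, since the guideline is "20% or more".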
Source: forbes.com
Recommended by: Jon Cohn, CTO, VP IT Architecture
https://www.linkedin.com/in/jonacohn
joncohn@comcast.net
