
Data science automation consulting Seattle



Data scientists should not spend their days copying and pasting from Excel. Instead, they should be creating algorithms. Automate the boring stuff.

Published in: Data & Analytics

  1. Quick tip: Data science is more than algorithms and data cleansing. It is about creating systems that can replicate your findings. Good business practices are key: version control, good documentation, and well-defined processes can save a team hundreds of hours. They also reduce the probability that a data science project fails. Good luck!

     Automation and Data Processing

     "I couldn't tell you in any detail how my computer works. I use it with a layer of automation." (Conrad Wolfram)

     Automation is key to a data scientist's success. There is never enough time to manually carry out every best practice needed to consistently deliver high-quality data science solutions. Luckily, most of these processes are repetitive and already have well-established best practices around them. For instance, data warehousing gives us ETLs (extract, transform, load) and QA suites. Although they require some manual intervention and planning up front, they can and should eventually be scheduled with Windows Task Scheduler, crontab, or another job scheduler, and then only checked periodically.

     Data science also involves the repetitive work of analyzing basic correlations and classifying data features. Most of this work relies on the same basic algorithms and graphs, so it shouldn't be a manual-heavy process; otherwise, the exploration phase can take months.
  2. Data Acquisition
     Open-source data and company data silos have become far more widespread over the past decade. This has allowed companies to take advantage of government data APIs, social media data, and more. It also means data scientists have the opportunity to search for meaningful relationships in all sorts of data sets.

     Data Quality
     Good data quality means a data scientist can spend less time cleaning data and more time seeking value. It is also worth auditing your data, either with internal teams or by hiring outside consultants.

     Data Scalability
     Data scientists can develop solutions that take many forms: a dashboard, an algorithm, and so on. However, one concept data scientists don't always think about is data scalability. Will the data scale? Does it require manual classification? If so, your system had better be classifying rows and data features automatically.

     ETL Automation
     Using scripting languages, SSIS, or other ETL tools, data science teams should limit manual imports, potentially saving 5 to 30 hours a week.

     QA Automation
     Consider creating a test suite that automates upper- and lower-bound testing, re-slicing and dicing the same data, basic aggregation testing, and tracking of past data metrics.

     Analysis Automation
     The early steps in the discovery and analysis stages of data science are fairly similar. They involve using basic clustering algorithms, histograms, and scripts to help detect bias, correlations, and quirks inside the data.

     "Data! data! data!" he cried impatiently. "I can't make bricks without clay." (Arthur Conan Doyle)

     Data Processing
     Data requires several preparation steps before it becomes useful to a data scientist. Below is a diagram that depicts data acquisition from multiple sources, data transformation, QA, and analysis. The key is to ensure your processes are both automatic and scalable.

     We have come across many data sets that make us cringe: duplicate processes that create the same data, which later has to be merged; missing data; and a lack of QA and auditing, all of which make it difficult to follow data flows. It can be a fun challenge! However, we don't recommend it.
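The Quick tip slide's advice to put ETLs and QA suites on a job scheduler and "only check them periodically" works best when every run leaves behind a record that is quick to inspect. Below is a minimal Python sketch of that idea, assuming a hypothetical `run_job` wrapper and a JSON status file (neither comes from the slides): it runs named steps in order, stops at the first failure, writes the outcome to disk, and returns an exit code that cron or a monitoring tool can act on. The crontab entry in the comments uses made-up paths.

```python
import json
import time
from datetime import datetime, timezone

# Example crontab entry (hypothetical paths) for a script whose entry point
# calls sys.exit(run_job(...)); it runs every day at 06:00:
#   0 6 * * * /usr/bin/python3 /opt/jobs/nightly.py


def run_job(steps, status_path):
    """Run (name, callable) steps in order and record the outcome as JSON.

    A failing step stops the run. Returns 0 on success and 1 on failure,
    suitable for use as a process exit code.
    """
    status = {"started": datetime.now(timezone.utc).isoformat(), "steps": []}
    code = 0
    for name, fn in steps:
        t0 = time.monotonic()
        try:
            fn()
            elapsed = round(time.monotonic() - t0, 3)
            status["steps"].append({"name": name, "ok": True, "seconds": elapsed})
        except Exception as exc:  # record the failure instead of crashing silently
            status["steps"].append({"name": name, "ok": False, "error": repr(exc)})
            code = 1
            break  # later steps usually depend on earlier ones, so stop here
    status["exit_code"] = code
    with open(status_path, "w") as f:
        json.dump(status, f, indent=2)
    return code
```

Checking the job then means glancing at one status file (or alerting on a nonzero exit code) rather than rerunning anything by hand.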
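The ETL Automation tip (limit manual imports) can be made concrete with a short script. The sketch below uses only Python's standard library: it loads every CSV file dropped into a folder into a SQLite table and records which files it has already loaded, so a scheduled rerun never imports the same file twice. The folder layout, the `events` table name, and the `loaded_files` bookkeeping table are illustrative assumptions; a team on SSIS or another ETL tool would express the same steps there.

```python
import csv
import sqlite3
from pathlib import Path


def run_etl(drop_dir, db_path, table="events"):
    """Load new CSV files from drop_dir into `table`; return rows inserted.

    Assumes every file shares the same header and that header names
    contain no double quotes (they are used as quoted column names).
    """
    conn = sqlite3.connect(db_path)
    inserted = 0
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS loaded_files (name TEXT PRIMARY KEY)")
        for path in sorted(Path(drop_dir).glob("*.csv")):
            seen = conn.execute(
                "SELECT 1 FROM loaded_files WHERE name = ?", (path.name,)
            ).fetchone()
            if seen:
                continue  # loaded on a previous run: reruns stay idempotent
            with path.open(newline="") as f:
                reader = csv.DictReader(f)
                cols = reader.fieldnames or []
                rows = [tuple(r[c] for c in cols) for r in reader]
            if not cols:
                continue  # empty file, nothing to load
            col_list = ", ".join(f'"{c}"' for c in cols)
            placeholders = ", ".join("?" for _ in cols)
            conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_list})')
            conn.executemany(
                f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})', rows
            )
            conn.execute("INSERT INTO loaded_files (name) VALUES (?)", (path.name,))
            conn.commit()  # rows and bookkeeping entry are committed together
            inserted += len(rows)
    finally:
        conn.close()  # an uncommitted file is rolled back, not half-loaded
    return inserted
```

Because each file's rows and its `loaded_files` entry are committed together, a run that fails partway can simply be retried on the next schedule.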
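The QA Automation tip lists four kinds of checks. A minimal sketch of three of them in plain Python follows, assuming rows are dicts and using hypothetical thresholds: an upper/lower bounds test, a re-slicing test that re-aggregates detail rows and compares them with a separately produced summary (for example, a report table), and a drift check that tracks a metric such as row count against past runs. The 20% drift tolerance is an arbitrary placeholder.

```python
from statistics import mean


def out_of_bounds(rows, column, lower, upper):
    """Upper/lower bounds test: return rows whose value is outside [lower, upper]."""
    return [r for r in rows if not lower <= r[column] <= upper]


def reslice_mismatches(detail, summary, group_col, value_col, tol=1e-6):
    """Re-aggregate detail rows by group_col and compare with a summary table.

    summary maps group -> reported total. Returns {group: (recomputed, reported)}
    for every group whose totals disagree by more than tol.
    """
    recomputed = {}
    for r in detail:
        recomputed[r[group_col]] = recomputed.get(r[group_col], 0) + r[value_col]
    groups = set(recomputed) | set(summary)
    return {
        g: (recomputed.get(g, 0), summary.get(g, 0))
        for g in groups
        if abs(recomputed.get(g, 0) - summary.get(g, 0)) > tol
    }


def drifted(history, current, max_rel_change=0.2):
    """Flag `current` if it differs from the mean of past values by more
    than max_rel_change (relative)."""
    if not history:
        return False  # first run: nothing to compare against yet
    baseline = mean(history)
    if baseline == 0:
        return current != 0
    return abs(current - baseline) / abs(baseline) > max_rel_change
```

Run from a test runner or the scheduled job, these checks turn "someone eyeballs the numbers" into a pass/fail signal.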
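The Analysis Automation tip says the early discovery steps look alike from project to project. One way to script them is a profiling function that runs the same first-pass checks on every numeric column. The sketch below, standard-library Python only, computes summary statistics, equal-width histograms, and pairwise Pearson correlations, then lists strongly correlated pairs. The column-dict input format and the 0.8 cutoff are assumptions, and clustering is left out to keep it short.

```python
from itertools import combinations
from statistics import mean, pstdev


def histogram(values, bins=5):
    """Equal-width histogram: list of (bin_start, bin_end, count)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # max value goes in the last bin
        counts[idx] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")  # constant column: correlation is undefined
    return cov / (sx * sy)


def profile(columns, corr_threshold=0.8):
    """Profile every column in a dict of name -> list of numbers.

    Returns (per-column stats, pairs with |r| >= corr_threshold).
    """
    stats = {
        name: {"mean": mean(vals), "stdev": pstdev(vals), "hist": histogram(vals)}
        for name, vals in columns.items()
    }
    strong = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= corr_threshold:  # NaN never passes this comparison
            strong.append((a, b, round(r, 3)))
    return stats, strong
```

In practice the output would feed a report or notebook, so each new data set gets the same baseline exploration without manual work.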
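The Data Scalability paragraph argues that if rows need classifying, the system should do it automatically. A rule-based classifier is one simple starting point, not the only option; a trained model could replace it as volume grows. In the sketch below, the categories, regular expressions, and the `note` field are all hypothetical. Rows that match no rule are labeled `unclassified`, and the size of that backlog shows how much manual work remains.

```python
import re

# Hypothetical rules: category -> regex searched in a free-text field.
# Dict order matters: the first matching rule wins.
RULES = {
    "refund": re.compile(r"\b(refund|chargeback)\b", re.IGNORECASE),
    "shipping": re.compile(r"\b(ship|shipping|delivery|tracking)\b", re.IGNORECASE),
    "billing": re.compile(r"\b(invoice|billing|charge[ds]?)\b", re.IGNORECASE),
}


def classify(text, rules=RULES):
    """Return the first matching category, or 'unclassified' for human review."""
    for label, pattern in rules.items():
        if pattern.search(text):
            return label
    return "unclassified"


def classify_rows(rows, text_col, rules=RULES):
    """Label every row and count how many still need manual review."""
    labeled = [{**r, "category": classify(r[text_col], rules)} for r in rows]
    backlog = sum(1 for r in labeled if r["category"] == "unclassified")
    return labeled, backlog
```

Tracking the backlog over time shows whether the rules keep up as the data grows.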
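The Data Quality paragraph and the closing examples (duplicate processes producing the same data, missing data) point to a basic audit that can run on every load. The sketch below is a minimal Python version, assuming rows are dicts and that a hypothetical set of key columns should uniquely identify a record. It reports duplicated keys and counts missing values, treating `None` and blank strings as missing.

```python
from collections import Counter


def audit(rows, key_cols):
    """Basic data-quality audit for a list of dict rows.

    Reports keys that appear more than once (e.g. the same record created
    by two pipelines) and the number of missing values per column.
    """
    key_counts = Counter(tuple(r.get(c) for c in key_cols) for r in rows)
    duplicates = {k: n for k, n in key_counts.items() if n > 1}
    missing = Counter()
    for r in rows:
        for col, val in r.items():
            if val is None or (isinstance(val, str) and val.strip() == ""):
                missing[col] += 1
    return {"rows": len(rows), "duplicate_keys": duplicates, "missing": dict(missing)}
```

Saving each audit result alongside the data also gives a simple history that internal teams or outside auditors can review.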