Big data Industry Process

Big Data Industry Process – Adil ZEAARAOUI
Definition:
The big data process is a set of activities (business understanding, data collection, data
exploration, data preprocessing, data mining, model evaluation, and deployment) carried out
together in order to extract hidden information from a mass of data.
Fig.1: General overview of big data process
Big data process activities:
Based on my experience in data science, I summarize the big data process in the
following steps:
Step1: Understand the business
In this step, we aim to:
• Define the problem and its scope clearly
• Have a clear view of the goal
• Map out the path to the objective

Step2: Collect the data
Import and collect the data from different sources, such as an RDBMS, a data lake store,
a data warehouse, etc.
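As an illustration of collecting data from more than one source, here is a minimal sketch using only the Python standard library. The in-memory SQLite database stands in for an RDBMS and the in-memory CSV text stands in for a file exported from a data lake. The table name `customers`, its columns, and the sample rows are all hypothetical.

```python
import csv
import io
import sqlite3


def load_from_rdbms(conn):
    """Read customer rows from a relational source as a list of dicts."""
    conn.row_factory = sqlite3.Row
    rows = conn.execute("SELECT id, age, city FROM customers ORDER BY id").fetchall()
    return [dict(row) for row in rows]


def load_from_csv(text_stream):
    """Read rows from a CSV export and cast numeric columns to int."""
    reader = csv.DictReader(text_stream)
    return [{"id": int(r["id"]), "age": int(r["age"]), "city": r["city"]} for r in reader]


# Hypothetical sources: an in-memory SQLite database and an in-memory CSV file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, age INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 34, "Rabat"), (2, 51, "Paris")],
)
csv_export = io.StringIO("id,age,city\n3,27,Lyon\n4,45,Rabat\n")

# Merge both sources into one dataset with a common schema.
dataset = load_from_rdbms(conn) + load_from_csv(csv_export)
print(len(dataset))
```

Casting every source to the same schema at load time keeps the later steps independent of where each row came from.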
Step3: Understand and explore data
Before any kind of development, we must first explore our dataset. Exploration involves
the following:
• Explore the features
• Distinguish categorical features from numerical ones
• Do statistical analysis: min, max, mean, standard deviation, variance, etc.
• Visualize the data: missing values for each feature, unique values, how values are distributed, etc.
• Identify the features that matter to the business
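The exploration tasks listed above can be sketched in a few lines. This is a minimal example on a hypothetical toy dataset, using only the standard library. It assumes that `None` marks a missing value and that every feature has at least two observed values. The `profile` helper is an illustrative name, not part of any library.

```python
import statistics

# Hypothetical toy dataset; None marks a missing value.
rows = [
    {"age": 34, "income": 4200.0, "city": "Rabat"},
    {"age": 51, "income": None, "city": "Paris"},
    {"age": 27, "income": 3100.0, "city": None},
    {"age": 45, "income": 5200.0, "city": "Rabat"},
]


def profile(rows):
    """Return a per-feature summary: kind, missing count, and basic statistics."""
    report = {}
    for name in rows[0]:
        values = [r[name] for r in rows if r[name] is not None]
        missing = len(rows) - len(values)
        if all(isinstance(v, (int, float)) for v in values):
            report[name] = {
                "kind": "numerical",
                "missing": missing,
                "min": min(values),
                "max": max(values),
                "mean": statistics.mean(values),
                "stdev": statistics.stdev(values),
                "variance": statistics.variance(values),
            }
        else:
            report[name] = {
                "kind": "categorical",
                "missing": missing,
                "unique": sorted(set(values)),
            }
    return report


summary = profile(rows)
for feature, stats in summary.items():
    print(feature, stats)
```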
Step4: Pre-process data
This is the most important step in big data; it can take up to 90% of the time spent on the
whole process. This step prepares the data before mining it. We must:
• Correct wrong input values
• Remove missing values where appropriate
• Fill the remaining missing values
• Discretize continuous features
• Remove correlated features
• Normalize features if required
• Remove outliers if necessary
• Etc.
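Several of the pre-processing operations listed above can be sketched on a single column. This is a minimal example under stated assumptions: the `ages` column is hypothetical, missing values are filled with the median (one choice among many), outliers are removed with fixed domain bounds, and normalization is min-max scaling. All function names are illustrative.

```python
import bisect
import statistics

# Hypothetical toy column of ages; None is a missing value, 230 is a wrong input.
ages = [34, 51, None, 27, 45, 230, 38]


def fill_missing(values):
    """Replace None with the median of the observed values."""
    median = statistics.median(v for v in values if v is not None)
    return [median if v is None else v for v in values]


def remove_outliers(values, low, high):
    """Keep only values inside a plausible range [low, high]."""
    return [v for v in values if low <= v <= high]


def min_max_normalize(values):
    """Rescale values linearly to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, edges, labels):
    """Map each continuous value to a bin label; needs len(labels) == len(edges) + 1."""
    return [labels[bisect.bisect_right(edges, v)] for v in values]


filled = fill_missing(ages)
cleaned = remove_outliers(filled, low=0, high=120)
normalized = min_max_normalize(cleaned)
bins = discretize(cleaned, edges=[30, 50], labels=["young", "middle", "senior"])
print(bins)
```

The median is used for filling here because it is less sensitive than the mean to wrong inputs such as 230 that are still present at that point.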
Step5: Develop your model (Data mining)
After building a clean, “ready to process” dataset, it is time to build our model.
• Transform the dataset if required
• Apply the machine-learning algorithm

Step6: Evaluate and deploy the model
Before deployment, we must validate the model and see how accurate it is. So we must:
• Evaluate and test the model
• Review and enhance it
• Deploy the model
• Automate the system workflow
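To make the modeling and evaluation steps concrete, here is a minimal sketch: a model is trained on one part of the data and its accuracy is measured on a held-out part. The nearest-centroid classifier is a deliberately simple stand-in for whatever machine-learning algorithm a real project would use. The toy data, the labels, and the deterministic split are all assumptions made for illustration.

```python
import math

# Hypothetical, well-separated toy data: (feature vector, label).
data = [
    ((1.0, 1.2), "low"), ((0.8, 1.0), "low"), ((1.1, 0.9), "low"), ((0.9, 1.1), "low"),
    ((4.0, 4.2), "high"), ((4.1, 3.9), "high"), ((3.8, 4.0), "high"), ((4.2, 4.1), "high"),
]


def train_test_split(rows, test_every=4):
    """Deterministic hold-out split: every `test_every`-th row goes to the test set."""
    train = [r for i, r in enumerate(rows) if (i + 1) % test_every != 0]
    test = [r for i, r in enumerate(rows) if (i + 1) % test_every == 0]
    return train, test


def fit_centroids(train):
    """Train the model: compute the mean feature vector of each class."""
    grouped = {}
    for features, label in train:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(col) / len(col) for col in zip(*vectors))
        for label, vectors in grouped.items()
    }


def predict(centroids, features):
    """Predict the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


def accuracy(centroids, test):
    """Fraction of held-out rows classified correctly."""
    hits = sum(predict(centroids, x) == y for x, y in test)
    return hits / len(test)


train, test = train_test_split(data)
model = fit_centroids(train)
print(accuracy(model, test))
```

Measuring accuracy on rows the model never saw during training is what makes the score a usable estimate before deployment. A real project would normally use a randomized or cross-validated split instead of this fixed one.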
