A Review of the Life Cycle and Methodology of Data Science

There is no doubt that Data Science is a powerful field of technology. To work in it, you should be familiar with its life cycle and methodology; without that basic grounding, problems can arise at many points. According to job-listing data, "Data Scientist" was the hottest profession of 2019, and the Harvard Business Review in 2012 called the position "the sexiest job of the 21st Century". In this article I will walk through the life cycle and the core principles of Data Science.

1. The Life Cycle of Data Science

Like every field, Data Science has a life cycle. It is usually broken into seven crucial steps: Exploring, Data Mining, Data Cleaning, Data Exploration, Feature Engineering, Model Building, and Data Visualization. Take a close look at Fig.1, where the life cycle of data science is described.

Exploring: Start with some guiding questions: what is the main purpose of your business? What problem are you trying to solve? What outcome does the business expect? You need a good understanding of the business before choosing an analysis method. In this step you define these points and get ready for the next step.

Data Mining: Data Mining is the next step after Exploring. Once the business objective and the desired outcome have been addressed, data must be gathered. Data Mining is the process of collecting data from various sources, and collecting it from accurate sources can take considerable time. For instance, if your data lives in a database, you can simply run a SQL query to retrieve it and then manipulate it with tools such as a pandas DataFrame. On the other hand, if your dataset does not seem to exist yet, a Python library like Beautiful Soup lets you scrape data from web pages.
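A minimal sketch of the database route described above, using an in-memory SQLite database to stand in for a real one (the `sales` table and its columns are made up for illustration):

```python
import sqlite3

import pandas as pd

# Build a small in-memory database to stand in for a real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.5), ("north", 95.0)],
)
conn.commit()

# A SQL query pulled straight into a pandas DataFrame for manipulation.
df = pd.read_sql_query("SELECT region, amount FROM sales", conn)
print(df.shape)  # → (3, 2)
```

Against a production database you would swap the SQLite connection for the appropriate driver; the `read_sql_query` call stays the same.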

Data Cleaning: Data cleaning is one of the most complex tasks in data science. Data collected from the Internet, or almost anywhere else, is rarely in a structured format and often contains many missing values. Missing values degrade the accuracy of a predictive model, so data cleaning, including handling those missing values, is required to make the data suitable for building a machine learning model.
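One common way to handle the missing values mentioned above is to fill numeric gaps with the column median and categorical gaps with the most frequent value; a small sketch with a made-up dataset:

```python
import numpy as np
import pandas as pd

# Toy dataset with gaps (column names and values are illustrative only).
df = pd.DataFrame({
    "age": [25, np.nan, 40, 31],
    "city": ["Dhaka", "Rome", None, "Rome"],
})

# Numeric gap: fill with the column median (robust to outliers).
df["age"] = df["age"].fillna(df["age"].median())

# Categorical gap: fill with the most frequent value (the mode).
df["city"] = df["city"].fillna(df["city"].mode()[0])

print(df.isna().sum().sum())  # → 0, no missing values remain
```

Whether filling, dropping, or flagging missing rows is appropriate depends on why the values are missing, which is a judgment call, not a mechanical one.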

Data Exploration: With cleaning and preparation done, it is time to analyze the data. Data Exploration is the process of examining the data to answer specific questions or spot trends. For example, you can explore your company's previous-year data through various plots or interactive visualizations, which allow you to make concrete observations about that year.
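A first pass at exploration often starts with summary statistics and simple group comparisons before any plotting; a sketch over a hypothetical previous-year revenue table:

```python
import pandas as pd

# Hypothetical previous-year figures, one row per month.
sales = pd.DataFrame({
    "month":   ["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
    "quarter": ["Q1",  "Q1",  "Q1",  "Q2",  "Q2",  "Q2"],
    "revenue": [100,   120,   90,    150,   160,   170],
})

# Overall spread of monthly revenue.
print(sales["revenue"].describe())

# Group comparison: average revenue per quarter shows Q2 outpacing Q1.
print(sales.groupby("quarter")["revenue"].mean())
```

Observations like "Q2 outperformed Q1" that surface here are exactly the kind of trend the exploration step exists to find.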

Feature Engineering: Feature engineering is the process of using domain knowledge to extract characteristics (features) from raw data. Good features improve the performance of machine learning algorithms, and feature engineering is sometimes described as applied machine learning in itself. In short, it transforms raw information into inputs the algorithm can actually recognize and use.
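As a small example of turning raw data into algorithm-friendly features, consider timestamps (the dates below are made up): a model cannot use a raw datetime directly, but domain knowledge may say the month and day of week matter, so we expose those as columns:

```python
import pandas as pd

# Raw timestamps that no model can consume directly.
df = pd.DataFrame({
    "signup": pd.to_datetime(["2019-01-15", "2019-06-01", "2019-12-24"]),
})

# Derived features the algorithm can recognize.
df["signup_month"] = df["signup"].dt.month
df["signup_dayofweek"] = df["signup"].dt.dayofweek  # 0 = Monday
```

The same pattern applies to ratios, aggregates, text lengths, and any other quantity that domain knowledge suggests is predictive.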

Model Building: Predictive modelling is the machine learning work that a data science project ultimately comes down to. After data exploration, data cleaning, and feature engineering are complete, the most important work of data science happens here: the data is split into training and test sets, the training portion is fitted to a machine learning algorithm, and the model's accuracy on the test set shows how well it performs.
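The train/test split and fit described above can be sketched with scikit-learn on its built-in iris dataset (the choice of logistic regression is just one example of a classifier):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small built-in dataset to stand in for project data.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data so accuracy is measured on unseen rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Accuracy on the held-out test set is the performance estimate.
acc = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {acc:.2f}")
```

The key point is that accuracy is computed on `X_test`, data the model never saw during fitting, which is what makes it an honest estimate.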

Data Visualization: Data visualization is the graphical, often interactive, illustration of data. It involves producing images whose graphic symbols map systematically to data values, so that the image communicates the data itself. Many useful visualization techniques exist; for example, the Seaborn library in Python can create attractive charts that illustrate the data nicely.
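A minimal Seaborn sketch of the kind of chart mentioned above, with made-up daily totals; the `Agg` backend renders off-screen so no display is needed:

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen, no display required

import pandas as pd
import seaborn as sns

# Hypothetical totals per day.
tips = pd.DataFrame({
    "day":   ["Thu", "Fri", "Sat", "Sun"],
    "total": [90, 60, 150, 130],
})

# One bar per day: bar height is the systematic mapping to the data value.
ax = sns.barplot(data=tips, x="day", y="total")
ax.figure.savefig("daily_totals.png")
```

Seaborn builds on matplotlib, so the returned `ax` can be styled further with the usual matplotlib calls before saving.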

2. The Methodology of Data Science

So far we have discussed the Data Science life cycle. In this section, we turn to the methodology of Data Science. First, what exactly is a methodology? A methodology is the well-organized, logical analysis of the processes applied to a field of study; every discipline has its working steps. In software engineering, for example, the industry follows a step-by-step process: collecting requirements, analysing, designing, coding, testing, and releasing. Data science works the same way: its methods are laid out in its methodology. Fig.2 shows the methodology of Data Science.

Fig.2: The methodology of Data Science

As the article is getting long, I will keep this part brief. Fig.2 shows that the methodology of Data Science consists of ten interconnected steps, some of which were already explained in the life cycle section. Here is a short description of each.

Phase 1: Business understanding:

Business understanding is extremely crucial in the methodology of Data Science. You need to know the purpose of your business and what the problem is: what is the objective, and what do you want to achieve?

Phase 2: Analytic approach:

Once the business problem has been clearly established, the data scientist can define the analytic approach to solve it. For example, if the purpose is to predict numerical values, the analytic approach could be to build, test, and deploy a regression machine learning model.

Phase 3: Data requirements:

The chosen analytic approach defines the data requirements. This is the stage where we identify the necessary data content, formats, and sources before data gathering begins, so the data can feed the algorithm of the chosen approach.

Phase 4: Data collection:

In the initial data-collection step, data scientists identify and locate the available data sources, structured, unstructured, and semi-structured, relevant to the problem domain. Data collection is the systematic gathering and measurement of information on the variables of interest, done in an established way that lets the collector answer questions, test hypotheses, and evaluate outcomes.

Phase 5: Data understanding:

In the data-understanding step, data scientists need a proper grasp of the data and its patterns. What type each field is, and how the fields correlate with one another, must be identified in order to build a predictive model.

Phase 6: Data preparation:

Data preparation activities include cleaning the data (dealing with missing or invalid values), removing duplicates, formatting it correctly, merging data from various sources, and transforming it into more useful variables. This transformation process is often organized as a feature engineering pipeline.

Phase 7: Modelling:

The most important work of data science is done here. This step follows all the previous ones: the predictive model built here addresses the business problem defined at the start. How well that problem is solved is judged by measuring the model's accuracy, and the model itself is built using various machine learning algorithms.

Phase 8: Evaluation:

During model development, and before deployment, the data scientist assesses the model's quality and verifies that it addresses the business problem accurately and thoroughly.

Phase 9: Deployment:

Once an adequate model has been produced and approved by the business sponsors, it is deployed into the production or a comparable analysis environment. However, it is deployed in a limited way until its performance has been fully evaluated.

Phase 10: Feedback:

Once the model is deployed and feedback is collected, its real-world performance can be confirmed. If the feedback is negative, data scientists can iterate to increase the model's accuracy.

To conclude, this article has discussed the life cycle of data science in detail, illustrated the methodology of data science, and shown what each step does. Hopefully it will be useful to anyone enthusiastic about learning data science.
