The Beginner’s Guide to Data Science

Author(s): Abid Ali Awan

Why data science is so attractive, and what you need to know before starting your journey into this world of wonders.

Image by starline

Introduction

Hi 👋! My name is Abid, and I will be your guide into the world of Data Science. It was a long and arduous journey for me, and I want to make things easier for beginners struggling to get in. Before telling you how to get in, I want to give you a trailer of what you are getting into and why you should go for Data Science.

Why Data Science?

My why is quite personal, and it’s related to my struggles with mental illness. Mental illness in Pakistan is getting worse with time: one-third of our younger generation is fighting depression, and there is no help available for them. We don’t have the infrastructure for mental health care, which was quite disappointing for me when I was facing the same problem. So, when I recovered, I made it my mission to help the young generation of this world get better help in dealing with mental illness. This is where Data Science and machine learning come in. At first, I had no idea how to code or how these things work, but with dedication and time, I became better and better at understanding this world. Recently I drafted an article on an AI Product Business Proposal, which is my first step towards building a general AI that will detect mental illness using computer vision, text, and audio. My educational background is quite different from what I am doing now, so I can assure you that if I can do it, anyone with a working laptop can do it. You need time and dedication, and the doors of this world will open for you.

The second thing that motivated me was the good pay and benefits of a Data Science job, and this world is hungry for more data engineers, data analysts, and data scientists. If you don’t believe me, you can go to any local or international job posting site and search for Data Scientist.

Credit: Glassdoor

Pay Distribution

With a few lines of code and publicly available data on Kaggle, we can see the base pay distribution of Data Scientists.

https://medium.com/media/20809f2b301103c6a4830c567614805a/href
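The embedded snippet is not reproduced in this text, so here is a minimal sketch of what "a few lines of code" for a pay distribution could look like. It is an illustration under assumptions, not the code behind the chart: the file name salaries.csv and the column name BasePay are hypothetical placeholders for whichever Kaggle salary dataset you download, and the 20,000 bin width is an arbitrary choice. It uses only the Python standard library and prints a text histogram instead of a plot.

```python
import csv
import os
from collections import Counter


def load_base_pay(csv_path, column="BasePay"):
    """Read one numeric column from a CSV file, skipping unusable cells.

    The column name "BasePay" is an assumption; use your dataset's header.
    """
    values = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            try:
                values.append(float(row[column]))
            except (KeyError, TypeError, ValueError):
                continue  # missing column, empty cell, or text such as "N/A"
    return values


def pay_distribution(values, bin_width=20_000):
    """Count how many salaries fall into each bin of width `bin_width`.

    Keys are the lower edge of each bin, returned in ascending order.
    """
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def print_histogram(distribution, bin_width=20_000, max_bar=40):
    """Print a text histogram, scaling the longest bar to `max_bar` characters."""
    if not distribution:
        print("no data")
        return
    largest = max(distribution.values())
    for start, count in distribution.items():
        bar = "#" * max(1, round(count / largest * max_bar))
        print(f"{start:>9,.0f} - {start + bin_width:>9,.0f} | {bar} {count}")


if __name__ == "__main__":
    path = "salaries.csv"  # hypothetical file name, not from the article
    if os.path.exists(path):
        print_histogram(pay_distribution(load_base_pay(path)))
```

If you prefer a real chart, the same binned counts can be passed to any plotting library you already know; the binning step is the part that turns a raw salary column into a distribution.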

I even found this advertisement for an Associate Data Scientist, which is quite attractive if you are from a developing country. Even in my country, data engineers and data scientists are paid three times more than doctors (source: both of my sisters are doctors).

Source: indeed.com

What is Data Science?

Credit: ETH Zurich

Introduction

Data science is the domain of study that deals with vast volumes of data using modern tools and techniques to find unseen patterns, derive meaningful information, and make business decisions. Data science uses machine learning algorithms to build predictive models (Simplilearn). In short, you are collecting data and producing meaningful information from it by using scientific tools. There are many subfields of data science, and the field is ever-growing, so without going too deep, we will look at the advantages and disadvantages of data science to make things easy to grasp.

Image by Author | Elements by Freepik

Advantages

Multiple Job Options: In the world of data, you can find multiple job opportunities within organizations. Data Scientist, Data Analyst, Research Analyst, Business Analyst, Analytics Manager, and Big Data Engineer are just the main roles that are highlighted. Most Artificial Intelligence-related jobs are in high demand, and every company needs a data analyst or data scientist to help with business decisions. (northeastern.edu)
Business Benefits: When you are a data scientist in an organization, you are solving real-world business problems, and you are also helping companies and other stakeholders make faster and more effective decisions on product performance. Some examples of business benefits: Better Decision-Making, More Accessible Analytics, Automation, Predictive Modeling, Personalization, Retention & Loyalty, Forecasting, Optimized Messaging & Solutions, and Accurate Measure of Campaign ROI. (Tiempo Development)
Highly Paid Jobs: I have already discussed this, and I think it is a fast lane to becoming a millionaire, as the average yearly wage is above 100k USD. Many employees gain experience and then start their own data company, which can eventually turn into a million-dollar business. (Business Student.com)
AI Products: AI products are fun to play around with, and you are directly helping communities solve real-world problems such as crime control, water efficiency, forecasting, designing multipurpose medicine, early cancer detection, and many more. We are using AI in every walk of life, and it feels wholesome and fun to do our job. You can find more examples on Built In.
Versatile: Let me give you an example. My expertise is in Natural Language Processing (NLP), and even I don’t know everything about it. NLP is just a small part of this big world, and there is a whole new world within this subfield. You can do text classification, automatic speech recognition, machine translation, text generation, question answering, AI bots, image to text (OCR), and many more. Sometimes I feel overwhelmed. So, you will have endless possibilities to choose your niche and work on it.

Disadvantages

It’s a blurry term: Data Science is a very general term and does not have a clear definition. It’s hard to write down the exact definition of a Data Scientist. For example, a data analyst working in Excel can call themselves a data scientist. Nowadays, companies use the term data science excessively to market their products, and some companies say data scientist when they actually mean a front-end developer, deployment engineer, analyst, and data engineer, all in one. So, you need to know what a company means by data science when applying for jobs.
A big world to Navigate: There is such an overwhelming number of fields and subfields within Data Science that sometimes it’s scary even to start learning the basics, and there are so many tutorials available online that teach different things. It’s difficult for an individual with a non-programming background to dive into this world, as it is too big to absorb within a week or a month. You need time and effort to get the basics right, and even then, people are confused about which fields they want to get into and what tools they need to learn. A person with a background in Statistics may not be able to master Computer Science on short notice to become a proficient Data Scientist. It is an ever-changing, dynamic field that requires you to keep learning the various avenues of Data Science. (data-flair.training)
Data Privacy/Data Bias: For many industries, data is their fuel. Data Scientists help companies make data-driven decisions. However, the data utilized in the process may breach the privacy of customers. (data-flair.training) There are so many security vulnerabilities in any data architecture that it’s hard to keep data both secure and accessible. If we train models on biased data from humans, they are bound to produce biased results. For example, in text generation, if we ingest millions of lines of text that carry negative biases towards African Americans, the generative AI will generate racist text.
Multiple Domains Required: A data scientist usually has to learn Data Engineering topics such as MLOps (MLflow, Airflow), data pipelines, data ingestion, front-end apps, data governance, and streamlining analysis, so there is much to learn. And that is just the start: there are many subfields within Data Science that companies require during the hiring process, such as time series, unstructured data, NLP, computer vision, and other emerging fields of AI. Sometimes it gets overwhelming how much a single person needs to learn, and keep on learning, to remain valuable in the company.
What tools work for you: This is the beginner’s dilemma: which tools to learn first, which integrated development environment (IDE) works best, and which programming language is popular in data science. I know it’s frustrating, and the easiest way out is to use a preloaded cloud IDE. Sometimes you start learning one tool, but the company you are working at prefers different tools, such as R, Tableau, or a dashboard designer. This is quite disheartening, as there is no standard tool in this field.

Final Thoughts

I have talked about my struggles and how I got into data science, which is a huge achievement for me, and every day I am getting closer to my goal. In the world of data science, there is a lot of gatekeeping, where experts will ask you to learn advanced math, programming, and tools and to get an official degree, but these things don’t matter in real life. You can learn from free online courses, you can learn the basics of programming, and you can pick any free cloud IDE such as Google Colab or Deepnote to start working on simple data science projects. If you ask me, the only two things you need are the time and dedication to learn these things.

I started my journey with paid courses on DataCamp and Codecademy. Paid courses push you to complete the course on time, and they have an interactive coding environment that will help you learn to code fast. These courses also offer a certificate of completion that you can use on your resume or LinkedIn. Once you have a hold of the basics of programming, it’s time to start working on simple projects available on the internet, and with time you will find your niche to work on. Don’t forget to share your work on GitHub or Medium, as this will increase your chance of getting hired.

“The only thing that will stop you from fulfilling your dreams is you.” — Tom Bradley.

The Beginner’s Guide to Data Science was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
