What is data science? (And what is it not?)

I just left an interview where they asked me this same question. After reading the other 41 answers, I will try to give a simpler and more accurate one:

WHAT IT IS

  1. The term is a bit of a misnomer and a buzzword that the media uses to describe almost everything. Still, it is worth having this discussion so that we can come to an agreement.

  2. The question is about data science, so I will not talk about data scientists. If you are interested in that, see “What is a data scientist?”.

  3. The biggest error I found in most of the answers was some version of “data science is when you are dealing with big data, large amounts of data”. That is not true: data science can be applied to a data set with one thousand rows without any problem.

  4. If we are going to call it a “science”, we need to consider the definitions of science and of the scientific method. By that standard, data science is not only about practical or empirical methods; it needs scientific foundations.

  5. No one talked about the difference between data and information:
     • Data is a raw, unorganized set of facts that needs to be processed before it has meaning.
     • Information is data that has been processed, organized, structured or presented in a given context so as to make it useful.

  6. Based on this, we would have both data science and information science. Right now, people tend to say “data science” when they also mean information science.

  7. The practices now labeled data science have clearly been used in many fields for years:
     • Statistics/Mathematics
     • Business analytics
     • Market intelligence
     • Strategic consulting
     • Many others…

  8. The craziest part is seeing professionals from these areas update their resumes with something like “I worked with Data Science…”.

  9. Put simply, data science was created when two sides that were not fully connected had to merge in the new, fast-paced, technological world:
     • Statistics/mathematics: formulate proper models to generate insights.
     • Computer science: bridge the models and the data so that results arrive in a feasible amount of time.

  10. Topics and tools that a person needs to understand, or at least have some knowledge of, when working with data science:
     • Linear algebra
     • Non-linear systems
     • Analytical geometry
     • Optimization
     • Calculus
     • Statistics
     • Programming languages (R, Python, SAS)
     • Software: Excel, SPSS by IBM
     • General platforms: Watson Analytics by IBM, Azure Machine Learning, Google Cloud Machine Learning
     • Data visualization: Power BI, Tableau, R/Python using plotly/ggplot
     • Machine learning (supervised, unsupervised and reinforcement learning)
     • Big data
     • Big data frameworks (Hadoop and Spark)
     • Hardware (CPU, GPU, TPU, FPGA, ASIC)

  11. One picture is worth ten thousand words: Drew Conway’s Data Science Venn Diagram, which places data science at the intersection of hacking skills, math and statistics knowledge, and substantive expertise. Substantive expertise (or domain expertise) is the specific knowledge of the area in which you are applying data science. To learn more about the lack of substantive expertise in data science, see “What's Missing in Data Science Talks - As Risky As It Gets”.
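To make the data-versus-information distinction concrete, here is a minimal sketch in Python using only the standard library. The sales records, the region names, and the `to_information` helper are all hypothetical, invented for illustration. The point is that 1,000 raw rows, nowhere near “big data”, become information once they are grouped and summarized in a context (here, per region).

```python
import random
import statistics
from collections import defaultdict

# Hypothetical raw data: 1,000 unorganized (region, amount) sales records.
# A fixed seed makes the example reproducible.
random.seed(42)
REGIONS = ["north", "south", "east", "west"]
raw_records = [
    (random.choice(REGIONS), round(random.uniform(10, 500), 2))
    for _ in range(1000)
]


def to_information(records):
    """Hypothetical helper: group raw records by region and summarize each group."""
    by_region = defaultdict(list)
    for region, amount in records:
        by_region[region].append(amount)
    # Sorting the groups gives a stable, readable order in the summary.
    return {
        region: {
            "count": len(amounts),
            "total": round(sum(amounts), 2),
            "mean": round(statistics.mean(amounts), 2),
        }
        for region, amounts in sorted(by_region.items())
    }


summary = to_information(raw_records)
for region, region_stats in summary.items():
    print(region, region_stats)
```

The raw list on its own says little; the per-region counts, totals and means are what someone can actually act on.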

WHAT IT IS NOT

  1. Machine learning is not a branch of data science; it originated in artificial intelligence. Data science simply uses ML as a tool, because ML produces impressive, autonomous results for specific tasks.

  2. It is not the salvation of companies that never measured anything and now want to get insights from their data. “Garbage in, garbage out”: data science will only be as good as the data these companies generate in the coming years.

  3. It is not just presenting data in a few Excel charts without any insight into that data.
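A tiny illustration of “garbage in, garbage out”, as a sketch under stated assumptions: the temperature readings and the -999.0 sentinel used for missing values are hypothetical. One undocumented convention in how data was recorded is enough to make a simple average meaningless until the data is cleaned.

```python
import statistics

# Hypothetical sensor readings; assume a legacy system stored "missing" as -999.0.
MISSING = -999.0
readings = [21.5, 22.0, MISSING, 21.8, 22.3, MISSING, 21.9]

# Garbage in: averaging everything treats the sentinel as a real measurement.
naive_mean = statistics.mean(readings)

# Cleaning step: drop the sentinel values before summarizing.
clean_readings = [r for r in readings if r != MISSING]
clean_mean = statistics.mean(clean_readings)

print(f"naive mean: {naive_mean:.2f}")
print(f"clean mean: {clean_mean:.2f}")
```

The naive mean comes out far below zero, while the cleaned mean is about 21.9. No model applied afterwards can recover from that kind of input without first fixing the data.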
