Almost every aspect of a business is open to data collection, be it web server logs, tweet streams, IoT sensors, online transactions, or some other source. Data science provides techniques to extract useful information from this data and thus generate added value.
The question facing every company, startup, and non-profit that wants to attract a community in the modern era is how to use data effectively. Using data effectively requires something different from traditional statistics, namely an interdisciplinary approach that combines elements of statistics, computer science, and domain expertise. Data science is the methodology that synthesises these three domains.
In essence, data science is about the extraction of useful information and knowledge from large volumes of data, in order to improve business decision-making (Provost & Fawcett, 2013). With improved decision making comes improved productivity, market value, and competitive edge. Thus the main goal of data science is to enable data-driven decision making across the whole company, where decisions are based on the analysis of data rather than pure intuition.
Learning objectives of the course
This module provides students with a hands-on introduction to the methods of data science, with an emphasis on applying these methods to solve business problems. By the end of this course, it is expected that students will:
Know how to approach business problems from a data science perspective;
Understand the fundamental principles behind extracting useful knowledge from data;
Understand the core concepts and terminology of machine learning;
Gain hands-on experience with mining data for insights.
Throughout the course, students will also have the opportunity to learn several technical skills:
Python programming and experience with the core libraries for data analysis, visualisation, and modelling.
Working with data: collecting, cleaning, and transforming.
Creating and interpreting descriptive statistics.
Creating and interpreting data visualisations.
Practical experience with machine learning.
Lesson 1.1 - The Big Picture
By the end of this course you will:
• Know how to approach business problems from a data science perspective
• Understand the fundamental principles behind extracting useful knowledge from data
• Gain hands-on experience with mining data for insights
In this course you are going to learn several skills:
• Python programming and core libraries for data analysis, visualisation, and modelling
• Working with data: collecting, cleaning, transforming
• Creating and interpreting descriptive statistics
• Creating and interpreting data visualisations
• Creating statistical models for inference
• Practical machine learning
What is Data Science?
Data science is about the extraction of useful information and knowledge from large volumes of data, in order to improve business decision-making.
It is an interdisciplinary subject with 3 key areas:
• Statistics
• Computer science
• Domain expertise
Why is Data Science Important?
In the past, data analysis was typically slow: it needed teams of statisticians, analysts, etc. to explore data manually
Today: volume, velocity, and variety make manual analysis impossible …
… but fast computers and good algorithms allow much deeper analyses than before
--> data-driven decision making
--> base decisions on analysis of data, not intuition
How is data science performed?
• Iterative process
• Non-sequential
• Early termination
• Established processes, e.g. CRISP-DM (https://bit.ly/1tX6508)
Typical data science workflow
Raw data, little value --> Data exploration --> Model building and analysis --> Reporting, Automation
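The four stages above can be sketched in a few lines of Python. This is only an illustration, not the course's own material: the records, the field names (`ad_spend`, `sales`) and the choice of a straight-line model are all assumptions made for the example, and only the standard library is used.

```python
import statistics

# Stage 1: raw data. Hypothetical daily records; None marks a missing value.
raw = [
    {"ad_spend": 10.0, "sales": 25.0},
    {"ad_spend": 20.0, "sales": 45.0},
    {"ad_spend": None, "sales": 30.0},
    {"ad_spend": 30.0, "sales": 65.0},
    {"ad_spend": 40.0, "sales": 85.0},
]

# Stage 2: data exploration. Drop incomplete rows, then summarise what is left.
clean = [r for r in raw if r["ad_spend"] is not None and r["sales"] is not None]
spend = [r["ad_spend"] for r in clean]
sales = [r["sales"] for r in clean]
summary = {
    "rows": len(clean),
    "mean_spend": statistics.mean(spend),
    "mean_sales": statistics.mean(sales),
}

# Stage 3: model building. Fit sales = slope * ad_spend + intercept by least squares.
sxx = sum((x - summary["mean_spend"]) ** 2 for x in spend)
sxy = sum(
    (x - summary["mean_spend"]) * (y - summary["mean_sales"])
    for x, y in zip(spend, sales)
)
slope = sxy / sxx
intercept = summary["mean_sales"] - slope * summary["mean_spend"]

# Stage 4: reporting. Turn the model into a statement a decision maker can use.
print(f"Rows analysed: {summary['rows']}")
print(f"Estimated extra sales per unit of ad spend: {slope:.2f}")
print(f"Predicted sales at ad spend 50: {slope * 50 + intercept:.1f}")
```

In practice the stages are revisited repeatedly: a surprising result in the model-building stage often sends the analyst back to exploration or data cleaning.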
Lesson 1.2 - Machine Learning
What is machine learning really?
Artificial Intelligence: A program that can sense, reason, act and adapt --> 1950s: creation of first “intelligent” algorithms and programs
Machine Learning: Algorithms whose performance improves as they are exposed to more data over time --> 1980s: statistical models and algorithms that can learn from data
Deep Learning: Subset of machine learning in which multilayered neural networks learn from vast amounts of data --> 2010s: statistical models and algorithms inspired by neurones that can learn from data
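The machine learning definition above, performance improving with exposure to more data, can be made concrete with a small experiment. The sketch below is an assumption-laden illustration: the "true" rule (y = 2x + 1), the noise level, the training-set sizes and the helper names `fit_line` and `average_test_error` are all invented for the example. It fits a straight line to noisy samples and measures how far the learned line is from the true one, averaged over repeated trials.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def average_test_error(n_train, n_trials=200, seed=0):
    """Mean squared error against the true line, averaged over repeated trials."""
    rng = random.Random(seed)
    test_xs = [i / 10 for i in range(11)]  # fixed evaluation points in [0, 1]
    total = 0.0
    for _ in range(n_trials):
        xs = [rng.random() for _ in range(n_train)]
        # Hypothetical true rule y = 2x + 1, observed with Gaussian noise.
        ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in xs]
        slope, intercept = fit_line(xs, ys)
        total += sum(
            (slope * x + intercept - (2.0 * x + 1.0)) ** 2 for x in test_xs
        ) / len(test_xs)
    return total / n_trials


errors = {n: average_test_error(n) for n in (5, 50, 500)}
for n, err in errors.items():
    print(f"{n:>4} training examples -> average test error {err:.4f}")
```

The algorithm is identical in all three runs; only the amount of training data changes, and the average error shrinks as the training set grows.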