Flashcards: 18
Students: 0
Language: German
Level: Secondary School
Created / Updated: 28.05.2022 / 28.05.2022
Licencing: Not defined
Answer types: 0 exact answers, 18 text answers, 0 multiple-choice answers



What is this thing called data science?

  • Almost every aspect of a business is open to data collection, be it web server logs, tweet streams, IoT sensors, online transactions, or some other source. Data science provides techniques to extract useful information from this data and thus generate added value.
  • The question facing every company, startup, and non-profit that wants to attract a community in the modern era is how to use data effectively. Using data effectively requires something different from traditional statistics; namely an interdisciplinary approach that involves elements of statistics, computer science, and domain expertise. Data science is the methodology that synthesises these three domains.
  • In essence, data science is about the extraction of useful information and knowledge from large volumes of data, in order to improve business decision-making (Provost & Fawcett, 2013). With improved decision making comes improved productivity, market value, and competitive edge. Thus the main goal of data science is to enable data-driven decision making across the whole company, where decisions are based on the analysis of data rather than pure intuition.

Learning objectives of the course

This module provides students with a hands-on introduction to the methods of data science, with an emphasis on applying these methods to solve business problems. By the end of this course, it is expected that students will:

  • Know how to approach business problems from a data science perspective;
  • Understand the fundamental principles behind extracting useful knowledge from data;
  • Understand the core concepts and terminology of machine learning;
  • Gain hands-on experience with mining data for insights.

Throughout the course, students will also have the opportunity to learn several technical skills:

  • Python programming and experience with the core libraries for data analysis, visualisation, and modelling.
  • Working with data: collecting, cleaning, and transforming.
  • Creating and interpreting descriptive statistics.
  • Creating and interpreting data visualisations.
  • Practical experience with machine learning.

Lesson 1.1 - The Big Picture

By the end of this course you will:

• Know how to approach business problems from a data science perspective

• Understand the fundamental principles behind extracting useful knowledge from data

• Gain hands-on experience with mining data for insights


In this course you are going to learn several skills:

• Python programming and core libraries for data analysis, visualisation, and modelling

• Working with data: collecting, cleaning, transforming

• Creating and interpreting descriptive statistics

• Creating and interpreting data visualisations

• Creating statistical models for inference

• Practical machine learning


What is Data Science?

Data science is about the extraction of useful information and knowledge from large volumes of data, in order to improve business decision-making.

It is an interdisciplinary subject with three key areas:

• Statistics

• Computer science

• Domain expertise


Why is Data Science Important?

In the past, data analysis was typically slow: it needed teams of statisticians, analysts, etc. to explore data manually.

Today: volume, velocity, and variety make manual analysis impossible …

… but fast computers and good algorithms allow much deeper analyses than before

--> data-driven decision making

-->  base decisions on analysis of data, not intuition


How is data science performed?

• Iterative process

• Non-sequential

• Early termination

• Established processes, e.g. CRISP-DM (https://bit.ly/1tX6508)


Typical data science workflow

Raw data, little value --> Data exploration --> Model building and analysis --> Reporting, Automation
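
The four stages above can be sketched in a few lines of plain Python. This is only an illustration: the sales records, the function names, and the naive trend model are all hypothetical and are not part of the course material.

```python
from statistics import mean

# Hypothetical raw data: daily sales records, one of them unusable.
raw_records = [
    {"day": 1, "sales": "120"},
    {"day": 2, "sales": "135"},
    {"day": 3, "sales": None},  # missing value
    {"day": 4, "sales": "150"},
    {"day": 5, "sales": "165"},
]

def clean(records):
    """Raw data: drop missing values and convert text to numbers."""
    return [(r["day"], float(r["sales"])) for r in records if r["sales"] is not None]

def explore(rows):
    """Data exploration: simple descriptive statistics."""
    sales = [s for _, s in rows]
    return {"n": len(sales), "min": min(sales), "max": max(sales), "mean": mean(sales)}

def build_model(rows):
    """Model building: a naive trend line through the first and last record."""
    first_day, first_sales = rows[0]
    last_day, last_sales = rows[-1]
    slope = (last_sales - first_sales) / (last_day - first_day)
    return lambda day: last_sales + slope * (day - last_day)

def report(summary, model, next_day):
    """Reporting: turn the results into a short message."""
    return (f"{summary['n']} valid days, mean sales {summary['mean']}, "
            f"forecast for day {next_day}: {model(next_day)}")

rows = clean(raw_records)
summary = explore(rows)
model = build_model(rows)
print(report(summary, model, next_day=6))
```

In practice the process is iterative: a surprise in the exploration step often sends you back to cleaning before any model is built.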


Lesson 1.2 - Machine Learning

What is machine learning really?

Artificial Intelligence: A program that can sense, reason, act and adapt --> 1950s: creation of first “intelligent” algorithms and programs

Machine Learning: Algorithms whose performance improves as they are exposed to more data over time --> 1980s: statistical models and algorithms that can learn from data

Deep Learning: Subset of machine learning in which multilayered neural networks learn from vast amounts of data --> 2010s: statistical models and algorithms inspired by neurones that can learn from data


Machine Learning Branches

3 Main Branches:

- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning


Supervised Learning


In supervised learning the training data consists of input/output pairs and we train a function to map the inputs to the outputs.
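
As a minimal sketch of "training a function from input/output pairs", the example below uses a 1-nearest-neighbour rule, chosen here only because it is short. The `train` helper and the study-hours data are hypothetical.

```python
def train(pairs):
    """Return a function that maps a new input to the output of the closest training input."""
    def predict(x):
        _, nearest_output = min(pairs, key=lambda pair: abs(pair[0] - x))
        return nearest_output
    return predict

# Hypothetical training data: (hours studied, exam result)
training_pairs = [(1.0, "fail"), (2.0, "fail"), (6.0, "pass"), (8.0, "pass")]

predict = train(training_pairs)
print(predict(1.5), predict(7.0))  # prints "fail pass"
```

The learned function is only as good as the pairs it was trained on, which is why labelled data is the key requirement of supervised learning.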


Supervised Learning: Classification

Classification: Assign categorical labels from a fixed set of labels to data samples.
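
One simple way to assign labels from a fixed set is a nearest-centroid classifier, sketched below under the assumption of two numeric features per sample. The data, the label names, and the helper functions are made up for illustration.

```python
from math import dist
from statistics import mean

def fit_centroids(samples, labels):
    """Compute one centroid (mean point) per label."""
    centroids = {}
    for label in set(labels):
        points = [s for s, l in zip(samples, labels) if l == label]
        centroids[label] = tuple(mean(coords) for coords in zip(*points))
    return centroids

def classify(centroids, sample):
    """Assign the label whose centroid is closest to the sample."""
    return min(centroids, key=lambda label: dist(centroids[label], sample))

# Hypothetical labelled samples with two features each
samples = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 7.5)]
labels = ["small", "small", "large", "large"]

centroids = fit_centroids(samples, labels)
print(classify(centroids, (2.0, 1.0)))  # prints "small"
print(classify(centroids, (7.0, 9.0)))  # prints "large"
```

Whatever the input, the prediction is always one of the labels seen during training, which is what "a fixed set of labels" means in practice.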


Supervised Learning: Regression 

Regression: Find the relationship between one continuous dependent variable and one or more input variables.
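
For a single input variable, that relationship can be estimated with ordinary least squares. The sketch below fits a straight line; the advertising and revenue figures and the `fit_line` helper are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data: advertising spend (input) and revenue (dependent variable)
ad_spend = [1.0, 2.0, 3.0, 4.0]
revenue = [3.1, 4.9, 7.2, 8.8]

slope, intercept = fit_line(ad_spend, revenue)
predicted_revenue = slope * 5.0 + intercept  # prediction for an unseen input
print(slope, intercept, predicted_revenue)
```

Unlike classification, the output is a number on a continuous scale, so the model can produce values it never saw in the training data.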


Machine Learning Branches: Unsupervised Learning

In unsupervised learning there are no labels available; insights are gained without* prior knowledge.

* Usually some model parameters need to be set ahead of training.
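
A small k-means clustering sketch illustrates both points: no labels are used, yet the number of clusters `k` has to be chosen before training. k-means is only one common example and is not named in the cards; the data and the deterministic initialisation are assumptions made to keep the example reproducible.

```python
from statistics import mean

def k_means_1d(values, k, iterations=10):
    """Group numbers into k clusters; k must be set ahead of training."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centres evenly over the sorted data.
    centres = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # Move each centre to the mean of its cluster; keep it if the cluster is empty.
        centres = [mean(c) if c else centres[i] for i, c in enumerate(clusters)]
    return centres, clusters

# Hypothetical unlabelled measurements
values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]

centres, clusters = k_means_1d(values, k=2)
print(centres)
print(clusters)
```

Choosing a different `k` gives a different grouping of the same data, so the result still depends on a decision made by the analyst.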


Unsupervised Learning: Anomaly/Outlier detection

Anomaly Detection: The task of finding samples in a dataset that raise suspicion.

Problem: Usually, what exactly you are looking for is unknown.

Solution: Use statistics and characteristics of dataset to find outliers.
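
One way to use the statistics of a dataset is a robust score based on the median and the median absolute deviation (MAD). This is a sketch of that particular approach, not the only option; the sensor readings are made up, and the threshold of 3.5 is a commonly used rule of thumb rather than a value from the cards.

```python
from statistics import median

def find_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (based on median and MAD) exceeds the threshold."""
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    if mad == 0:
        return []  # no spread in the data, so nothing can be scored
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [v for v in values if 0.6745 * abs(v - centre) / mad > threshold]

# Hypothetical sensor readings with one suspicious value
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 25.0, 10.2]

outliers = find_outliers(readings)
print(outliers)
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value would otherwise distort the very statistics used to detect it.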


Deep Learning

Why now? In recent years two things became available:  

1. A lot of data

2. Necessary computational power 


What is new in deep learning?

What is new (among other things) is a learning algorithm called backpropagation, which makes it possible to train deep neural nets.

State-of-the-art networks can have over 200 layers!
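
The idea behind backpropagation can be sketched on a toy network: the error at the output is passed backwards layer by layer using the chain rule, which gives the gradient for every weight. The example below assumes a chain of three layers with one unit each, a tanh activation, and a squared-error loss. All of these are simplifications chosen for illustration; real networks have many units per layer.

```python
from math import tanh

def forward(weights, biases, x):
    """Return the activations of every layer, starting with the input."""
    activations = [x]
    for w, b in zip(weights, biases):
        activations.append(tanh(w * activations[-1] + b))
    return activations

def loss(weights, biases, x, target):
    """Squared error between the network output and the target."""
    return 0.5 * (forward(weights, biases, x)[-1] - target) ** 2

def backward(weights, biases, x, target):
    """Backpropagation: apply the chain rule from the output layer back to the first layer."""
    activations = forward(weights, biases, x)
    grad_w, grad_b = [0.0] * len(weights), [0.0] * len(weights)
    delta = activations[-1] - target  # d loss / d output activation
    for i in reversed(range(len(weights))):
        delta *= 1.0 - activations[i + 1] ** 2  # through tanh: d loss / d pre-activation
        grad_w[i] = delta * activations[i]
        grad_b[i] = delta
        delta *= weights[i]  # d loss / d activation of the previous layer
    return grad_w, grad_b

# Hypothetical starting parameters and a single training example
weights, biases = [0.5, -0.3, 0.8], [0.1, 0.2, -0.1]
x, target = 1.0, 0.5

# Sanity check: compare the backpropagated gradient with a numerical estimate.
analytic = backward(weights, biases, x, target)[0][0]
eps = 1e-6
up = [weights[0] + eps] + weights[1:]
down = [weights[0] - eps] + weights[1:]
numerical = (loss(up, biases, x, target) - loss(down, biases, x, target)) / (2 * eps)

# Training: repeated gradient descent steps using the backpropagated gradients.
learning_rate = 0.1
initial_loss = loss(weights, biases, x, target)
for _ in range(500):
    grad_w, grad_b = backward(weights, biases, x, target)
    weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
    biases = [b - learning_rate * g for b, g in zip(biases, grad_b)]
final_loss = loss(weights, biases, x, target)
print(initial_loss, final_loss)
```

The same backward loop works for any number of layers, which is why the technique scales to very deep networks, provided enough data and computing power are available.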


So why not use Deep Learning for everything?

There are reasons why we don’t only use DL:

- Necessary data not available
- Computational power not available
- Harder to interpret results
- Deep networks can be fooled: