Flashcards

18 flashcards
0 users
Language: German
Level: Secondary school
Created / Updated: 28.05.2022 / 28.05.2022
License: Not specified
0 exact answers, 18 text answers, 0 multiple-choice answers

What is this thing called data science?

  • Almost every aspect of a business is open to data collection, be it web server logs, tweet streams, IoT sensors, online transactions, or some other source. Data science provides techniques to extract useful information from this data and thus generate added value.
  • The question facing every company, startup, and non-profit that wants to attract a community in the modern era is how to use data effectively. Using data effectively requires something different from traditional statistics: an interdisciplinary approach that combines elements of statistics, computer science, and domain expertise. Data science is the methodology that synthesises these three domains.
  • In essence, data science is about the extraction of useful information and knowledge from large volumes of data, in order to improve business decision-making (Provost & Fawcett, 2013). Improved decision-making brings improved productivity, market value, and competitive edge. Thus, the main goal of data science is to enable data-driven decision-making across the whole company, where decisions are based on the analysis of data rather than on pure intuition.

Learning objectives of the course

This module provides students with a hands-on introduction to the methods of data science, with an emphasis on applying these methods to solve business problems. By the end of this course, it is expected that students will:

  • Know how to approach business problems from a data science perspective;
  • Understand the fundamental principles behind extracting useful knowledge from data;
  • Understand the core concepts and terminology of machine learning;
  • Gain hands-on experience with mining data for insights.

Throughout the course, students will also have the opportunity to learn several technical skills:

  • Python programming and experience with the core libraries for data analysis, visualisation, and modelling.
  • Working with data: collecting, cleaning, and transforming.
  • Creating and interpreting descriptive statistics.
  • Creating and interpreting data visualisations.
  • Practical experience with machine learning.

Lesson 1.1 - The Big Picture

By the end of this course you will:

• Know how to approach business problems from a data science perspective

• Understand the fundamental principles behind extracting useful knowledge from data

• Gain hands-on experience with mining data for insights


In this course you are going to learn several skills:

• Python programming and core libraries for data analysis, visualisation, and modelling

• Working with data: collecting, cleaning, transforming

• Creating and interpreting descriptive statistics

• Creating and interpreting data visualisations

• Creating statistical models for inference

• Practical machine learning


What is Data Science?

Data science is about the extraction of useful information and knowledge from large volumes of data, in order to improve business decision-making.

It is an interdisciplinary subject with three key areas:

• Statistics

• Computer science

• Domain expertise


Why is Data Science Important?

In the past, data analysis was typically slow: it needed teams of statisticians, analysts, etc. to explore data manually.

Today: volume, velocity, and variety make manual analysis impossible …

… but fast computers and good algorithms allow much deeper analyses than before.

--> data-driven decision making

--> base decisions on analysis of data, not intuition


How is data science performed?

• Iterative process

• Non-sequential

• Early termination

• Established processes, e.g. CRISP-DM (https://bit.ly/1tX6508)


Typical data science workflow

Raw data, little value --> Data exploration --> Model building and analysis --> Reporting, Automation


Lesson 1.2 - Machine Learning

What is machine learning really?

Artificial Intelligence: A program that can sense, reason, act, and adapt --> 1950s: creation of the first “intelligent” algorithms and programs

Machine Learning: Algorithms whose performance improves as they are exposed to more data over time --> 1980s: statistical models and algorithms that can learn from data

Deep Learning: Subset of machine learning in which multilayered neural networks learn from vast amounts of data --> 2010s: statistical models and algorithms inspired by neurones that can learn from data


Machine Learning Branches

3 Main Branches:

- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning


Supervised Learning


In supervised learning the training data consists of input/output pairs and we train a function to map the inputs to the outputs.
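To make "training a function that maps inputs to outputs" concrete, here is a minimal sketch using a 1-nearest-neighbour learner. This is just one simple choice of learner, not one prescribed by the course; the function name `fit_nearest_neighbour` and the toy exam data are hypothetical. "Training" returns a function, and that function predicts the output of the closest training input.

```python
from typing import Callable, Sequence, Tuple, TypeVar

Y = TypeVar("Y")  # the output type can be anything: a label, a number, ...


def fit_nearest_neighbour(
    pairs: Sequence[Tuple[Tuple[float, ...], Y]],
) -> Callable[[Tuple[float, ...]], Y]:
    """'Train' on (input, output) pairs and return the learned input -> output function."""
    training = list(pairs)  # 1-NN simply memorises the training pairs
    if not training:
        raise ValueError("need at least one training pair")

    def predict(x: Tuple[float, ...]) -> Y:
        # Return the output of the training input closest to x (squared Euclidean distance).
        def sq_dist(pair: Tuple[Tuple[float, ...], Y]) -> float:
            xi, _ = pair
            return sum((a - b) ** 2 for a, b in zip(xi, x))

        _, y = min(training, key=sq_dist)
        return y

    return predict


# Hypothetical training data: (hours studied, hours slept) -> exam outcome
train = [
    ((1.0, 4.0), "fail"),
    ((2.0, 5.0), "fail"),
    ((6.0, 7.0), "pass"),
    ((8.0, 8.0), "pass"),
]
model = fit_nearest_neighbour(train)
print(model((7.0, 7.0)))  # pass
print(model((1.5, 4.0)))  # fail
```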


Supervised Learning: Classification

Classification: Assign categorical labels from a fixed set of labels to data samples.
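A minimal classification sketch, assuming a nearest-centroid classifier (a deliberately simple method chosen for illustration, not one named in the course): each label in the fixed label set is summarised by the mean of its training samples, and a new sample receives the label of the closest mean. The "spam"/"ham" features and all names are hypothetical.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def fit_centroids(samples, labels):
    """Compute one centroid (per-feature mean) for each label in the label set."""
    groups = defaultdict(list)
    for x, y in zip(samples, labels):
        groups[y].append(x)
    return {
        label: tuple(sum(col) / len(col) for col in zip(*points))
        for label, points in groups.items()
    }


def classify(centroids, x):
    """Assign the label whose centroid is nearest to x."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


# Hypothetical e-mail features: (number of links, number of exclamation marks)
X = [(0, 0), (1, 1), (0, 2), (8, 9), (10, 7), (9, 10)]
y = ["ham", "ham", "ham", "spam", "spam", "spam"]

centroids = fit_centroids(X, y)
print(classify(centroids, (7, 8)))  # spam
print(classify(centroids, (1, 0)))  # ham
```

The output is always one of the labels seen during training, which is what distinguishes classification from regression.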


Supervised Learning: Regression 

Regression: Find the relationship between one dependent variable and one or more input variables.
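For the simplest case, one input variable and a linear relationship, the least-squares fit has a closed form: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄. The sketch below implements exactly these two formulas in pure Python. The rent data are made up for illustration, and handling several input variables (multiple linear regression) would need a more general solver.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ≈ intercept + slope * x (one input variable)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    # Covariance-like and variance-like sums around the means
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical data: floor area (m²) -> monthly rent
area = [30, 45, 60, 75, 90]
rent = [700, 950, 1150, 1450, 1700]
slope, intercept = fit_line(area, rent)
print(f"rent ≈ {intercept:.1f} + {slope:.2f} * area")
print(f"predicted rent for 70 m²: {intercept + slope * 70:.0f}")
```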


Machine Learning Branches: Unsupervised Learning

In unsupervised learning no labels are available; insights are gained without* prior knowledge.

* Usually some model parameters need to be set ahead of training.
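The footnote about parameters set ahead of training can be seen in k-means clustering, a common unsupervised method used here only as an illustration: no labels are given, but the number of clusters `k` must be chosen up front. This is a minimal one-dimensional sketch of Lloyd's algorithm with hypothetical customer-spend data and a simple deterministic initialisation (k distinct values spread across the sorted data), not a production implementation.

```python
def kmeans_1d(points, k, max_iter=100):
    """Cluster 1-D points into k groups (Lloyd's algorithm). k is set before training."""
    values = sorted(set(points))
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of distinct points")
    # Deterministic initialisation: k distinct values spread across the sorted data.
    centres = [values[i * (len(values) - 1) // max(k - 1, 1)] for i in range(k)]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: abs(p - centres[j]))
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its points (keep it if empty).
        new_centres = [
            sum(c) / len(c) if c else centres[j] for j, c in enumerate(clusters)
        ]
        if new_centres == centres:
            break  # converged: the centres no longer move
        centres = new_centres
    return centres, clusters


# Hypothetical monthly spend of eight customers; no labels are provided.
spend = [2, 3, 4, 20, 21, 22, 50, 52]
centres, clusters = kmeans_1d(spend, k=3)
print(centres)   # [3.0, 21.0, 51.0]
print(clusters)  # [[2, 3, 4], [20, 21, 22], [50, 52]]
```

Choosing a different `k` would produce a different grouping of the same data, which is why such parameters have to be set (or searched over) before training.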


Unsupervised Learning: Anomaly/Outlier detection

Anomaly Detection: The task of finding samples in a dataset that raise suspicion.

Problem: Usually, what exactly you are looking for is unknown.

Solution: Use statistics and characteristics of the dataset to find outliers.
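One simple way to "use statistics of the dataset" is Tukey's fences (the 1.5 × IQR rule), shown here as one possible technique among many: a value is flagged if it lies more than 1.5 interquartile ranges below the first quartile or above the third. The sensor readings are hypothetical, and the factor 1.5 is a conventional default rather than a rule from the course. Quartiles are used instead of mean and standard deviation because a single extreme value distorts the latter much more.

```python
import statistics


def iqr_outliers(values, factor=1.5):
    """Return the values outside [Q1 - factor*IQR, Q3 + factor*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points, Python 3.8+
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical temperature readings from an IoT sensor (°C); 85.0 looks suspicious.
readings = [21.5, 22.0, 21.8, 22.3, 21.9, 85.0, 22.1, 21.7, 22.4, 21.6]
print(iqr_outliers(readings))  # [85.0]
```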


Deep Learning

Why now? In recent years two things became available:  

1. A lot of data

2. Necessary computational power 


What is new in deep learning?

What is new (among other things) is a learning algorithm called backpropagation, which makes it possible to train deep neural nets.

State-of-the-art networks can have over 200 layers!
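Backpropagation is the chain rule applied layer by layer, from the loss back to every weight. The pure-Python sketch below assumes a small hypothetical network (two inputs, one hidden layer of two tanh units, one linear output, squared-error loss); it is nowhere near a deep network, and the weights and data are arbitrary illustrative values. It computes the gradients by backpropagation, checks them against a finite-difference estimate, and takes one gradient-descent step. Frameworks such as PyTorch or JAX automate this gradient computation.

```python
import math


def forward(params, x):
    """Forward pass: tanh hidden layer followed by a linear output unit."""
    z1 = [sum(w * xi for w, xi in zip(row, x)) + b
          for row, b in zip(params["W1"], params["b1"])]
    h = [math.tanh(z) for z in z1]
    y_hat = sum(w * hj for w, hj in zip(params["w2"], h)) + params["b2"][0]
    return h, y_hat


def loss(params, x, y):
    _, y_hat = forward(params, x)
    return 0.5 * (y_hat - y) ** 2


def backprop(params, x, y):
    """Gradients of the loss, propagated from the output back to the first layer."""
    h, y_hat = forward(params, x)
    d_out = y_hat - y                                       # dL/dy_hat
    grads = {"w2": [d_out * hj for hj in h], "b2": [d_out]}
    d_h = [d_out * w for w in params["w2"]]                 # dL/dh_j
    d_z1 = [dh * (1 - hj ** 2) for dh, hj in zip(d_h, h)]   # tanh'(z) = 1 - tanh(z)^2
    grads["W1"] = [[dz * xi for xi in x] for dz in d_z1]    # dL/dW1[j][i]
    grads["b1"] = d_z1
    return grads


def numerical_grad(params, x, y, name, idx, eps=1e-6):
    """Central finite-difference estimate of dL/dparam for one scalar parameter."""
    container = params[name]
    if isinstance(idx, tuple):  # matrix entry W1[j][i]
        container, idx = container[idx[0]], idx[1]
    original = container[idx]
    container[idx] = original + eps
    plus = loss(params, x, y)
    container[idx] = original - eps
    minus = loss(params, x, y)
    container[idx] = original  # restore the parameter
    return (plus - minus) / (2 * eps)


def sgd_step(params, grads, lr=0.01):
    """One gradient-descent update using the backpropagated gradients."""
    for name, g in grads.items():
        if name == "W1":
            for row, g_row in zip(params[name], g):
                for i, gi in enumerate(g_row):
                    row[i] -= lr * gi
        else:
            for i, gi in enumerate(g):
                params[name][i] -= lr * gi


params = {
    "W1": [[0.5, -0.3], [0.8, 0.2]],
    "b1": [0.1, -0.1],
    "w2": [1.2, -0.7],
    "b2": [0.05],
}
x, y = [1.0, 2.0], 0.5

grads = backprop(params, x, y)
print("backprop:", grads["W1"][1][0], " numerical:", numerical_grad(params, x, y, "W1", (1, 0)))
before = loss(params, x, y)
sgd_step(params, grads)
print("loss before:", before, " after one step:", loss(params, x, y))
```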


So why not use Deep Learning for everything?

There are reasons why we don’t only use DL:

- Necessary data not available
- Computational power not available
- Harder to interpret results
- Deep networks can be fooled
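The idea that a model can be fooled can be illustrated in a strongly simplified setting with a perturbation in the style of the fast gradient sign method: every input feature is nudged by a small amount `eps` in the direction that moves the model's score towards the other class. As an assumption for the sake of a short example, the "model" here is a hypothetical logistic regression with fixed weights over 1000 features rather than a deep network. No feature changes by more than 0.03, yet the many tiny changes add up and flip the prediction.

```python
import math


def predict_proba(weights, bias, x):
    """Logistic regression: probability of the positive class."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 / (1 + math.exp(-score))


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_perturb(weights, x, eps):
    """Shift each feature by eps against the sign of the score gradient.

    For a linear score w·x + b the gradient with respect to x is simply w,
    so stepping by -eps * sign(w_i) lowers the score as much as possible
    when no feature may change by more than eps.
    """
    return [xi - eps * sign(w) for w, xi in zip(weights, x)]


# Hypothetical model with many small weights (e.g. one per pixel of a small image).
n_features = 1000
weights = [0.1 if i % 2 == 0 else -0.1 for i in range(n_features)]
bias = 0.0
x = [0.52 if i % 2 == 0 else 0.48 for i in range(n_features)]

x_adv = fgsm_perturb(weights, x, eps=0.03)
print(f"original:  p(positive) = {predict_proba(weights, bias, x):.3f}")
print(f"perturbed: p(positive) = {predict_proba(weights, bias, x_adv):.3f}")
print(f"largest change to any feature: {max(abs(a - b) for a, b in zip(x, x_adv)):.2f}")
```

Applied to the gradient of a deep network's loss instead of a fixed linear score, the same recipe is the fast gradient sign method used to construct adversarial examples.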