<h1><center> PPOL 5203 Data Science I: Foundations <br><br> 
<font color='grey'> Writing/Loading and Previewing Data in Pandas<br><br>
Tiago Ventura</font></center></h1>

---

**In this Notebook we cover**

`Pandas` methods for: 

- Loading data 
- Saving data
- Data Conversion
- Previewing your Pandas DataFrame



## Setup

In this notebook, we will work with the [FIFA World Cup](https://www.kaggle.com/datasets/abecklas/fifa-world-cup/code?select=WorldCupMatches.csv) data set hosted on Kaggle. 

Download the data from our website or from Kaggle. Then: 

- Save it in a folder you can access from this notebook
- Or save it in the same folder as the notebook (your working directory)


In [24]:
# import modules
import pandas as pd
import numpy as np

## Data in and Data out in `Pandas`

In our class on file management, we saw how to use Python's connection management tools (`open()`, `close()`, and the `with` statement) to load locally stored data into our Python environment. That process usually involved reading a locally stored file row by row and importing the data into a nested container (a list or dictionary). 

Today, we will see the use of high-level functions from Pandas that facilitate the process of loading data into our Python environment. We will focus on data input and output using pandas, though there are numerous tools in other libraries to help with reading and writing data in various formats.
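To make the contrast concrete, here is a minimal sketch that reads the same small file both ways. The file name `toy_matches.csv` and its contents are made up for illustration, and the `csv` module stands in for whichever row-by-row approach you used in the file management class.

```python
import csv
import os
import tempfile

import pandas as pd

# Write a tiny CSV to a temporary folder so the example is self-contained.
# The file name and its contents are hypothetical.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "toy_matches.csv")
with open(path, "w", newline="") as f:
    f.write("Year,Home,Away\n1930,France,Mexico\n1930,USA,Belgium\n")

# Manual approach: read row by row into a list of dictionaries.
with open(path, newline="") as f:
    rows = list(csv.DictReader(f))

# pandas approach: a single call returns a DataFrame with inferred data types.
df = pd.read_csv(path)

print(rows[0])    # every value is still a string
print(df.dtypes)  # Year was parsed as a numeric column
```

With the manual approach you still have to convert types yourself; `read_csv` infers them while loading.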

### `pandas` methods

`pandas` contains a variety of methods for reading in various data types.

|Format Type |Data Description |Reader |Writer |Note |
|:------:|:------:|:------:|:------:|:------:|
|text |CSV |`read_csv` |`to_csv` | |
|text |JSON |`read_json` |`to_json` | |
|text |HTML |`read_html` |`to_html` | |
|text |Local clipboard |`read_clipboard` |`to_clipboard` | |
|binary |MS Excel |`read_excel` |`to_excel` |needs an Excel engine such as `openpyxl` (for `.xlsx` files) |
|binary |HDF5 Format |`read_hdf` |`to_hdf` | |
|binary |Feather Format |`read_feather` |`to_feather` | |
|binary |Parquet Format |`read_parquet` |`to_parquet` | |
|binary |Msgpack |`read_msgpack` |`to_msgpack` |removed in pandas 1.0 |
|binary |Stata |`read_stata` |`to_stata` | |
|binary |SAS |`read_sas` | |reader only |
|binary |Python Pickle Format |`read_pickle` |`to_pickle` | |
|SQL |SQL |`read_sql` |`to_sql` | |
|SQL |Google BigQuery |`read_gbq` |`to_gbq` |relies on the `pandas-gbq` package |

Read more about all the input/output methods [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html).
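Readers and writers come in pairs, so a DataFrame written with a `to_*` method can usually be read back with the matching `read_*` function. The sketch below illustrates this in memory for CSV and JSON, using a small made-up DataFrame and `io.StringIO` in place of files on disk.

```python
from io import StringIO

import pandas as pd

# A small, made-up DataFrame used only for illustration.
df = pd.DataFrame({"Year": [1930, 1934], "Winner": ["Uruguay", "Italy"]})

# Called without a path, the writers return the content as a string.
csv_text = df.to_csv(index=False)
json_text = df.to_json(orient="records")

# The matching readers accept file-like objects, so StringIO works in place of a file.
from_csv = pd.read_csv(StringIO(csv_text))
from_json = pd.read_json(StringIO(json_text))

print(from_csv)
print(from_json)
```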

### Data in with `pandas`

As you can see, the purpose of each function is intuitive. For example:

### `pandas.read_csv()`: to open flat files

In [58]:
# read a csv 
d = pd.read_csv("WorldCupMatches.csv")

In [59]:
d.head()

Unnamed: 0,Year,Datetime,Stage,Stadium,City,Home Team Name,Home Team Goals,Away Team Goals,Away Team Name,Win conditions,Attendance,Half-time Home Goals,Half-time Away Goals,Referee,Assistant 1,Assistant 2,RoundID,MatchID,Home Team Initials,Away Team Initials
0,1930,13 Jul 1930 - 15:00,Group 1,Pocitos,Montevideo,France,4,1,Mexico,,4444.0,3,0,LOMBARDI Domingo (URU),CRISTOPHE Henry (BEL),REGO Gilberto (BRA),201,1096,FRA,MEX
1,1930,13 Jul 1930 - 15:00,Group 4,Parque Central,Montevideo,USA,3,0,Belgium,,18346.0,2,0,MACIAS Jose (ARG),MATEUCCI Francisco (URU),WARNKEN Alberto (CHI),201,1090,USA,BEL
2,1930,14 Jul 1930 - 12:45,Group 2,Parque Central,Montevideo,Yugoslavia,2,1,Brazil,,24059.0,2,0,TEJADA Anibal (URU),VALLARINO Ricardo (URU),BALWAY Thomas (FRA),201,1093,YUG,BRA
3,1930,14 Jul 1930 - 14:50,Group 3,Pocitos,Montevideo,Romania,3,1,Peru,,2549.0,1,0,WARNKEN Alberto (CHI),LANGENUS Jean (BEL),MATEUCCI Francisco (URU),201,1098,ROU,PER
4,1930,15 Jul 1930 - 16:00,Group 1,Parque Central,Montevideo,Argentina,1,0,France,,23409.0,0,0,REGO Gilberto (BRA),SAUCEDO Ulises (BOL),RADULESCU Constantin (ROU),201,1085,ARG,FRA


### Exploring Arguments 

`pandas` loading functions are highly customizable. For example, check the documentation of `pandas.read_csv()`:

In [30]:
# asking for help
help(pd.read_csv)

Help on function read_csv in module pandas.io.parsers.readers:

read_csv(filepath_or_buffer: 'FilePath | ReadCsvBuffer[bytes] | ReadCsvBuffer[str]', *, sep: 'str | None | lib.NoDefault' = <no_default>, delimiter: 'str | None | lib.NoDefault' = None, header: "int | Sequence[int] | None | Literal['infer']" = 'infer', names: 'Sequence[Hashable] | None | lib.NoDefault' = <no_default>, index_col: 'IndexLabel | Literal[False] | None' = None, usecols=None, squeeze: 'bool | None' = None, prefix: 'str | lib.NoDefault' = <no_default>, mangle_dupe_cols: 'bool' = True, dtype: 'DtypeArg | None' = None, engine: 'CSVEngine | None' = None, converters=None, true_values=None, false_values=None, skipinitialspace: 'bool' = False, skiprows=None, skipfooter: 'int' = 0, nrows: 'int | None' = None, na_values=None, keep_default_na: 'bool' = True, na_filter: 'bool' = True, verbose: 'bool' = False, skip_blank_lines: 'bool' = True, parse_dates=None, infer_datetime_format: 'bool' = False, keep_date_col: 'bool' = F
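The signature above is long, but in practice only a handful of arguments come up often. Here is a minimal sketch of three of them (`sep`, `usecols`, `nrows`), applied to a small semicolon-separated text that is made up for illustration.

```python
from io import StringIO

import pandas as pd

# Made-up, semicolon-separated data; StringIO stands in for a file on disk.
raw = (
    "Year;Stage;City\n"
    "1930;Group 1;Montevideo\n"
    "1930;Group 4;Montevideo\n"
    "1934;Final;Rome\n"
)

df = pd.read_csv(
    StringIO(raw),
    sep=";",                   # the file uses semicolons, not commas
    usecols=["Year", "City"],  # load only these columns
    nrows=2,                   # stop after two data rows
)

print(df)
```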

### Data out with `pandas`

Most of the formats that pandas can load also have a matching method for converting and writing pandas DataFrames to local files. 

For example:

In [31]:
# export as stata file
d.to_stata("worldcupmatches.dta",  version=118)

/var/folders/jy/10_nyhkn3nv_rrbnd8f_fr940000gp/T/ipykernel_50502/3669814853.py:2: InvalidColumnName: 
Not all pandas column names were valid Stata variable names.
The following replacements have been made:

    Home Team Name   ->   Home_Team_Name
    Home Team Goals   ->   Home_Team_Goals
    Away Team Goals   ->   Away_Team_Goals
    Away Team Name   ->   Away_Team_Name
    Win conditions   ->   Win_conditions
    Half-time Home Goals   ->   Half_time_Home_Goals
    Half-time Away Goals   ->   Half_time_Away_Goals
    Assistant 1   ->   Assistant_1
    Assistant 2   ->   Assistant_2
    Home Team Initials   ->   Home_Team_Initials
    Away Team Initials   ->   Away_Team_Initials

If this is not what you expect, please make sure you have Stata-compliant
column names in your DataFrame (strings only, max 32 characters, only
alphanumerics and underscores, no Stata reserved words)

  d.to_stata("worldcupmatches.dta",  version=118)
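The warning above can be avoided by cleaning the column names before exporting. One possible approach, sketched here on an empty DataFrame that reuses a few of the column names from the warning, is to replace every non-alphanumeric character with an underscore.

```python
import pandas as pd

# An empty DataFrame carrying a few of the column names from the warning above.
df = pd.DataFrame(
    columns=["Home Team Name", "Half-time Home Goals", "Assistant 1", "Year"]
)

# Replace anything that is not a letter, digit, or underscore with "_".
df.columns = df.columns.str.replace(r"\W", "_", regex=True)

print(list(df.columns))
```

This reproduces the replacements that `to_stata` made automatically, but it makes the renaming explicit and keeps the same names in your DataFrame and in the exported file.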


In [32]:
# load back again
d_stata = pd.read_stata("worldcupmatches.dta")

In [33]:
# see the data
d_stata.head()

Unnamed: 0,index,Year,Datetime,Stage,Stadium,City,Home_Team_Name,Home_Team_Goals,Away_Team_Goals,Away_Team_Name,...,Attendance,Half_time_Home_Goals,Half_time_Away_Goals,Referee,Assistant_1,Assistant_2,RoundID,MatchID,Home_Team_Initials,Away_Team_Initials
0,0,1930,13 Jul 1930 - 15:00,Group 1,Pocitos,Montevideo,France,4,1,Mexico,...,4444.0,3,0,LOMBARDI Domingo (URU),CRISTOPHE Henry (BEL),REGO Gilberto (BRA),201,1096,FRA,MEX
1,1,1930,13 Jul 1930 - 15:00,Group 4,Parque Central,Montevideo,USA,3,0,Belgium,...,18346.0,2,0,MACIAS Jose (ARG),MATEUCCI Francisco (URU),WARNKEN Alberto (CHI),201,1090,USA,BEL
2,2,1930,14 Jul 1930 - 12:45,Group 2,Parque Central,Montevideo,Yugoslavia,2,1,Brazil,...,24059.0,2,0,TEJADA Anibal (URU),VALLARINO Ricardo (URU),BALWAY Thomas (FRA),201,1093,YUG,BRA
3,3,1930,14 Jul 1930 - 14:50,Group 3,Pocitos,Montevideo,Romania,3,1,Peru,...,2549.0,1,0,WARNKEN Alberto (CHI),LANGENUS Jean (BEL),MATEUCCI Francisco (URU),201,1098,ROU,PER
4,4,1930,15 Jul 1930 - 16:00,Group 1,Parque Central,Montevideo,Argentina,1,0,France,...,23409.0,0,0,REGO Gilberto (BRA),SAUCEDO Ulises (BOL),RADULESCU Constantin (ROU),201,1085,ARG,FRA


In [34]:
# to csv
d_stata.to_csv("worldcupmatches_.csv")
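Note that writers such as `to_csv` save the row index as an extra column by default, which is why an `index` or `Unnamed: 0` column tends to appear after a round trip. The sketch below, on a small made-up DataFrame, shows the difference `index=False` makes.

```python
from io import StringIO

import pandas as pd

# A small, made-up DataFrame used only for illustration.
df = pd.DataFrame({"Year": [1930, 1934], "Winner": ["Uruguay", "Italy"]})

# Default: the index is written as an unnamed first column.
with_index = pd.read_csv(StringIO(df.to_csv()))

# index=False: only the actual columns are written.
without_index = pd.read_csv(StringIO(df.to_csv(index=False)))

print(list(with_index.columns))
print(list(without_index.columns))
```

`to_stata` has a similar switch, `write_index=False`.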

## Practice

Explore the arguments of `pd.read_csv()`. Open `WorldCupMatches.csv` with the following options: 

- using a comma as the separator
- indexing by year
- selecting only a smaller set of columns
- reading only 10 rows after skipping the first 50
- parsing all dates as datetimes



In [14]:
help(pd.read_csv)


In [15]:
# my answer
pd.read_csv("WorldCupMatches.csv", 
            sep = ",", # separator used in the file
            index_col = "Year", # use this column as the index
            usecols = ["Year", "Stage", "Stadium"], # only load specific columns
            nrows = 10, # only read 10 rows of data
            na_values = "nan", # extra string to treat as missing
            skiprows = np.arange(1, 51), # skip the first 50 data rows (row 0 is the header)
            parse_dates = True, # True tries to parse the index as dates; pass a list of column names to parse other columns
            low_memory = True) # process the file in chunks internally for lower memory use (useful on large data)
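Because `skiprows`, `nrows`, and `parse_dates` interact, it can help to check them on a tiny example first. The sketch below uses made-up data with ten numbered rows: it skips the first three data rows while keeping the header, reads the next two, and parses a named column as dates.

```python
from io import StringIO

import pandas as pd

# Made-up data: a header line followed by ten rows with ids 1 to 10.
raw = "id,date\n" + "".join(f"{i},2014-06-{i:02d}\n" for i in range(1, 11))

df = pd.read_csv(
    StringIO(raw),
    skiprows=range(1, 4),  # skip lines 1-3; line 0 (the header) is kept
    nrows=2,               # then read only two rows
    parse_dates=["date"],  # parse this column as datetimes
)

print(df)
print(df.dtypes)
```

Passing a plain integer such as `skiprows=3` would skip the header line as well, which is why a range starting at 1 is used here.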

### JSON Data

JSON (short for JavaScript Object Notation) has become one of the most widely used data formats in Data Science. The main reason is that JSON is the primary format in which data is transferred via HTTP requests between web browsers and other applications. So we will see a lot of JSON data when querying APIs. 


Let's see an example of: 

- Saving a DataFrame as JSON
- Loading a JSON file into your Python environment

In [9]:
# load the csv
# notice this is a different dataset
d_wc = pd.read_csv("WorldCups.csv") 
d_wc

Unnamed: 0,Year,Country,Winner,Runners-Up,Third,Fourth,GoalsScored,QualifiedTeams,MatchesPlayed,Attendance
0,1930,Uruguay,Uruguay,Argentina,USA,Yugoslavia,70,13,18,590.549
1,1934,Italy,Italy,Czechoslovakia,Germany,Austria,70,16,17,363.000
2,1938,France,Italy,Hungary,Brazil,Sweden,84,15,18,375.700
3,1950,Brazil,Uruguay,Brazil,Sweden,Spain,88,13,22,1.045.246
4,1954,Switzerland,Germany FR,Hungary,Austria,Uruguay,140,16,26,768.607
5,1958,Sweden,Brazil,Sweden,France,Germany FR,126,16,35,819.810
6,1962,Chile,Brazil,Czechoslovakia,Chile,Yugoslavia,89,16,32,893.172
7,1966,England,England,Germany FR,Portugal,Soviet Union,89,16,32,1.563.135
8,1970,Mexico,Brazil,Italy,Germany FR,Uruguay,95,16,32,1.603.975
9,1974,Germany,Germany FR,Netherlands,Poland,Brazil,97,16,38,1.865.753


In [10]:
# let's first see what a json looks like. It is a string laid out like a dictionary!
d_wc.to_json()

'{"Year":{"0":1930,"1":1934,"2":1938,"3":1950,"4":1954,"5":1958,"6":1962,"7":1966,"8":1970,"9":1974,"10":1978,"11":1982,"12":1986,"13":1990,"14":1994,"15":1998,"16":2002,"17":2006,"18":2010,"19":2014},"Country":{"0":"Uruguay","1":"Italy","2":"France","3":"Brazil","4":"Switzerland","5":"Sweden","6":"Chile","7":"England","8":"Mexico","9":"Germany","10":"Argentina","11":"Spain","12":"Mexico","13":"Italy","14":"USA","15":"France","16":"Korea\\/Japan","17":"Germany","18":"South Africa","19":"Brazil"},"Winner":{"0":"Uruguay","1":"Italy","2":"Italy","3":"Uruguay","4":"Germany FR","5":"Brazil","6":"Brazil","7":"England","8":"Brazil","9":"Germany FR","10":"Argentina","11":"Italy","12":"Argentina","13":"Germany FR","14":"Brazil","15":"France","16":"Brazil","17":"Italy","18":"Spain","19":"Germany"},"Runners-Up":{"0":"Argentina","1":"Czechoslovakia","2":"Hungary","3":"Brazil","4":"Hungary","5":"Sweden","6":"Czechoslovakia","7":"Germany FR","8":"Italy","9":"Netherlands","10":"Netherlands","11":"Ger

In [11]:
# you can also save a json by record (row-wise as we learned)
d_wc.to_json(orient="records")

'[{"Year":1930,"Country":"Uruguay","Winner":"Uruguay","Runners-Up":"Argentina","Third":"USA","Fourth":"Yugoslavia","GoalsScored":70,"QualifiedTeams":13,"MatchesPlayed":18,"Attendance":"590.549"},{"Year":1934,"Country":"Italy","Winner":"Italy","Runners-Up":"Czechoslovakia","Third":"Germany","Fourth":"Austria","GoalsScored":70,"QualifiedTeams":16,"MatchesPlayed":17,"Attendance":"363.000"},{"Year":1938,"Country":"France","Winner":"Italy","Runners-Up":"Hungary","Third":"Brazil","Fourth":"Sweden","GoalsScored":84,"QualifiedTeams":15,"MatchesPlayed":18,"Attendance":"375.700"},{"Year":1950,"Country":"Brazil","Winner":"Uruguay","Runners-Up":"Brazil","Third":"Sweden","Fourth":"Spain","GoalsScored":88,"QualifiedTeams":13,"MatchesPlayed":22,"Attendance":"1.045.246"},{"Year":1954,"Country":"Switzerland","Winner":"Germany FR","Runners-Up":"Hungary","Third":"Austria","Fourth":"Uruguay","GoalsScored":140,"QualifiedTeams":16,"MatchesPlayed":26,"Attendance":"768.607"},{"Year":1958,"Country":"Sweden","W
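The two outputs above are long strings, which makes the difference between the orientations hard to see. As a rough illustration, the sketch below builds a small made-up DataFrame and parses both strings back into Python objects with the standard library's `json.loads`.

```python
import json

import pandas as pd

# A small, made-up DataFrame used only for illustration.
df = pd.DataFrame({"Year": [1930, 1934], "Winner": ["Uruguay", "Italy"]})

# Default orientation ("columns"): {column -> {index -> value}}
by_column = json.loads(df.to_json())

# "records" orientation: one dictionary per row
by_record = json.loads(df.to_json(orient="records"))

print(by_column)
print(by_record)
```

The column-wise version keeps the index labels (as strings, since JSON keys must be strings), while the record-wise version drops the index and stores one object per row.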

In [27]:
# see dictionary here
d_wc.to_dict(orient='records')

[{'Year': 1930,
  'Country': 'Uruguay',
  'Winner': 'Uruguay',
  'Runners-Up': 'Argentina',
  'Third': 'USA',
  'Fourth': 'Yugoslavia',
  'GoalsScored': 70,
  'QualifiedTeams': 13,
  'MatchesPlayed': 18,
  'Attendance': '590.549'},
 {'Year': 1934,
  'Country': 'Italy',
  'Winner': 'Italy',
  'Runners-Up': 'Czechoslovakia',
  'Third': 'Germany',
  'Fourth': 'Austria',
  'GoalsScored': 70,
  'QualifiedTeams': 16,
  'MatchesPlayed': 17,
  'Attendance': '363.000'},
 {'Year': 1938,
  'Country': 'France',
  'Winner': 'Italy',
  'Runners-Up': 'Hungary',
  'Third': 'Brazil',
  'Fourth': 'Sweden',
  'GoalsScored': 84,
  'QualifiedTeams': 15,
  'MatchesPlayed': 18,
  'Attendance': '375.700'},
 {'Year': 1950,
  'Country': 'Brazil',
  'Winner': 'Uruguay',
  'Runners-Up': 'Brazil',
  'Third': 'Sweden',
  'Fourth': 'Spain',
  'GoalsScored': 88,
  'QualifiedTeams': 13,
  'MatchesPlayed': 22,
  'Attendance': '1.045.246'},
 {'Year': 1954,
  'Country': 'Switzerland',
  'Winner': 'Germany FR',
  'Runners

In [12]:
# save and look in the file
d_wc.to_json("worldcup.json", orient="records")

In [13]:
# load
d = pd.read_json("worldcup.json")

# see 
d.head()

Unnamed: 0,Year,Country,Winner,Runners-Up,Third,Fourth,GoalsScored,QualifiedTeams,MatchesPlayed,Attendance
0,1930,Uruguay,Uruguay,Argentina,USA,Yugoslavia,70,13,18,590.549
1,1934,Italy,Italy,Czechoslovakia,Germany,Austria,70,16,17,363.000
2,1938,France,Italy,Hungary,Brazil,Sweden,84,15,18,375.700
3,1950,Brazil,Uruguay,Brazil,Sweden,Spain,88,13,22,1.045.246
4,1954,Switzerland,Germany FR,Hungary,Austria,Uruguay,140,16,26,768.607


### Data Type Conversion

Pandas also provides methods to convert your DataFrame into native Python data structures. These are useful for accessing your DataFrame in a different format, for example as a dictionary or a list. 



In [14]:
# to a dictionary
d_wc.to_dict()

{'Year': {0: 1930,
  1: 1934,
  2: 1938,
  3: 1950,
  4: 1954,
  5: 1958,
  6: 1962,
  7: 1966,
  8: 1970,
  9: 1974,
  10: 1978,
  11: 1982,
  12: 1986,
  13: 1990,
  14: 1994,
  15: 1998,
  16: 2002,
  17: 2006,
  18: 2010,
  19: 2014},
 'Country': {0: 'Uruguay',
  1: 'Italy',
  2: 'France',
  3: 'Brazil',
  4: 'Switzerland',
  5: 'Sweden',
  6: 'Chile',
  7: 'England',
  8: 'Mexico',
  9: 'Germany',
  10: 'Argentina',
  11: 'Spain',
  12: 'Mexico',
  13: 'Italy',
  14: 'USA',
  15: 'France',
  16: 'Korea/Japan',
  17: 'Germany',
  18: 'South Africa',
  19: 'Brazil'},
 'Winner': {0: 'Uruguay',
  1: 'Italy',
  2: 'Italy',
  3: 'Uruguay',
  4: 'Germany FR',
  5: 'Brazil',
  6: 'Brazil',
  7: 'England',
  8: 'Brazil',
  9: 'Germany FR',
  10: 'Argentina',
  11: 'Italy',
  12: 'Argentina',
  13: 'Germany FR',
  14: 'Brazil',
  15: 'France',
  16: 'Brazil',
  17: 'Italy',
  18: 'Spain',
  19: 'Germany'},
 'Runners-Up': {0: 'Argentina',
  1: 'Czechoslovakia',
  2: 'Hungary',
  3: 'Brazil',

In [15]:
# to a numpy array (here, only the first row)
d_wc.values[0]

array([1930, 'Uruguay', 'Uruguay', 'Argentina', 'USA', 'Yugoslavia', 70,
       13, 18, '590.549'], dtype=object)

In [16]:
# To a list (tolist() is a numpy method; on the full .values array it returns a nested list)
d_wc.values[0].tolist()

[1930,
 'Uruguay',
 'Uruguay',
 'Argentina',
 'USA',
 'Yugoslavia',
 70,
 13,
 18,
 '590.549']
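The cells above convert a single row. For the whole table, a minimal sketch on a small made-up DataFrame might look like this: `to_numpy()` returns a 2-D array, chaining `.tolist()` gives a nested list with one inner list per row, and `to_dict(orient="list")` gives one list per column.

```python
import pandas as pd

# A small, made-up DataFrame used only for illustration.
df = pd.DataFrame({"Year": [1930, 1934], "Winner": ["Uruguay", "Italy"]})

as_array = df.to_numpy()                     # 2-D numpy array (rows x columns)
as_nested_list = df.to_numpy().tolist()      # list of rows, each row a list
as_column_lists = df.to_dict(orient="list")  # {column name -> list of values}

print(as_array.shape)
print(as_nested_list)
print(as_column_lists)
```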

### Previewing and Describing your data

You just loaded your first dataset in Python. Let's see some useful tools to preview your data. 

#### `pandas.head()` : print first n rows

In [17]:
d_wc.head()

Unnamed: 0,Year,Country,Winner,Runners-Up,Third,Fourth,GoalsScored,QualifiedTeams,MatchesPlayed,Attendance
0,1930,Uruguay,Uruguay,Argentina,USA,Yugoslavia,70,13,18,590.549
1,1934,Italy,Italy,Czechoslovakia,Germany,Austria,70,16,17,363.000
2,1938,France,Italy,Hungary,Brazil,Sweden,84,15,18,375.700
3,1950,Brazil,Uruguay,Brazil,Sweden,Spain,88,13,22,1.045.246
4,1954,Switzerland,Germany FR,Hungary,Austria,Uruguay,140,16,26,768.607


#### `pandas.tail()` : print last n rows

In [18]:
d_wc.tail(10)

Unnamed: 0,Year,Country,Winner,Runners-Up,Third,Fourth,GoalsScored,QualifiedTeams,MatchesPlayed,Attendance
10,1978,Argentina,Argentina,Netherlands,Brazil,Italy,102,16,38,1.545.791
11,1982,Spain,Italy,Germany FR,Poland,France,146,24,52,2.109.723
12,1986,Mexico,Argentina,Germany FR,France,Belgium,132,24,52,2.394.031
13,1990,Italy,Germany FR,Argentina,Italy,England,115,24,52,2.516.215
14,1994,USA,Brazil,Italy,Sweden,Bulgaria,141,24,52,3.587.538
15,1998,France,France,Brazil,Croatia,Netherlands,171,32,64,2.785.100
16,2002,Korea/Japan,Brazil,Germany,Turkey,Korea Republic,161,32,64,2.705.197
17,2006,Germany,Italy,France,Germany,Portugal,147,32,64,3.359.439
18,2010,South Africa,Spain,Netherlands,Germany,Uruguay,145,32,64,3.178.856
19,2014,Brazil,Germany,Argentina,Netherlands,Brazil,171,32,64,3.386.810


#### `pandas.sample()` : get a sample

In [19]:
d_wc.sample(5)

Unnamed: 0,Year,Country,Winner,Runners-Up,Third,Fourth,GoalsScored,QualifiedTeams,MatchesPlayed,Attendance
13,1990,Italy,Germany FR,Argentina,Italy,England,115,24,52,2.516.215
16,2002,Korea/Japan,Brazil,Germany,Turkey,Korea Republic,161,32,64,2.705.197
17,2006,Germany,Italy,France,Germany,Portugal,147,32,64,3.359.439
3,1950,Brazil,Uruguay,Brazil,Sweden,Spain,88,13,22,1.045.246
10,1978,Argentina,Argentina,Netherlands,Brazil,Italy,102,16,38,1.545.791
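Because `sample()` draws rows at random, the output above changes every time the cell runs. If you need the same rows each time, for example so that classmates see the same output, you can fix the seed with `random_state`. A minimal sketch on made-up data:

```python
import pandas as pd

# Made-up data: ten rows numbered 0 to 9.
df = pd.DataFrame({"x": range(10)})

# The same random_state gives the same sample on every run.
first_draw = df.sample(3, random_state=42)
second_draw = df.sample(3, random_state=42)

print(first_draw)
print(first_draw.equals(second_draw))
```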


#### `pandas.info()` : Prints information about a DataFrame


In [20]:
d_wc.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 10 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Year            20 non-null     int64 
 1   Country         20 non-null     object
 2   Winner          20 non-null     object
 3   Runners-Up      20 non-null     object
 4   Third           20 non-null     object
 5   Fourth          20 non-null     object
 6   GoalsScored     20 non-null     int64 
 7   QualifiedTeams  20 non-null     int64 
 8   MatchesPlayed   20 non-null     int64 
 9   Attendance      20 non-null     object
dtypes: int64(4), object(6)
memory usage: 1.7+ KB


#### `pandas.dtypes` : Attribute to see data types


In [21]:
d_wc.dtypes

Year               int64
Country           object
Winner            object
Runners-Up        object
Third             object
Fourth            object
GoalsScored        int64
QualifiedTeams     int64
MatchesPlayed      int64
Attendance        object
dtype: object
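Notice that `Attendance` is stored as `object` (text) rather than as a number, because values such as `1.045.246` contain dots. Assuming those dots are thousands separators, one possible fix is to strip them and convert the column to integers. The sketch below applies that to two values copied from the table.

```python
import pandas as pd

# Two attendance values copied from the data, stored as text.
df = pd.DataFrame({"Year": [1930, 1950], "Attendance": ["590.549", "1.045.246"]})

# Assumption: the dots are thousands separators, so they can be removed.
df["Attendance"] = (
    df["Attendance"]
    .str.replace(".", "", regex=False)  # literal dot, not a regex wildcard
    .astype("int64")
)

print(df.dtypes)
print(df)
```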

#### `pandas.describe()` : Summarize all the numeric columns


In [22]:
d_wc.describe()

Unnamed: 0,Year,GoalsScored,QualifiedTeams,MatchesPlayed
count,20.0,20.0,20.0,20.0
mean,1974.8,118.95,21.25,41.8
std,25.582889,32.972836,7.268352,17.218717
min,1930.0,70.0,13.0,17.0
25%,1957.0,89.0,16.0,30.5
50%,1976.0,120.5,16.0,38.0
75%,1995.0,145.25,26.0,55.0
max,2014.0,171.0,32.0,64.0


#### `pandas.describe()` : Summarize a particular column



In [23]:
d_wc["Third"].describe()

count          20
unique         14
top       Germany
freq            3
Name: Third, dtype: object

## Practice

Using the "WorldCups.csv" data, answer the following:


- Which two teams played the last game in the data?
- Which country hosted the most World Cup editions?
- How many different countries have won a World Cup?
- What range of years does the data cover?



In [62]:
# Add your response here
d = pd.read_csv("WorldCups.csv")
# last world cup game is always the final. Germany vs Argentina
d.tail(1)
# host
d.Country.describe()
# winner unique
d.Winner.describe()
# range
d.Year.describe()

Unnamed: 0,Year,Country,Winner,Runners-Up,Third,Fourth,GoalsScored,QualifiedTeams,MatchesPlayed,Attendance
0,1930,Uruguay,Uruguay,Argentina,USA,Yugoslavia,70,13,18,590.549
1,1934,Italy,Italy,Czechoslovakia,Germany,Austria,70,16,17,363.000
2,1938,France,Italy,Hungary,Brazil,Sweden,84,15,18,375.700
3,1950,Brazil,Uruguay,Brazil,Sweden,Spain,88,13,22,1.045.246
4,1954,Switzerland,Germany FR,Hungary,Austria,Uruguay,140,16,26,768.607
5,1958,Sweden,Brazil,Sweden,France,Germany FR,126,16,35,819.810
6,1962,Chile,Brazil,Czechoslovakia,Chile,Yugoslavia,89,16,32,893.172
7,1966,England,England,Germany FR,Portugal,Soviet Union,89,16,32,1.563.135
8,1970,Mexico,Brazil,Italy,Germany FR,Uruguay,95,16,32,1.603.975
9,1974,Germany,Germany FR,Netherlands,Poland,Brazil,97,16,38,1.865.753
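The `describe()` calls answer these questions indirectly. As a more direct alternative, here is a sketch using `value_counts()`, `nunique()`, `min()`, and `max()`. To keep it self-contained, it rebuilds the `Year`, `Country`, and `Winner` columns by hand from the output shown earlier instead of reading `WorldCups.csv`.

```python
import pandas as pd

# Year, Country (host), and Winner columns, typed in from the notebook output.
wc = pd.DataFrame({
    "Year": [1930, 1934, 1938, 1950, 1954, 1958, 1962, 1966, 1970, 1974,
             1978, 1982, 1986, 1990, 1994, 1998, 2002, 2006, 2010, 2014],
    "Country": ["Uruguay", "Italy", "France", "Brazil", "Switzerland",
                "Sweden", "Chile", "England", "Mexico", "Germany",
                "Argentina", "Spain", "Mexico", "Italy", "USA",
                "France", "Korea/Japan", "Germany", "South Africa", "Brazil"],
    "Winner": ["Uruguay", "Italy", "Italy", "Uruguay", "Germany FR",
               "Brazil", "Brazil", "England", "Brazil", "Germany FR",
               "Argentina", "Italy", "Argentina", "Germany FR", "Brazil",
               "France", "Brazil", "Italy", "Spain", "Germany"],
})

# How often each country hosted, most frequent first.
host_counts = wc["Country"].value_counts()
top_hosts = sorted(host_counts[host_counts == host_counts.max()].index)

# Number of distinct labels in the Winner column.
n_winners = wc["Winner"].nunique()

# First and last year covered by the data.
year_range = (int(wc["Year"].min()), int(wc["Year"].max()))

print(top_hosts, int(host_counts.max()))
print(n_winners)
print(year_range)
```

Keep in mind that `nunique()` counts labels, so "Germany FR" and "Germany" are counted separately; whether they should be treated as the same country is a judgment call about the data.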


