Read and Display Excel File in Flask With Xlrd

Chapter 4. Working with Excel Files

Different the previous chapter'southward data, not all the data in this and the following chapter volition easily import into Python without a little work. This is because some data formats were made to be machine readable, while others, such as the ones we'll look at adjacent, were meant to be interacted with through desktop tools. In this chapter and the adjacent, nosotros will look at two instance file types—Excel files and PDFs—and provide some generic instructions to follow if you run across a different file type.

So far in this book, the solutions you lot've learned about for importing data take been pretty standard. In this chapter, we volition begin to learn about processes which volition vary greatly each fourth dimension you perform them. Although the processes are more difficult, the end goal is the aforementioned: to excerpt the useful data and put information technology into a usable format in Python.

The examples we use in this chapter and the next contain data from UNICEF's 2014 report on The State of the World'south Children. The data is available in PDF and Excel format.

When you lot have to extract data from files in these more difficult formats, you might call back there is someone out in that location who hates you lot, because it can be painful. We assure yous in most cases, the person who generated the file with the data within merely did non identify the importance of also releasing it in a automobile-readable format.

Installing Python Packages

Before we can continue, we demand to learn how to install external Python packages (or libraries). Upwardly until this point, we were using Python libraries that came standard with Python when yous installed information technology. Do y'all remember importing the csv and json packages in Chapter 3? Those were packages in the standard library—they came with your Python installation.

Python comes with a set of frequently used libraries. Because many libraries serve a niche purpose, y'all take to explicitly install them. This is so your computer doesn't get swollen with every Python library available.

Python packages are collected in an online directory called PyPI, which stores the packages along with their metadata and any documentation.

In this chapter, nosotros are looking at Excel files. If you lot visit PyPI in your browser, y'all can search for libraries relating to Excel and run across lists of matching package results you tin download. This is one fashion to explore which packet you should use.

We will be using pip from this point forward to install packages. There are multiple ways to install pip, and you should accept already washed so in Chapter 1 of this volume.

First, we will be evaluating Excel information. Let's install the package to do that— xlrd. To install the package, we use pip in the following mode:

pip install xlrd

To remove the package, nosotros would run the uninstall control:

pip uninstall xlrd

Try installing, uninstalling, and so reinstalling xlrd. It'southward good to get a handle on the pip commands, as you'll be using them throughout this book and your data wrangling career.

Why did nosotros choose xlrd when there are many possible packages? Choosing a Python library is an imperfect process. At that place are dissimilar ways to go about your selection. Don't worry nearly trying to effigy out what is the right library. When you are perfecting your skills and you lot find a couple of options, utilise the library that makes sense to you.

The showtime thing we recommend is searching the Spider web to run into which libraries other people recommend. If you lot search for "parse excel using python", yous will find the xlrd library surfaces at the acme of the search results.

However, the answer is not ever obvious. In Chapter 13, we will learn more than about the choice process when looking into Twitter libraries.

Parsing Excel Files

Sometimes the easiest way to extract data from an Excel sheet is finding a better fashion to get the information. There are times when parsing is not the answer. Before yous commencement parsing, enquire yourself the following questions:

  • Accept you tried to find the information in some other class? Sometimes other forms might exist available from the aforementioned source.

  • Have you tried to use a phone to figure out if the information is available in another form? Bank check out Chapter 6 for more tips.

  • Have you tried to consign the tab or tabs into CSV grade from Excel (or your document reader)? This is a proficient solution if you lot only have a couple of tabs of information or isolated data in one tab on the Excel sheet.

If you have exhausted these options and you still don't have the information yous need, y'all'll need to use Python to parse your Excel file.

Getting Started with Parsing

The library we identified for parsing Excel files is xlrd. This library is office of a series of libraries for working with Excel files in Python.

There are three main libraries for handling Excel files:

xlrd

Reads Excel files

xlwt

Writes and formats Excel files

xlutils

A set of tools for more advanced operations in Excel (requires xlrd and xlwt)

Yous'll need to install each separately if you desire to use them; however, in this affiliate we will only use xlrd. Considering we want to read Excel files into Python, you'll need to make sure you have xlrd installed before continuing:

pip install xlrd

Set your piece of work environment for this Excel file past doing the post-obit (or something similar it, depending on your organizational system):

  1. Create a binder for your Excel piece of work.

  2. Create a new Python file chosen parse_excel.py and put it in the folder you lot created.

  3. Identify the Excel file from the book's repository chosen SOWC 2014 Stat Tables_Table 9.xlsx in the same binder.

From this binder, type the following command in your terminal to run the script from the control line:

python parse_excel.py

By the end of this chapter, we will write a script to parse child labor and marriage data stored in this Excel file.

To start our script, we need to import xlrd and open our Excel workbook in Python. Nosotros store the opened file in the book variable:

            import            xlrd            book            =            xlrd            .            open_workbook            (            'SOWC 2014 Stat Tables_Table 9.xlsx'            )          

Unlike CSVs, Excel books can accept multiple tabs or sheets. To get at our data, we are going to pull out only the sheet with the data we want.

If you take a couple of sheets, you lot could merely estimate at the index, but that won't piece of work if you take lots of sheets. Then, you should know about the command book.sheet_by_name( somename ), where somename is the proper noun of the sheet you want to admission.

Let's check out the names of the sheets we have:

            import            xlrd            volume            =            xlrd            .            open_workbook            (            'SOWC 2014 Stat Tables_Table 9.xlsx'            )            for            sheet            in            volume            .            sheets            ():            print            sheet            .            name          

The sheet that we are looking for is Table ix. And then, let'southward put that into our script:

            import            xlrd            book            =            xlrd            .            open_workbook            (            'SOWC 2014 Stat Tables_Table 9.xlsx'            )            sheet            =            book            .            sheet_by_name            (            'Tabular array 9'            )            print            canvass          

If you run that code, it exits with an error that provides you lot with the following data:

            xlrd            .            biffh            .            XLRDError            :            No            canvass            named            <            'Table 9'            >          

At this point, you might be really confused. The problem lies in the difference between what we meet and what actually exists.

If y'all open upward your Excel workbook and select the name of the sheet by double-clicking it, you will find that in that location is an extra space at the terminate. This space is not visible to users in the browser. In Chapter vii, we will learn how to troubleshoot this in Python. For now, update your code to reverberate the infinite.

Change this line:

            sheet            =            volume            .            sheet_by_name            (            'Tabular array 9'            )          

to this:

            sheet            =            book            .            sheet_by_name            (            'Table 9 '            )          

Now, if we run our script it should work. You will see output like to this:

<xlrd.canvas.Sheet object at 0x102a575d0>

Allow's explore what we can do with a sheet. Add the post-obit subsequently you assign the sheet variable and rerun your script:

            print            dir            (            canvas            )          

In the returned list, y'all'll see a method called nrows. We will use this method to iterate over all rows. If we write print sail.nrows, the total number of rows will be returned.

Try this now:

            print            sheet            .            nrows          

Yous should have gotten back 303. Nosotros need to iterate over each row, which means nosotros demand a for loop. Every bit nosotros learned in "How to Import CSV Data", for loops iterate over items in a list, and so we demand to plow 303 into a list we can iterate over 303 times. To do this, nosotros will use the range function.

With the improver of the range role we can transform 303 into a list our for loop can iterate over, our script should wait like the following:

            import                                    xlrd                                                            book                                    =                                    xlrd            .            open_workbook            (            '            SOWC 2014 Stat Tables_Table 9.xlsx            '            )                                    sheet                                    =                                    book            .            sheet_by_name            (            '            Table 9                        '            )                                                            for                                    i                                    in                                    range            (            sheet            .            nrows            )            :                                    1                                                            print                                    i                                    2          
1

Loops over the index i in range(303), which volition be a list of 303 integers incrementing past i.

2

Outputs i, which will be the numbers from 0 to 302.

From hither, nosotros need to do a lookup on each of the rows to pull out the contents of each row instead of just printing the number. To do a lookup, we will use i as an index reference to take the nth row.

To get each row's values nosotros will apply row_values, which was some other method returned by dir(sheet) before. We tin can see from the row_values documentation that the method expects an alphabetize number and returns the respective row's values. Update your for loop to reverberate this and rerun your script:

            for                                    i                                    in                                    range            (            sheet            .            nrows            )            :                                                            print                                    sheet            .            row_values            (            i            )                                    1          
1

Uses i as the alphabetize to look up the row'southward values. Considering it is in a for loop that spans the length of the sheet, nosotros call this method for each row in our sheet.

When yous run this code, you will encounter a list for each row. The following is a subset of the data you'll encounter:

            [            ''            ,            u            'TABLE 9. CHILD PROTECTION'            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ]            [            ''            ,            ''            ,            u            'TABLEAU 9. PROTECTION DE Fifty            \u2019            ENFANT'            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ]            [            ''            ,            ''            ,            ''            ,            u            'TABLA 9. PROTECCI            \xd3            N INFANTIL'            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ]            [            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ]            [            ''            ,            u            'Countries and areas'            ,            ''            ,            ''            ,            u            'Kid labour (%)+            \n            2005            \u2013            2012*'            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            u            'Kid spousal relationship (%)            \n            2005            \u2013            2012*'            ,            ''            ,            ''            ,            ''            ,            u            'Nativity registration (%)+            \n            2005            \u2013            2012*'            ,            ''            ,            u            'Female genital mutilation/            cut            (            %            )            +\n2002\u20132012            *            ', '', '', '', '', '', u'            Justification            of            wife            chirapsia            (            %            )\n            2005\u20132012            *            ', '', '', '', u'            Violent            bailiwick            (            %            )            +\n2005\u20132012            *            ', '', '', '', '', '', '', '', '', '', '', '', '', '',            ''            ,            ''            ,            ''            ,            ''            ]          

Now that we can see each row, we need to pull out the information nosotros want. To help us determine what information we need and how to go it, it's much easier to open upwards the file in a program for displaying Excel files, such as Microsoft Excel on Windows or Numbers on Mac. If you visit the second tab on the spreadsheet, you will notice quite a few header rows.

Annotation

In our code, nosotros will aim to grab the English text. However, if yous want an extra challenge, try to pull out the French or Spanish headings and countries.

On the second tab, look at the data you lot tin can extract and think nigh how to all-time organize it. We provide ane possible way to practise this hither, but there are many different means using dissimilar data structures.

For this exercise, nosotros will pull out child labor and kid marriage statistics. The following is one way to organize the data—we'll utilize this as an instance to work toward:

            {                                                            u            '            Transitional islamic state of afghanistan            '            :                                    {                                                            '            child_labor            '            :                                    {                                                            '            female            '            :                                    [            9.vi            ,                                    '            '            ]            ,                                    1                                                            '            male            '            :                                    [            xi.0            ,                                    '            '            ]            ,                                                            '            full            '            :                                    [            x.iii            ,                                    '            '            ]            }            ,                                                            '            child_marriage            '            :                                    {                                                            '            married_by_15            '            :                                    [            xv.0            ,                                    '            '            ]            ,                                                            '            married_by_18            '            :                                    [            40.4            ,                                    '            '            ]                                                            }                                                            }            ,                                                            u            '            Albania            '            :                                    {                                                            '            child_labor            '            :                                    {                                                            '            female            '            :                                    [            ix.iv            ,                                    u            '                                    '            ]            ,                                                            '            male person            '            :                                    [            fourteen.four            ,                                    u            '                                    '            ]            ,                                                            '            total            '            :                                    [            12.0            ,                                    u            '                                    '            ]            }            ,                                                            '            child_marriage            '            :                                    {                                                            '            married_by_15            '            :                                    [            0.2            ,                                    '            '            ]            ,                                                            '            married_by_18            '            :                                    [            9.6            ,                                    '            '            ]                                                            }                                                            }            ,                                                            .            .            .                                    }          
1

If yous are viewing the data in Excel, some of these numbers might appear off. This is because Excel volition often round numbers. We are showing the numbers you will find when you use Python to parse the cells.

Tip

Planning what you want the outcome to look similar and writing an case of your data volition salvage yous time as you lot begin coding. Once you have identified how y'all'd like to format your information, you can ask yourself, "What do I need to do adjacent to get there?" This is especially helpful when you lot experience blocked on your next footstep.

There are two Python constructs we are going to utilize to pull the data out. The first method we will use is a nested for loop, which is a for loop inside some other for loop. This is often used when you take 10 rows that comprise y objects. To admission each object you demand a for loop for each row, and so another for loop for each object. We besides used a nested for loop in an instance in Chapter iii.

Nosotros are going to apply a nested for loop to output each cell from each row. This will output the items nosotros saw earlier, where each row was listed.

            for                                    i                                    in                                    xrange            (            canvas            .            nrows            )            :                                                            row                                    =                                    canvass            .            row_values            (            i            )                                    1                                                            for                                    jail cell                                    in                                    row            :                                    2                                                            print                                    cell                                    3          
1

Takes the listing that is each row and saves it to the row variable. This makes our code more readable.

2

Loops over each item in the list, which represents each jail cell for the current row.

3

Outputs the cell value.

If you run your consummate lawmaking with the nested for loop, you will detect your output is not so helpful anymore. That brings us to the second mechanism to explore our Excel file—a counter.

Permit's add a counter to our code and then nosotros can step through the cells and rows to find what we want to pull out. Be careful where you identify the counter—you volition have very different results if you place information technology on the jail cell level versus the row level.

Reset your for loop to await similar the code shown hither:

            count                                    =                                    0                                    for                                    i                                    in                                    xrange            (            sheet            .            nrows            )            :                                                            if                                    count                                    <                                    ten            :                                                            row                                    =                                    canvas            .            row_values            (            i            )                                                            print                                    i            ,                                    row                                    1                                                                                    count                                    +            =                                    ane          
1

Outputs i and the row so we can actually see which row number has which information

Now, if nosotros go back to what we want our final output to look like, what we really need to figure out is where the country names begin. Remember, the state names are the start keys of our output dictionary:

            {            u            'Afghanistan'            :            {            ...            },            u            'Albania'            :            {            ...            },            ...            }          

If you run your script with the counter in it where count < 10, y'all will see from the output that we have not notwithstanding reached the row where the country names start.

Considering nosotros are skipping a few lines to get to the data we are interested in, we are looking to place which row number we will demand to start our data collection. From our previous attempt, we know the land names start past row 10. Only how can we tell where to showtime?

The respond is in the next code example, merely earlier y'all await, endeavor updating the counter to start at the row where the land names start. (At that place are multiple ways to do this, so if your answer is slightly different than what we have in the following code example, that'due south OK.)

Later you identify the proper row number, yous will need to add an if statement to brainstorm pulling out values after that row. This is so nosotros just work with the data below that line.

If you were able to get that working, your code should similar something like this:

            count                                    =                                    0                                                            for                                    i                                    in                                    xrange            (            canvass            .            nrows            )            :                                                            if                                    count                                    <                                    20            :                                    1                                                            if                                    i                                    >            =                                    14            :                                    2                                                            row                                    =                                    sheet            .            row_values            (            i            )                                                            print                                    i            ,                                    row                                                            count                                    +            =                                    i          
1

This line volition iterate through the first twenty rows to identify which row the country names brainstorm on.

2

This if statement starts the output at the point where the land rows appear.

At this point, you should accept output that looks like this:

            14            [            ''            ,            u            'Afghanistan'            ,            u            'Transitional islamic state of afghanistan'            ,            u            'Afganist            \xe1            n'            ,            ten.3            ,            ''            ,            11.0            ,            ''            ,            9.six            ,            ''            ,            15.0            ,            ''            ,            40.4            ,            ''            ,            37.4            ,            ''            ,            u            '            \u2013            '            ,            ''            ,            u            '            \u2013            '            ,            ''            ,            u            '            \u2013            '            ,            ''            ,            u            '            \u2013            '            ,            ''            ,            90.ii            ,            ''            ,            74.four            ,            ''            ,            74.viii            ,            ''            ,            74.i            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ]            15            [            ''            ,            u            'Albania'            ,            u            'Albanie'            ,            u            'Albania'            ,            12.0            ,            u            '  '            ,            fourteen.iv            ,            u            '  '            ,            9.iv            ,            u            '  '            ,            0.two            ,            ''            ,            9.six            ,            ''            ,            98.half dozen            ,            ''            ,            u            '            \u2013            '            ,            ''            ,            u            '            \u2013            '            ,            ''            ,            u            '            \u2013            '            ,            ''            ,            36.4            ,            ''            ,            29.eight            ,            ''            ,            75.1            ,            ''            ,            78.3            ,            ''            ,            71.four            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ]            xvi            [            ''            ,            u            'People's democratic republic of algeria'            ,            u            'Alg            \xe9            rie'            ,            u            'Argelia'            ,            four.7            ,            u            'y'            ,            v.v            ,            u            'y'            ,            iii.9            ,            u            'y'            ,            0.1            ,            ''            ,            ane.8            ,            ''            ,            99.3            ,            ''            ,            u            '            \u2013            '            ,            ''            ,            u            '            \u2013            '            ,            ''            ,            u            '            \u2013            '            ,            ''            ,            u            '            \u2013            '            ,            ''            ,            67.9            ,            ''            ,            87.seven            ,            ''            ,            88.8            ,            ''            ,            86.five            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ,            ''            ]            ....            more          

At present, we demand to turn each row into our dictionary format. This will make the data more meaningful to usa when we try to do other things with information technology in hereafter capacity.

Looking back at our earlier example of how we want our output to be organized, we are going to need a lexicon and we are going to use countries equally keys. To pull out the country names, we will demand to do some indexing.

Let'south add together a lexicon to our code, and then pull out the country name from each row and add together information technology as a cardinal to our dictionary.

Update your for loop to reflect this:

            count                                    =                                    0                                    data                                    =                                    {            }                                    1                                                            for                                    i                                    in                                    xrange            (            sheet            .            nrows            )            :                                                            if                                    count                                    <                                    x            :                                                            if                                    i                                    >            =                                    14            :                                                            row                                    =                                    canvas            .            row_values            (            i            )                                                            state                                    =                                    row            [            one            ]                                    2                                                            information            [            country            ]                                    =                                    {            }                                    3                                                            count                                    +            =                                    ane                                                            print                                    data                                    4          
1

This creates an empty dictionary to store our data.

2

row[1] pulls out the country from each row we iterate over.

3

data[country] adds the country every bit a central to the data dictionary. We set the value to another dictionary, because that is where nosotros are going to store our data in the following steps.

4

This outputs the information, so we can come across what information technology looks like.

At this point, your output should look something like this:

            {            u            'Afghanistan'            :            {},            u            'Republic of albania'            :            {},            u            'Angola'            :            {},            u            'People's democratic republic of algeria'            :            {},            u            'Andorra'            :            {},            u            'Austria'            :            {},            u            'Australia'            :            {},            u            'Antigua and Barbuda'            :            {},            u            'Armenia'            :            {},            u            'Argentina'            :            {}}          

Now, nosotros demand to friction match upwardly each of the values in the rest of the row with the proper values in the spreadsheet, then shop them in our dictionary.

Annotation

Equally you try to pull out all the values and check them confronting your Excel sheet, you will make lots of errors. That is fine and expected. This process should be embraced—it ways you're working your way through the problem.

First, let'southward create an empty version of our information structure where nosotros tin store our data. Permit'south also remove our counter, equally we know that the rows of data commencement at line fourteen. Because we know xrange can take a showtime and finish point, we tin can begin our counting at 14 and end at the stop of the file. Let's take a look at our updated lawmaking:

            data                                    =                                    {            }                                                            for                                    i                                    in                                    xrange            (            14            ,                                    canvass            .            nrows            )            :                                    1                                                            row                                    =                                    sheet            .            row_values            (            i            )                                                            country                                    =                                    row            [            1            ]                                                                                    data            [            country            ]                                    =                                    {                                    2                                                            '            child_labor            '            :                                    {                                    3                                                            '            full            '            :                                    [            ]            ,                                    4                                                            '            male            '            :                                    [            ]            ,                                                            '            female            '            :                                    [            ]            ,                                                            }            ,                                                            '            child_marriage            '            :                                    {                                                            '            married_by_15            '            :                                    [            ]            ,                                                            '            married_by_18            '            :                                    [            ]            ,                                                            }                                                            }                                                            print                                    data            [            '            Afghanistan            '            ]                                    5          
1

We can remove all of our references to the counter and only begin our for loop starting at the 14th row of our sheet. This line begins the loop with a value of xiv, so we automatically skip the lines we don't need for our dataset.

2

This line expands the dictionary to multiple lines to fill up out the other data points.

3

This creates the fundamental child_labor and sets information technology equal to another dictionary.

4

The lexicon has strings to explain each part of the data information technology holds. For each of these keys, the values are lists.

5

This outputs the values associated with the primal Afghanistan.

Our output information for Afghanistan looks like this:

            {            'child_labor'            :            {            'full'            :            [],            'male'            :            [],            'female'            :            []},            'child_marriage'            :            {            'married_by_18'            :            [],            'married_by_15'            :            []}            }          

Permit'due south at present populate the data. Because we have admission to each column of each row using the index, nosotros can populate these lists with the values from the sail. By looking at our sheet and lining up which columns chronicle to which parts of the information, nosotros can update the information dictionary to reflect the post-obit:

                                    data            [            country            ]                                    =                                    {                                                            '            child_labor            '            :                                    {                                                            '            total            '            :                                    [            row            [            4            ]            ,                                    row            [            v            ]            ]            ,                                    1                                                            '            male person            '            :                                    [            row            [            half dozen            ]            ,                                    row            [            7            ]            ]            ,                                                            '            female            '            :                                    [            row            [            viii            ]            ,                                    row            [            9            ]            ]            ,                                                            }            ,                                                            '            child_marriage            '            :                                    {                                                            '            married_by_15            '            :                                    [            row            [            10            ]            ,                                    row            [            11            ]            ]            ,                                                            '            married_by_18            '            :                                    [            row            [            12            ]            ,                                    row            [            13            ]            ]            ,                                                            }                                                            }          
1

Because there are two cells for each of the columns, our code stores both values. Because in this line our child labor totals are the fifth and 6th columns and we know Python is zero-indexed, we want the quaternary and fifth indexes.

When we run our code over again, nosotros get output like this:

            {            'child_labor'            :            {            'female'            :            [            9.six            ,            ''            ],            'male'            :            [            11.0            ,            ''            ],            'full'            :            [            ten.3            ,            ''            ]},            'child_marriage'            :            {            'married_by_15'            :            [            15.0            ,            ''            ],            'married_by_18'            :            [            twoscore.4            ,            ''            ]}}            }          
Warning

Before you lot keep, output a couple of records and check the number in the dictionary. It is easy to stop up ane alphabetize off and ruin the balance of your data.

Finally, to preview our data, we can apply pprint instead of a print statement. In complicated data structures (like dictionaries), this makes information technology a lot easier to review the output. Add the following to the end of your file to preview the data in a formatted fashion:

            import                                    pprint                                    1                                    pprint            .            pprint            (            data            )                                    2          
1

Imports the pprint library. Normally, import statements come at the start of the file, but nosotros are putting it here for simplicity. Later you are done, yous will want to delete these lines, because they are non disquisitional to your script.

2

Passes information to the pprint.pprint() function.

If you roll through your output, you will notice the majority of it looks expert. Simply in that location are a couple of records that seem out of place.

If you look at the spreadsheet, you should note the last row for countries is Republic of zimbabwe. So, we want to expect for when the land is equal to 'Republic of zimbabwe', and exit there. To leave, we add a intermission to our code, which is how nosotros prematurely pause the for loop to keep with the rest of script. Permit'due south add that every bit our stopping point. At the cease of the for loop, add together the following and rerun your code:

                                    if                                    country                                    ==                                    '            Zimbabwe            '            :                                    1                                                            intermission                                    2          
1

If the country is equal to Zimbabwe…

2

Exits out of the for loop.

Warning

After adding the break, did you lot stop up with a NameError: name 'country' is not divers error? If then, cheque your indentation. The if argument should exist indented four spaces to exist in the for loop.

Stepping through code can exist helpful in identifying an consequence. If you need to troubleshoot to figure out what a variable, such as country, is equal to in a for loop, attempt adding print statements within the for loop and watching the values before your script exits with an fault. They will likely give yous a hint as to what is happening.

At this point, our script'southward output matches our end goal. The concluding matter we want to do to our script is to make certain we document it with some comments.

Your script should now await something similar this:

            """     This is a script to parse child labor and kid marriage data.                        1                          The Excel file used in this script tin be found here:         http://www.unicef.org/sowc2014/numbers/ """                                                            import                                    xlrd                                    book                                    =                                    xlrd            .            open_workbook            (            '            SOWC 2014 Stat Tables_Table 9.xlsx            '            )                                                            sheet                                    =                                    book            .            sheet_by_name            (            '            Table ix                        '            )                                                            data                                    =                                    {            }                                    for                                    i                                    in                                    xrange            (            14            ,                                    sheet            .            nrows            )            :                                                            # Offset at 14th row, because that is where the country information begins                        2                                                                                    row                                    =                                    sheet            .            row_values            (            i            )                                                                                    state                                    =                                    row            [            one            ]                                                                                    data            [            state            ]                                    =                                    {                                                            '            child_labor            '            :                                    {                                                            '            total            '            :                                    [            row            [            iv            ]            ,                                    row            [            five            ]            ]            ,                                                            '            male            '            :                                    [            row            [            vi            ]            ,                                    row            [            7            ]            ]            ,                                                            '            female            '            :                                    [            row            [            8            ]            ,                                    row            [            9            ]            ]            ,                                                            }            ,                                                            '            child_marriage            '            :                                    {                                                            '            married_by_15            '            :                                    [            row            [            ten            ]            ,                                    row            [            11            ]            ]            ,                                                            '            married_by_18            '            :                                    [            row            [            12            ]            ,                                    row            [            xiii            ]            ]            ,                                                            }                                                            }                                                                                    if                                    country                                    ==                                    '            Zimbabwe            '            :                                                            break                                                            import                                    pprint                                    pprint            .            pprint            (            information            )                                    3          
1

This is a multiline comment used to mostly draw what is going on in this script.

2

This is a unmarried-line comment to document why we commencement on line 14 and not earlier.

3

Nosotros can and should remove these lines as we motility beyond uncomplicated parsing of data into data analysis.

At this point, we have a like output to the previous chapter's data. In the next chapter, we will take this a step further and parse the same information from a PDF.

Summary

The Excel format is an odd in-between category that is kind of machine readable. Excel files were not meant to exist read by programs, simply they are parsable.

To handle this nonstandard format, nosotros had to install external libraries. At that place are ii ways to find libraries: past looking on PyPI, the Python package index, or by searching for tutorials and how-tos to see what other people take done.

One time you have identified the library you lot want to install, use the pip install command to do it; to uninstall a library, use pip uninstall.

As well learning how to parse Excel using the xlrd library, we besides learned a few new Python programming concepts, which are summarized in Table 4-1.

Tabular array 4-1. New Python programming concepts
Concept Purpose

range and xrange

This turns a number into a consecutive list of numbers. Case: range(three) will output [0, one, ii] .

Counting starts at 0, not 1

This is a computer construct to exist aware of; it occurs throughout programming. It is of import to note when using range, indexing, or slicing.

Indexing and slicing

Use this to pull out a specific subset of a string or list.

Counters

Use this as a tool to command for loops.

Nested for loops

Use when iterating over a information structure within a data structure, such equally a list of lists, a list of dictionaries, or a dictionary of dictionaries.

pprint

pprint is a style to output data into the terminal in a nice format. This is good to use when programming with complicated data structures.

break

Yous can exit from a for loop early by using break. This volition end executing the loop and continue on to the next function of the script.

Commenting

It is of import to continue all your code commented, so you know what is happening for futurity reference.

As yous read on and dig into PDFs, you will larn the importance of exploring alternatives to the data you take or finding alternate ways to locate and find data you need to answer your enquiry questions.

bordenlardboy.blogspot.com

Source: https://www.oreilly.com/library/view/data-wrangling-with/9781491948804/ch04.html

0 Response to "Read and Display Excel File in Flask With Xlrd"

Post a Comment

Iklan Atas Artikel

Iklan Tengah Artikel 1

Iklan Tengah Artikel 2

Iklan Bawah Artikel