Open Beta — all 133 formats free, no limits Learn more

Excel (XLSX/XLS) format for geodata: POI tables and attribute exchange — OSM export to .xlsx

2026-05-1218 min read
tabularexcelxlsxpoiattribute-tableoffice

Microsoft Excel XLSX is the planet's most ubiquitous tabular format, born far from the GIS world in corporate accounting and sales analytics — and exactly because of that it became the key exchange format for POI tables and attribute data between GIS engineers and people who live in pivot tables, charts, and management reports. XLSX is not lines of text like CSV but a ZIP archive with XML inside (Office Open XML, ISO/IEC 29500), supporting multiple sheets in one file, formulas, cell formatting, filters, charts, typed data (numbers, dates, text). XLSX has no native geometry type — coordinates live as ordinary lat / lon numeric columns, and complex geometries (polygons, lines) as WKT strings in a separate column. On osm2cdr.ru excel_exporter.py pulls PostGIS POIs/buildings/roads through the WKB fast-path, feeds them into Python pandas, then openpyxl writes a multi-sheet .xlsx with separate sheets per object type, a stats sheet, and a README sheet describing CRS, export date, and columns.

Map generation of Sochi city center

History: from Multiplan 1982 to ISO/IEC 29500

Excel is Microsoft's fourth attempt at the spreadsheet market. The first was Multiplan (1982) for CP/M and MS-DOS — it lost the war to Lotus 1-2-3. In September 1985 Microsoft shipped Excel 1.0 exclusively for the Apple Macintosh — a conceptually new application with a graphical interface, mouse, and pull-down menus, against the clunky text-mode Lotus 1-2-3 on DOS. The Windows version followed in November 1987 (Excel 2.0 for Windows 2.0), but the real breakthrough came with Excel 3.0 (1990) and especially Excel 5.0 (1993) — the latter brought VBA macros and multi-sheet workbooks and became the corporate finance standard. By the late 1990s Lotus 1-2-3 was strangled under the Microsoft Office bundle and Excel won the de facto monopoly.

From Excel 97 through Excel 2003 the format was binary — BIFF (Binary Interchange File Format), extension .xls. BIFF is a proprietary binary format with record-based structure, very compact but completely opaque to outside tools. Opening .xls without Microsoft Office was a real challenge: libraries like xlrd / libxls appeared and died through reverse engineering. Worse, Microsoft regularly added new record types (BIFF5 → BIFF8), breaking third-party reader backward compatibility.

In 2006 Microsoft, under pressure from EU competition authorities and the open-standards movement, published Office Open XML (OOXML) — an XML-based replacement for the binary Word / Excel / PowerPoint formats. In December 2006 ECMA ratified it as ECMA-376, and in November 2008 ISO/IEC 29500. Excel 2007 (for Windows) and Excel 2008 (for Mac) made .xlsx the default format. The old .xls stayed as legacy for compatibility with existing pipelines.

XLSX is a ZIP archive with XML files inside. If you rename .xlsx to .zip and unpack with any archiver, you'll see the structure: [Content_Types].xml, _rels/.rels, docProps/core.xml, docProps/app.xml, xl/workbook.xml, xl/styles.xml, xl/sharedStrings.xml, xl/worksheets/sheet1.xml, xl/worksheets/sheet2.xml, and so on. That's a huge advantage for tools: no need to reverse-engineer the binary anymore — you just read XML with a standard parser. Python libraries openpyxl (for xlsx) and xlsxwriter (for writing) grew exactly on that openness.

ISO/IEC 29500 splits into four parts: 29500-1 (Fundamentals & Markup), 29500-2 (Open Packaging Conventions), 29500-3 (Markup Compatibility), 29500-4 (Transitional Migration Features). The Transitional part documents "dirty" legacy features for backward compatibility with old binary formats; the Strict part is the clean standard. Excel writes Transitional XML by default — that delivers compatibility with old readers but sometimes surprises linters.

Inside XLSX: ZIP + XML, sheets, lat/lon, WKT in cells

To understand how XLSX lives under the hood, picture a regular ZIP archive. At the root:

[Content_Types].xml      # MIME types for each part
_rels/                   # Relationships between parts
  .rels                  # Top-level: pointer to workbook
docProps/
  core.xml               # Title, creator, dates
  app.xml                # Application metadata
xl/
  workbook.xml           # List of sheets
  styles.xml             # All styles (number formats, colors, fonts)
  sharedStrings.xml      # All unique string values (de-duplicated)
  _rels/
    workbook.xml.rels    # Relationships from workbook to sheets
  worksheets/
    sheet1.xml           # Data of the first sheet
    sheet2.xml           # Data of the second sheet
    ...

Inside sheet1.xml cells look like:

<sheetData>
  <row r="1">
    <c r="A1" t="s"><v>0</v></c>      <!-- t="s" = shared string, index 0 -->
    <c r="B1" t="s"><v>1</v></c>      <!-- index 1 in sharedStrings.xml -->
    <c r="C1" t="s"><v>2</v></c>
  </row>
  <row r="2">
    <c r="A2"><v>1234567890</v></c>   <!-- no t — numeric value -->
    <c r="B2"><v>55.7558</v></c>      <!-- latitude -->
    <c r="C2"><v>37.6173</v></c>      <!-- longitude -->
  </row>
</sheetData>

All string values are stored separately in sharedStrings.xml (de-duplication), and cells reference them by index. That gives compactness for files with repeating strings (e.g. 10,000 POIs with category "cafe" — the word "cafe" is stored once).

Geodata in XLSX without a native geometry type. XLSX doesn't know what Point or Polygon is — it's a format for tabular data. Coordinates live in ordinary numeric cells:

osm_id name amenity lat lon
1234567890 Coffee House cafe 55.7558 37.6173
1234567891 Sberbank bank 55.7520 37.6175
1234567892 Pharmacy 36.6 pharmacy 55.7561 37.6190

For non-trivial geometries — lines, polygons — a WKT column is added as plain text:

osm_id name building geometry
1234567895 GUM yes POLYGON((37.6219 55.7548, 37.6235 55.7548, ...))

Excel can't visualize WKT — for the user it's just a long string in a cell. But QGIS, GeoPandas, ogr2ogr read XLSX and understand that the WKT column carries geometry through the GDAL XLSX driver. The full chain: PostGIS → pandas DataFrame → openpyxl WorkBook → .xlsx file → the user opens in Excel and sees a table, OR opens in QGIS as a vector layer and sees a map.

Multi-sheet — the main XLSX advantage over CSV. A single XLSX file can contain multiple sheets with different schemas. For geo export this is perfect:

  • Sheet 1 (Points): POIs with lat/lon — cafes, banks, pharmacies, stops;
  • Sheet 2 (Buildings): buildings with WKT geometry, height, building type;
  • Sheet 3 (Roads): roads with WKT linestring and highway category;
  • Sheet 4 (Stats): aggregates — count of POIs by category, total building area, road length;
  • Sheet 5 (README): description of CRS, export date, bbox, QGIS import instructions.

In CSV you'd have to export five separate files and pack them into a ZIP — in XLSX it's one file, one document, one "workbook".

Map generation of Samara city center

Excel 2007+ limits. One sheet — max 1,048,576 rows × 16,384 columns. So 1 million POIs barely fits in one sheet; for a million polygons with long WKT strings — it's already preferable to split across sheets or switch to CSV / GeoPackage. Numeric precision — IEEE 754 double, ~15 significant digits. For lat/lon that gives ~1 mm precision at the equator — more than enough for geo tasks.

Use cases: five scenarios

City administration / municipality. A worker at the city beautification department receives an XLSX export of every playground in the city with coordinates, surface type, year built. Opens in Excel, makes a "playground count by district" pivot table, color-codes by year built, exports a chart to PowerPoint for a report to the mayor. No QGIS required — the mayor doesn't have QGIS.

Real estate / developer analytics. An analyst at a development company evaluates 500 potential plots for housing construction — needs to know POI density (shops, schools, hospitals) within 500 m of each. Downloads an XLSX export of district POIs, opens in Excel + Power Query, runs cross-tabulation: for each plot counts nearby cafes, schools, medical facilities, builds a scatter plot for the investment committee presentation. Excel + Power BI is the standard in real estate analytics.

Marketing and CRM enrichment. A marketing team for a restaurant chain plans a targeted SMS campaign to residents within 1 km of each restaurant. From osm2cdr they get an XLSX with coordinates of 50 restaurants plus nearby POI sheets with residential category lat/lon. Merge into CRM (Bitrix24, AmoCRM, Salesforce) — all of them import XLSX natively. No GIS pipeline, no programming.

School / educational task. A geography teacher hands students an XLSX with 200 POIs of the district — assignment: build a "cafe density by neighborhood" bubble chart using a pivot table. Excel is a standard tool in school IT curriculum, kids are familiar with it. XLSX from osm2cdr becomes a ready-to-use teaching artifact with no prep.

Government reporting / heritage cadastre. A heritage authority officer maintains a registry of cultural heritage objects — 300 rows, each with coordinates, year built, protection category. From osm2cdr they get an XLSX export of district historic=* POIs, fill in manually, print on paper for the official report. XLSX is the only format regulators accept alongside PDF.

Strengths and weaknesses

Strengths. Universal in the office world — every business analyst, accountant, and marketer has Microsoft Excel or Google Sheets / LibreOffice Calc. Multi-sheet in one file — you can pack POIs, buildings, roads, stats, README into one "workbook". Native typing — numbers, dates, text are distinguished at the format level, no guessing heuristics like in CSV. Native formatting — you can color-code categories, add conditional formatting (green for open POIs, red for closed). Pivot tables, formulas, charts work right on the data without exporting to other tools. The GDAL XLSX driver is supported since GDAL 2.0+ — QGIS, ogr2ogr read and write XLSX natively. Python openpyxl / xlsxwriter / pandas give full code control.

Weaknesses. No native geometry type — coordinates live as numeric columns or WKT strings, which isn't standardized (QGIS auto-detect doesn't always trigger). File size larger than CSV due to XML overhead — 1M POIs in XLSX ~80-100 MB vs ~150 MB CSV (but zipped CSV → 30 MB, XLSX is already zipped). 1,048,576 rows per sheet — a soft limit, but for big datasets you need to split. Excel on Windows auto-converts long numeric IDs (>11 digits) to scientific notation — 5234567891234 becomes 5.23E+12 on "Open" import (workaround: "Import as Text"). The old binary .xls (BIFF) is limited to 65,536 rows — a critical cap for modern datasets. Excel "stores date as a number" (days since 1900-01-01) — when importing dates from CSV / GeoJSON it's easy to get a 1-day shift due to the leap-year-1900 bug. Not streaming-friendly — reading XLSX usually loads the entire file into memory (openpyxl read-only mode partially relieves this).

Excel vs CSV vs GeoJSON vs Shapefile

Property XLSX (Excel) CSV GeoJSON Shapefile
Container ZIP + XML UTF-8 text UTF-8 text (JSON) binary (.shp+.dbf+.shx)
Multi-sheet Yes (native) No (ZIP with CSVs) No No
Native geometry No (lat/lon or WKT) No (WKT in column) Yes (RFC 7946) Yes (ESRI)
Spatial index No No No .qix sidecar
Typing Native (number/date/text) No (all strings) JSON types Native
Formulas / pivot Native No No No
Formatting Native (colors, fonts) No No No
Max rows 1,048,576 / sheet Unlimited Unlimited 2 GB file
Excel-friendly Native Via BOM/dialect No No
Open in Notepad No (binary) Yes Yes No
Streaming write Via xlsxwriter Yes Only GeoJSON-Seq No

XLSX wins where the recipient is a business analyst with pivot tables or executive with PowerPoint, and several related sheets (POIs + Buildings + Stats + README) are needed in one document. CSV — when the recipient is a Python/R engineer and simplicity / streaming matters. GeoJSON — for web maps and JSON APIs. Shapefile — when the recipient works in ArcGIS or MapInfo.

Software for XLSX

Microsoft Excel (microsoft.com/excel) — flagship by Microsoft. Paid (part of Office 365, $69.99/year personal, $99.99/year Family). Windows and macOS. Opens XLSX in one click. Supports 1M rows × 16K columns, pivot tables, formulas, charts, Power Query (ETL pipeline inside Excel), Power BI integration. For geo — "Insert → 3D Maps" (formerly Power Map) builds a heatmap by lat/lon, and Microsoft Maps chart (via Bing Maps API).

Google Sheets (sheets.google.com) — free, web-based. Imports XLSX via "File → Import". Hard limit — 10 million cells per spreadsheet (~5M rows × 2 columns, or 600K rows × 16 columns). For geo — =GEOCODE() via Apps Script, or embedded Google Maps chart.

LibreOffice Calc (libreoffice.org) — free, open source, on Windows / macOS / Linux. Opens XLSX natively. Supports 1M rows × 16K columns. Pivot tables, formulas, charts — full feature parity with Excel for typical tasks. Some complex formatting may be lost on round-trip.

Apple Numbers — free with macOS / iOS. XLSX import via "File → Open" works, but Numbers uses a different data model (flexible canvas, not grid-of-cells), so complex XLSX with pivot tables / VBA macros may render incorrectly. For simple tabular data — fine.

QGIS (qgis.org) — free desktop GIS. Opens XLSX as a vector layer via the GDAL XLSX driver: "Layer → Add Layer → Add Vector Layer" → pick XLSX file → pick sheet → in layer properties set X (lon) and Y (lat) columns or WKT. Supports reading multiple sheets as separate layers.

Python openpyxl (openpyxl.readthedocs.io) — Python library for XLSX read and write, MIT license. from openpyxl import Workbook; wb = Workbook(); ws = wb.active; ws.append([55.7558, 37.6173]); wb.save('out.xlsx') — basic workflow. Supports read, write, formulas, styles, conditional formatting, charts. Streaming write mode for large files (write_only=True).

Python xlsxwriter (xlsxwriter.readthedocs.io) — Python writer-only library, BSD license. Faster than openpyxl on writes for large files, better support for charts and conditional formatting, but read-only is not supported. On osm2cdr we use xlsxwriter for the writing path — it's 2-3x faster than openpyxl on large datasets.

Python pandas (pandas.pydata.org) — df.to_excel('out.xlsx', sheet_name='POI', index=False) uses openpyxl or xlsxwriter under the hood (engine choice). pd.read_excel('in.xlsx', sheet_name='POI') for reading. Multi-sheet support via pd.ExcelWriter('out.xlsx') as writer: df1.to_excel(writer, sheet_name='Points'); df2.to_excel(writer, sheet_name='Buildings').

OGR (gdal)ogr2ogr -f "XLSX" output.xlsx input.geojson converts GeoJSON to XLSX, or ogr2ogr -f "GeoJSON" output.geojson input.xlsx -oo X_POSSIBLE_NAMES=lon -oo Y_POSSIBLE_NAMES=lat — back. Through the GDAL XLSX driver.

Pipeline in OSM2CDR

Our excel_exporter.py is a LAYERS exporter that reads OSM vector layers (buildings, roads, POIs, water, landuse) from PostGIS via the WKB fast-path, normalizes attributes into tabular form, and writes a multi-sheet XLSX through xlsxwriter with separate sheets per object type.

Export parameters:

  • Geometry modelat-lon (two columns for Point layers) or wkt (single column for arbitrary geometry). Default: lat-lon for POI layers, WKT for buildings / roads / polygons.
  • Sheet splitby_layer (one sheet per object type: Points / Buildings / Roads / Water) or single (everything in one sheet with an added layer column). Default: by_layer.
  • CRS — default WGS 84 (EPSG:4326). Any of 165 supported can be set, coordinates are transformed.
  • Number formatting — long integer IDs are wrapped in text format so Excel doesn't convert them to scientific notation. Default: enabled.
  • Conditional formatting — for the POI sheet we automatically color-code amenity categories (green for food / medical, blue for finance, etc.). Default: enabled, can be disabled.
  • Include stats sheet — a separate sheet with summary stats: POI count by category, total building area, road length. Default: enabled.
  • Include README sheet — a sheet with metadata: bbox, export date, CRS, OSM source, column descriptions. Default: enabled.

Output structure for a POI bbox:

moscow-center-export-2026-05-13.xlsx
├── Sheet "Points"      # 1200 rows × 18 cols: osm_id, name, amenity, lat, lon, ...
├── Sheet "Buildings"   # 350 rows × 12 cols: osm_id, name, building, height, geometry (WKT), ...
├── Sheet "Roads"       # 180 rows × 8 cols: osm_id, name, highway, geometry (WKT), ...
├── Sheet "Water"       # 12 rows × 6 cols
├── Sheet "Stats"       # Aggregates: count POI by amenity, sum building area, ...
└── Sheet "README"      # CRS info, export date, bbox, column descriptions

WKB → pandas → openpyxl pipeline. Step 1: local_osm_loader.py pulls geometries from PostGIS in WKB binary format. Step 2: for Point layers we convert WKB → (x, y) tuple through shapely.wkb.loads(b).coords[0], for Polygon / LineString — shapely.wkb.loads(b).wkt for the WKT string. Step 3: a pandas DataFrame with osm_id, name, amenity, lat, lon columns (or geometry). Step 4: xlsxwriter Workbook → add_worksheet → write_row with named styles for headers and conditional formatting for categories. Step 5: final workbook.close() writes the file to disk.

Excel quirks workaround. Long numeric IDs (osm_id can be 12-13 digits) are wrapped in text format via worksheet.set_column('A:A', 15, text_format) — that forces Excel to read the column as text on open. The export date is written as an ISO-8601 string on the README sheet, not as an Excel date — to avoid the leap-year-1900 bug on cross-version opens. UTF-8 encoding is native to XLSX through XML inside ZIP — no need for BOM like in CSV.

Map generation of Moscow city center

FAQ

XLSX or CSV for POI export — which is better? Depends on the recipient. If the recipient is an Excel analyst / executive / marketer who opens the result in Excel and makes pivot tables — XLSX (multi-sheet, formatting, native types). If the recipient is a Python/R/SQL engineer who imports into pandas / DuckDB / PostgreSQL — CSV (simpler, streaming-friendly, less overhead). If the recipient is a QGIS / ArcGIS user — both work, but GeoJSON or Shapefile are more typed for GIS pipelines.

Will QGIS open my XLSX as a map? Yes, through the GDAL XLSX driver, since QGIS 3.0+. Procedure: Layer → Add Layer → Add Vector Layer → pick .xlsx → pick sheet with data → in layer properties set X (lon) and Y (lat) columns. For multi-sheet files QGIS imports each sheet as a separate layer automatically. A WKT column is also recognized if its name is geometry, wkt, or the_geom.

What to do with old .xls (BIFF)? Excel 2007+ opens .xls through "Compatibility Mode" with limits (65,536 rows, no new features). Better to convert to .xlsx via "Save As → XLSX". In Python code — pandas.read_excel('file.xls', engine='xlrd') (needs the xlrd library version < 2.0; xlrd 2.0+ dropped XLS support). On osm2cdr we only write .xlsx.xls legacy is not supported for new exports.

Can XLSX store polygons with 100,000 vertices? Technically yes — a WKT string in a cell can be up to 32,767 characters (Excel limit per cell). A polygon with 100,000 vertices in WKT is ~2-3 MB of text, which doesn't fit in one cell. Workaround: split the polygon into parts or use a reference to an external GeoJSON. For production geo tasks with hundreds of thousands of vertices — move to GeoPackage or Shapefile.

Why do long numbers break when opened in Excel? Excel on File → Open tries to auto-type each column — numeric values with >11 digits become scientific notation (52345678912345.23E+12). That's a precision loss, not a display issue — the real number is lost. Workaround 1: on our export side — wrap osm_id in text format (enabled by default). Workaround 2: on the Excel side — "Data → Get Data → From File → From Workbook" instead of File → Open, and in Power Query Editor set the column type to "Text".

XLSX vs Google Sheets for geo tasks? XLSX is file-based, can be versioned in Git, works offline, no count limit. Google Sheets is cloud-based, real-time collaboration, easy to share by link. Google Sheets hard limit — 10 million cells per spreadsheet (~600K rows × 16 columns), which is less than Excel (1M × 16K). For big geo datasets — XLSX. For small teams with collaboration — Google Sheets (imports XLSX via File → Import).

Sources

← All articles