PHP CSV Utilities v0.3 Released!
I have been trying to wrap up version 0.3 of my CSV library for over two years now. I had intended for this release to be fully documented, but rather than delay it further to write documentation, I decided to finish the features I wanted in this version and release it. The next version will hopefully have full documentation. For now, here is a quick and dirty rundown of the library right in this blog post.
CSV — a simple, yet complicated format
PCU, short for PHP CSV Utilities, is a dead simple library I wrote for reading and writing CSV files. Because there is no single, universally followed CSV specification, CSV files come in many flavors. In fact, even though CSV is short for “comma-separated values”, CSV files aren’t always separated by commas. Tabs, pipes, and even colons are commonly used to separate values. There is also no requirement that values be quoted or escaped with any specific character. You can’t even count on a specific type of line break!
The solution? Dialects!
So, even though CSV is a really simple format, its extremely open nature can make it a bit of a pain to work with. PCU aims to solve this problem by using what are called “dialects” (a term I borrowed from Python’s CSV module). Dialects are simple objects that describe the format of a CSV file.
There are two ways to create a dialect (well, technically three, but we’ll get to that later). If you just plan on using the dialect for a one-off task and you don’t intend to read or write in that format again, just instantiate a Csv_Dialect object and be done with it.
<?php
$dialect = new Csv_Dialect(array(
    'delimiter' => ',',
    'quotechar' => '"',
    'lineterminator' => "\n",
    'escapechar' => '"',
    'quoting' => Csv_Dialect::QUOTE_MINIMAL,
));
The Csv_Dialect object has defaults for each of these parameters, so you can also instantiate a dialect with only those parameters you wish to change.
<?php
$dialect = new Csv_Dialect(array(
    'delimiter' => "\t",
    'quoting' => Csv_Dialect::QUOTE_ALL,
));
You can also modify those parameters via instance members:
<?php
$dialect = new Csv_Dialect;
$dialect->delimiter = "\t";
$dialect->quoting = Csv_Dialect::QUOTE_ALL;
Now, there are also situations where you may want to save a dialect for use any time. In this case, you can extend Csv_Dialect and use it over and over again.
<?php
class Csv_Dialect_MyFormat extends Csv_Dialect {
    public $delimiter = "\t";
    public $quotechar = '"';
    public $lineterminator = "\r\n";
    public $escapechar = '\\';
    public $quoting = self::QUOTE_MINIMAL;
}
Although there is no “standard” CSV format, the default parameters for Csv_Dialect are as standard as I could make them. They are as follows:
- delimiter – comma
- quotechar – double quote
- escapechar – backslash
- lineterminator – “\r\n”
- quoting – Csv_Dialect::QUOTE_NONE (I am probably going to change this to QUOTE_ALL in the next version)
Quoting
Csv_Dialect allows several different types of value-quoting. To specify which type you want, set the “quoting” value to one of these four Csv_Dialect class constants:
- QUOTE_NONE – Do not quote any values.
- QUOTE_ALL – Quote all values.
- QUOTE_MINIMAL – Only quote values that contain special characters such as the delimiter, quote character, etc.
- QUOTE_NONNUMERIC – Only quote values with non-numerical characters in them.
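To make the four modes concrete, here is a rough, standalone sketch of how a writer might decide whether to quote a single value. It does not use PCU at all: the SKETCH_* constants and the sketch_quote_value() function are hypothetical names invented for this illustration, and doubling embedded quote characters is just one common escaping convention, not necessarily what PCU does.

```php
<?php
// Hypothetical stand-ins for the four quoting modes; not PCU's constants.
const SKETCH_QUOTE_NONE = 0;
const SKETCH_QUOTE_ALL = 1;
const SKETCH_QUOTE_MINIMAL = 2;
const SKETCH_QUOTE_NONNUMERIC = 3;

// Hypothetical helper: returns $value formatted for output under $mode.
function sketch_quote_value($value, $mode, $delimiter = ',', $quotechar = '"')
{
    $value = (string) $value;
    switch ($mode) {
        case SKETCH_QUOTE_ALL:
            $quote = true;
            break;
        case SKETCH_QUOTE_MINIMAL:
            // Quote only if the value contains the delimiter, the quote
            // character, or a line break.
            $quote = strpbrk($value, $delimiter . $quotechar . "\r\n") !== false;
            break;
        case SKETCH_QUOTE_NONNUMERIC:
            // Quote anything PHP does not consider numeric.
            $quote = !is_numeric($value);
            break;
        default:
            // SKETCH_QUOTE_NONE: never quote.
            $quote = false;
    }
    if (!$quote) {
        return $value;
    }
    // Escape embedded quote characters by doubling them, then wrap the value.
    $escaped = str_replace($quotechar, $quotechar . $quotechar, $value);
    return $quotechar . $escaped . $quotechar;
}
```

With this sketch, a value like a,b stays untouched under SKETCH_QUOTE_NONE but gets wrapped in double quotes under SKETCH_QUOTE_MINIMAL, while a plain number such as 42 is left bare under SKETCH_QUOTE_NONNUMERIC.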
Reading data
So, now that we can properly describe the flavor of CSV we are working with, let’s dig into actually reading some data. PCU is capable of reading files as well as raw data. To read a CSV file, use the Csv_Reader class.
<?php
$dialect = new Csv_Dialect(array(
    'quoting' => Csv_Dialect::QUOTE_MINIMAL,
    'quotechar' => '"',
    'escapechar' => '"',
));
$reader = new Csv_Reader('/path/to/file.csv', $dialect);
foreach ($reader as $row) {
    echo $row[0] . ' is the first column.';
}
Csv_Reader implements the SPL iterator interface, which means you can loop over it as if it were an array. You can then access columns using array notation. By default, the resulting row is indexed numerically, but you can change that by calling setHeader().
<?php
$reader = new Csv_Reader('products.csv');
$reader->setHeader(array('id','name','description','price'));
foreach ($reader as $row) {
    $name = $row['name'];
    $price = $row['price'];
    echo "<p>The price of $name is $price</p>";
}
Auto-detecting CSV format
Csv_Reader accepts a file name as the first parameter and a dialect as the second, but the dialect is optional. If a dialect isn’t specified, the reader will do its best to guess the format of the file. You can then retrieve that format by calling getDialect().
<?php
$reader = new Csv_Reader('products.csv');
$dialect = $reader->getDialect();
echo "The delimiter character is: $dialect->delimiter";
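If you are curious what “guessing” can look like, here is a deliberately naive sketch of one possible heuristic: pick the candidate character that shows up the same non-zero number of times on every line of a sample. This is not PCU’s detection code, which may work quite differently. The function name sketch_guess_delimiter() and the candidate list are made up for this example, and the sketch ignores quoting entirely, so a delimiter inside a quoted value would throw it off.

```php
<?php
// Hypothetical helper for illustration; not part of PCU.
function sketch_guess_delimiter($sample, array $candidates = array(',', "\t", '|', ';', ':'))
{
    // Split the sample into lines regardless of line-ending style,
    // dropping only leading and trailing line breaks.
    $lines = preg_split('/\r\n|\r|\n/', trim($sample, "\r\n"));
    $best = null;
    $bestCount = 0;
    foreach ($candidates as $candidate) {
        $counts = array();
        foreach ($lines as $line) {
            $counts[] = substr_count($line, $candidate);
        }
        // A plausible delimiter appears the same non-zero number of times
        // on every line; prefer the candidate that appears most often.
        $consistent = count(array_unique($counts)) === 1;
        if ($consistent && $counts[0] > $bestCount) {
            $best = $candidate;
            $bestCount = $counts[0];
        }
    }
    // null means no candidate was consistent across all lines.
    return $best;
}
```

A real detector would also need to account for quoted values and for files with only one column, which is why it is handy to be able to pass an explicit dialect when the guess is wrong.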
Reading raw CSV data
Csv_Reader is used to read CSV data from files. To read raw CSV data from a string, use Csv_Reader_String.
<?php
$data = get_csv_data_from_somewhere();
$reader = new Csv_Reader_String($data);
Writing data
PCU is also great for writing CSV files. To write out CSV data, use the Csv_Writer class, which also uses dialects to specify the format you would like.
<?php
$dialect = new Csv_Dialect(array(
    'quoting' => Csv_Dialect::QUOTE_MINIMAL
));
$writer = new Csv_Writer('/path/to/products.csv', $dialect);
$data = get_array_of_data_from_somewhere();
$writer->writeRows($data);
// or
foreach ($data as $row) {
    $writer->writeRow($row);
}
Reformatting a CSV file
Csv_Writer’s writeRows() can accept a Csv_Reader object in place of an array of data, making it extremely easy to reformat a CSV file.
<?php
$reader = new Csv_Reader('/path/to/inputfile.csv'); // dialect determined automatically
$outputFormat = new Csv_Dialect(array(
    'quoting' => Csv_Dialect::QUOTE_ALL,
    'quotechar' => '"',
    'escapechar' => '"',
    'lineterminator' => "\r\n",
));
// Pass the output dialect defined above to the writer
$writer = new Csv_Writer('/path/to/outputfile.csv', $outputFormat);
$writer->writeRows($reader);
Enjoy!
So that’s PHP CSV Utilities v0.3! I am happy to fix any bugs you may encounter and I’m open to suggestions for features. And I’m definitely open to contributors. If you are interested in contributing to PCU, send me a patch as well as a description of what it does or fixes and I will happily take a look at it.
Luke, looks very good. Going to download and test the code. I might be able to use it in a project. Will let you know if I find anything useful for feedback.
Just downloaded your libraries. It seems the ancestor of Csv_Reader wasn't included, so I added a require_once for it. When I did so, I got an error in AutoDetect.php about passing variables by reference:
File: /var/www/public_html4/hcspry/utils/php-csv/Csv/Reader/AutoDetect.php Line: 172
Error: [2048] Only variables should be passed by reference
The line in question says:
if ($quote = array_shift(array_flip($quotes))) {
any ideas?
I would recommend going back through your library in full in Php5.2.3. I have encountered multiple php errors and have had to spend some quality time to fix them. The files in question so far: AutoDetect, String, and Reader.
Here are some examples:
1.) Declaration of Csv_Reader_String::initStream() should be compatible with that of Csv_Reader::initStream()
2.) Only variables should be passed by reference: Line 172 of AutoDetect: if ($quote = array_shift(array_flip($quotes)))
As you can see they are actually PHP errors not some implementation error. Error 2 is a known php bug. In PHP 5.0.3 they stopped allowing this behavior.
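For anyone hitting the “Only variables should be passed by reference” message quoted above: array_shift() takes its argument by reference, so handing it the return value of array_flip() directly triggers the complaint. A minimal workaround sketch is to store the flipped array in a variable first. The contents of $quotes below are made-up sample data, and the assumption that it is keyed by candidate quote character is a guess, not something taken from the library.

```php
<?php
// Made-up sample data standing in for the $quotes array in AutoDetect.php.
$quotes = array('"' => 12, "'" => 3);

// array_shift() needs a real variable, so keep the result of array_flip()
// in one instead of passing the temporary return value directly.
$flipped = array_flip($quotes);
if ($quote = array_shift($flipped)) {
    echo "Detected quote character: $quote\n";
}
```

The behavior is otherwise the same as the original one-liner: the first value of the flipped array is pulled off and assigned to $quote.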