forked from crs4/pydoop
-
Notifications
You must be signed in to change notification settings - Fork 0
/
ChangeLog
530 lines (332 loc) · 16.4 KB
/
ChangeLog
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
2012-10-16 simleo <simleo@simleo-U36SD>
* pydoop/hadoop_utils.py (HadoopVersion): improved version string
parsing.
2012-10-08 simleo <simleo@simleo-U36SD>
* setup.py (patch_hadoop_src): patch code moved to this new func.
* pydoop/hadoop_utils.py (HadoopVersion): ext part is now
everything after the first dash and it's not parsed anymore.
(PathFinder.hadoop_version): removed HADOOP_HOME from the
environment passed to the hadoop executable (HADOOP_HOME overrides
the actual Hadoop version to be used even if you call the
executable with its full explicit path).
2012-07-18 simleo <simleo@simleo-U36SD>
* pydoop/hadut.py (PipesRunner): fixed. Now it should be usable
for production jobs.
2012-07-02 simleo <simleo@simleo-U36SD>
* setup.py (PathFinder.hdfs_link): simplified code; now
from-tarball installation has precedence over debian pkg cloudera
installation.
2012-06-15 simleo <simleo@simleo-U36SD>
* pydoop/hdfs/file.py (local_file): changed to create dirname if
necessary. Now the interface should be compatible with that of
hdfs_file.
2012-06-08 simleo <simleo@simleo-U36SD>
* pydoop/hdfs/fs.py: the default_is_local function has replaced
the DEFAULT_IS_LOCAL variable. The function leverages PathFinder's
new hadoop_params method to avoid opening an hdfs connection.
* pydoop/hadoop_utils.py: refactored, cleaned up,
simplified. _EnvMonitor has been removed, now you explicitly reset
your PathFinder object if you need to.
(PathFinder): new hadoop_params method reads Hadoop configuration
parameters into a dictionary.
* pydoop/hadut.py: added support for 'hadoop --config' equivalent.
* pydoop/test_support.py: added PipesRunner.
2012-06-06 <simleo@neuron.crs4.it>
* pydoop/hdfs/fs.py: added DEFAULT_IS_LOCAL. This is a temporary
workaround for bug #346 (instantiating an hdfs object at module
import is not an ideal solution).
2012-06-04 simleo <simleo@simleo-U36SD>
* pydoop/hdfs/fs.py (_get_connection_info): fixed #345.
2012-05-11 simleo <simleo@simleo-U36SD>
* pydoop/hdfs/file.py: added local_file, an extension of the
standard Python file object with the same interface as
hdfs_file. local_file objects can be used for connections to the
local file system to avoid JVM overhead during I/O.
2012-04-27 simleo <simleo@simleo-U36SD>
* pydoop/hadoop_utils.py: changed PathFinder to cache
Hadoop-related variables individually.
2012-04-11 simleo <simleo@simleo-U36SD>
* src/pipes.hpp: wrapped the close() method of closable
components, i.e., all except the partitioner.
2012-03-20 simleo <simleo@simleo-U36SD>
* pydoop/hadoop_utils.py (PathFinder): hadoop_{home,conf,version}
changed to recompute their return values in response to dynamic
changes in os.environ. This is essential for hdfs.config to work
correctly.
2011-06-04 <simleo@neuron.crs4.it>
* setup.py: first step towards support for Hadoop 0.20.203.0:
extension modules compile and can be imported correctly (no
testing yet) after applying hadoop-0.20.203.0.patch to Hadoop's
source code.
2011-05-16 <simleo@neuron.crs4.it>
* pydoop/hdfs.py: HADOOP_HOME is now required.
2011-04-26 <simleo@neuron.crs4.it>
* pydoop/hdfs.py: operations on a closed file or hdfs instance now
raise ValueError.
(hdfs_file.seek): added "whence" parameter.
2011-04-25 <simleo@neuron.crs4.it>
* pydoop/hdfs.py (hdfs_file): added mode property; fixed #337.
2011-04-15 <simleo@neuron.crs4.it>
* pydoop/hdfs.py: added top-level open function.
(hdfs.__init__): changed to store a reference to the actual host
and port.
(hdfs.open_file): changed to accept "r" and "w" opening modes.
(hdfs_file): added __enter__ and __exit__.
2011-04-14 <simleo@neuron.crs4.it>
* pydoop/utils.py (split_hdfs_path): changed to accept a 'user'
parameter.
2011-04-12 <simleo@neuron.crs4.it>
* setup.py (BoostExtension): changed to generate files only when
they are actually needed (non-existent or outdated).
* pydoop/hdfs.py (hdfs_file): added properties to get a reference
to the file's name, size and hdfs instance.
(hdfs_file.read): changed to behave like the read method of a
Python file object, i.e., length can be omitted or negative to
read the whole file.
2011-02-10 <simleo@neuron.crs4.it>
* Makefile (docs_put): changed web upload path to the new value
set by Sourceforge in the Feb 2011 project-web update.
2010-12-17 <simleo@neuron.crs4.it>
* pydoop/hdfs.py (hdfs_file.__read_chunks_until_nl): readline
improved to allow reading more than 20 chunks of data!
2010-11-16 <simleo@neuron.crs4.it>
* setup.py: added more metadata for PyPI.
2010-11-11 <simleo@neuron.crs4.it>
* examples/ipcount/ipcount_base (main): fixed handling of stderr.
2010-11-10 <simleo@neuron.crs4.it>
* docs/examples/intro.rst: added a paragraph on Hadoop 0.21.0 changes.
* examples/sequence_file: added support for Hadoop 0.21.0.
* examples/self_contained: added support for Hadoop 0.21.0.
* examples/ipcount: added support for Hadoop 0.21.0.
2010-11-09 <simleo@neuron.crs4.it>
* examples/input_format: added support for Hadoop 0.21.0.
2010-11-08 <simleo@neuron.crs4.it>
* pydoop/hadoop_utils.py (get_hadoop_version): changed to handle
overriding by env var.
* setup.py (create_pipes_ext): removed include patches (fixed in
Hadoop 0.20.2); updated installation docs accordingly (0.20.1 is
no longer supported).
2010-11-05 <simleo@neuron.crs4.it>
* setup.py: added pipes define macro for new FileSplit; moved
get_hadoop_version to a separate pydoop.hadoop_utils module.
* pydoop/pipes.py (InputSplit): moved here from (now defunct)
input_split.py. Changed to wrap the cpp implementation.
* src/pipes_input_split.cpp (export_input_split): getter methods
changed to properties.
* src/pipes_input_split.hpp (w): fixed string assignment bug.
2010-11-04 <simleo@neuron.crs4.it>
* setup.py (get_hadoop_version): fixed to return a tuple of int.
2010-11-03 <simleo@neuron.crs4.it>
* setup.py (get_hadoop_version): fixed to return version as a tuple.
2010-09-08 <simleo@neuron.crs4.it>
* java_src: removed (examples/input_format is a superset).
* Makefile: added a target to upload docs to sourceforge.
* examples: added an example on custom input formats.
2010-09-01 <simleo@neuron.crs4.it>
* setup.py, src/pipes.hpp: code cleanup.
2010-08-30 <simleo@neuron.crs4.it>
* pydoop/hdfs.py (hdfs_file): added "iterator over its own lines"
behavior -- fixes #324.
2010-08-29 <zag@pflip>
* src/hdfs_fs.hpp (wrap_hdfs_fs): Added namespace 'bp' to
'len'. Unclear why it compiled before.
* src/hdfs_file.cpp (tell): fixed bug ticket:325. A stupid int
instead of a long error.
2010-08-24 <simleo@neuron.crs4.it>
* Makefile (clean): changed to remove $(BUILD_DIR) instead of "build"
2010-08-06 <simleo@neuron.crs4.it>
* setup.py: added platforms and license metadata. Note that the
Makefile (and the copyrighter script it uses) has not been updated
to include the Apache 2.0 notice yet.
* pydoop/utils.py (_HdfsPathSplitter.split): fixed a bug that was
causing it to allow colons in some hdfs paths; error messages now
provide more details on what went wrong.
2010-08-04 <simleo@neuron.crs4.it>
* pydoop/utils.py (_HdfsPathSplitter.split): bad url exception now
reports bad url name.
* pydoop/hdfs.py (hdfs.__init__): user can now be None, with the
same effect as "".
2010-07-29 <simleo@neuron.crs4.it>
* pydoop/utils.py (split_hdfs_path): rewritten to eliminate
urlparse dependency (fixes ticket #316).
2010-07-23 <simleo@neuron.crs4.it>
* pydoop/hdfs.py (hdfs_file.read): fixed #319.
2010-07-20 <simleo@neuron.crs4.it>
* pydoop/hdfs.py (hdfs_file.read): fixed #312.
2010-07-07 <simleo@neuron.crs4.it>
* docs/conf.py: version and release are now dynamically retrieved
from pydoop/__init__.py.
2010-07-05 <simleo@neuron.crs4.it>
* src/hdfs_file.hpp, src/hdfs_file.cpp, pydoop/hdfs.py: added flush.
* src/hdfs_fs.hpp, src/hdfs_fs.cpp, pydoop/hdfs.py: added utime.
2010-07-02 <simleo@neuron.crs4.it>
* src/hdfs_fs.hpp, src/hdfs_fs.cpp, pydoop/hdfs.py: added chown
and chmod.
2010-07-01 <simleo@neuron.crs4.it>
* pydoop/__init__.py (__version__): changed from "0.3.6-dev" to
"0.3.6_rc1". This allows us to build internal releases that are
compliant with Gentoo's file naming rules, see
http://devmanual.gentoo.org/ebuild-writing/file-format/index.html.
The main advantage is that "0.3.6_rc2", ..., "0.3.6_rcN", "0.3.6"
will automatically be considered upgrades by portage.
* pydoop/hdfs.py (hdfs.__init__): changed to match changes in the
underlying C++ constructor.
* src/hdfs_fs.hpp (wrap_hdfs_fs): changed to wrap
hdfsConnectAsUser. It falls back to the old behavior if user is an
empty string and groups an empty list.
2010-06-29 <simleo@neuron.crs4.it>
* setup.py: Removed BoostExtFactory, added BoostExtension. The
latter is a subclass of Extension that can be directly passed to
the setup function as a member of the ext_modules list, with
additional methods to build main files and patched aux files.
Together with the newly added "build_boost_ext" command, this
fixes #191. Note that auto-generated files must still be excluded
in MANIFEST.in because the dist tarball is built after the docs
have been built and these, in turn, require package build.
(create_pipes_ext, create_hdfs_ext): changed module names to fix
#310. Imports in all python modules have been changed accordingly.
2010-06-25 <simleo@neuron.crs4.it>
* pydoop/utils.py: removed __cleanup_file_path, it is no longer used.
(split_hdfs_path): fixed #260.
* examples/text_pipe_runner.py: changed to take pipes executable
as an input parameter.
* docs/conf.py: fixed sys.path manipulation to import from build dir.
* Makefile: refactored to support Sphinx docs; added more targets.
2010-06-24 <simleo@neuron.crs4.it>
* pydoop/hdfs.py (hdfs): added wrappers to all remaining methods,
with docstrings for Sphinx.
2010-02-25 <simleo@neuron.crs4.it>
* src/HadoopPipes_signal.cpp: this is currently experimental, so
it's not distributed and disabled (a 'signal_setup_hack.diff' file
has been added to the root dir that shows how to re-enable it).
2010-02-09 <simleo@neuron.crs4.it>
* src/pipes.hpp: all destructors made virtual.
* src/hdfs_fs.cpp (list_directory): fixed #262.
2010-02-03 <simleo@neuron.crs4.it>
* pydoop: added epydoc markup for main modules.
2010-02-01 <simleo@neuron.crs4.it>
* src/hdfs_fs.cpp (get_path_info): fixed #261. The
'exec_and_trap_error' macro does not work here, error is signaled
by a NULL result, not negative. Also fixed analogous bugs in
list_directory and get_working_directory; unit tests updated
accordingly (except for get_working_directory: how do we get it to
fail on this?)
2009-12-23 <zag@pflip>
* test/test_hdfs_basic_class.py (hdfs_basic_tc.copy_on_self):
added test to check copy on self.
* pydoop/text_protocol.py (text_down_protocol): minor clean up in
__send()
2009-12-15 <zag@pflip>
* setup.py (OLD_WRITE_BUFFER): Fixed a bug (#250) in HadoopPipes
due to a bad use of fprintf. This should be escalated upstream to
hadoop.
2009-12-14 <simleo@neuron.crs4.it>
* src/hdfs_fs.cpp (get_path_info): changed to wrap hdfsGetPathInfo.
Fixes #249.
2009-12-13 <zag@pflip>
* src/pipes_serial_utils.cpp (export_pipes_serial_utils):
added {quote,unquote}_string. Exported via pydoop.utils.
* pydoop/text_protocol.py (text_down_protocol.set_job_conf): Fixed
a stupid % formatting mistake.
2009-12-12 <zag@pflip>
* MANIFEST.in: added pipes_runner support.
2009-11-16 <simleo@neuron.crs4.it>
* pydoop/hdfs.py (hdfs_file.seek): changed to reset
readline-related values. Fixes #239.
2009-10-28 <simleo@neuron.crs4.it>
* examples/bin/wordcount-full.py (WordCountReader): changed to
output file byte offset as key (mimic Hadoop default).
(WordCountWriter): added.
(WordCountPartitioner): added.
2009-10-27 <simleo@neuron.crs4.it>
* README.txt: added section on running unit tests.
* MANIFEST.in: put in sync with what we have and want to release.
* examples: basic python example renamed to wordcount-minimal.py;
added wordcount-full.py, where we want to demonstrate (almost) all
pydoop features. For now it includes a Python RecordReader that
replaces the standard Java line reader.
* pydoop/hdfs.py: added hdfs_file as a wrapper around the
underlying cpp object. Adds the readline method.
2009-10-26 <simleo@neuron.crs4.it>
* pydoop/pipes.py: Task components (Mapper, Reducer etc.)
explicitly redefined as abstract classes (fixes # 236).
Constructors now accept an optional context object that's not
used (__init__ must be explicitly overridden, and the user is
likely to expect the base class to accept a context object too).
2009-10-20 <lsimleo@neuron.crs4.it>
* setup.py: changed to use the pre-compiled libhdfs. NOTE: only
Hadoop 0.20 is supported now.
(create_pipes_ext): changed to generate auxiliary files by copying
them from the Hadoop installation and applying necessary patches.
2009-10-07 <lsimleo@neuron.crs4.it>
* src/hdfs_file.cpp (write): changed to use a reference to the buffer.
* test/test_hdfs_network.py: added tests for block size & replication.
2009-09-28 <zag@manzanillo>
* src/SerialUtils.cpp (HadoopUtils): deserializeFloat put in phase
with hadoop-0.19 header files.
* src/pipes_serial_utils.cpp: Added basic functions to access
hadoop serialization.
2009-09-27 <zag@manzanillo>
* setup.py: Bumped to 0.2.5 (added support for application testing)
* pydoop/pipes_runner.py (pipes_runner): Added a driver for
application testing.
* pydoop/text_protocol.py: Added support to directly talk pipes
textual protocol.
* pydoop/utils.py (make_input_split): Added a function to
build input splits.
2009-09-22 <zag@manzanillo>
* setup.py: Bumped to version 0.2.4
* test/test_utils.py (utils_tc.split): updated to the new
split_hdfs_path.
* pydoop/utils.py (split_hdfs_path): Changed function
internals. Now we follow the python way and use internal
batteries (i.e. urlparse)
2009-09-21 <zag@manzanillo>
* pydoop/utils.py (split_hdfs_path): Added handling of file:/// files.
* pydoop/utils.py (jc_configure_float): added.
* pydoop/pipes.py: now we export {Task,Map,Reduce}Context too.
2009-08-28 <zag@manzanillo.crs4.it>
* setup.py: Bumped to version 0.2.3.
* test/test_utils.py: added some testing.
* pydoop/utils.py: using raise_pydoop_exception to raise what (I
hope) will be a c++ compatible exception. It is probably the wrong thing to do.
* test/test_exceptions.py: added some basic tests for the new exceptions.
* src/exceptions.cpp: Added. Contains translators that prepend the
exception name to the exception message. Better than nothing. We
currently support 'pydoop_exception' and 'pydoop_exception.pipes'.
* src/exceptions.hpp: Added. All our c++ exception are now defined
here. They will all be described as 'exception.UserWarning'
* src/Makefile (machine): replaced machine=`uname -a` with machine=$(shell uname -m)
* src/pipes_context.hpp: Removed horrible_hack. Now we simply do
an incref on the string object. Basically, we are creating a
memory leak inside python :-(. I do not see a clean way around
this.
2009-08-13 <lsimleo@neuron.crs4.it>
* src/pipes.hpp: added Py_REFCNT macro to make it work with Python
2.5.
* test/test_hdfs_basic_class.py: added assertRaises tests for
IOError.
* src/Makefile (PIPES_AUX_FILES): removed hacked_wrapper (the
source file has been removed in an earlier version).
* test/test_factory.py (factory_tc.test_map_reduce_factory):
changed to clean up refs before performing test.
2009-08-10 <lsimleo@neuron.crs4.it>
* src/hdfs_common.cpp: created. Adds hdfs_exception mapping to
Python IOError.
2009-08-09 <zag@manzanillo>
* pydoop/: Clean up.
* src/: All test support stuff in pipes.{c,h}pp has been moved to
pipes_test_support.{c,h}pp
* test/test_factory.py (factory_tc.test_map_reduce_factory):
Preparing to test C++ destruction.
2009-07-08 <lsimleo@neuron.crs4.it>
* setup.py: updated to build hadoop_hdfs. Distutils sucks.
2009-07-07 <lsimleo@neuron.crs4.it>
* setup.py: added auto main generation for boost extensions (as in
src/Makefile). Updated to build pydoop_pipes.
2009-07-06 <lsimleo@neuron.crs4.it>
* src/libhdfs/Makefile: slight generalization.
2009-07-03 <lsimleo@neuron.crs4.it>
* src/{SerialUtils,HadoopPipes,StringUtils}.cpp: fixed includes.
2009-05-14 <lsimleo@neuron.crs4.it>
* pydoop.py (Factory.createMapper): removed debug lines.