Wednesday, February 17, 2016

Yahoo Closes Burbank Office

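The morning was still bright and sunny, yet by afternoon it had started to rain, which added a touch of gloom.

Looking back on nearly nine years, I am writing down these few words as a record.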




Thursday, November 05, 2015

Null Handling in Hadoop Pig Latin

For the chararray type, when you load a dataset, PigStorage converts empty fields to null. So in a freshly loaded relation you won't find any empty strings, only nulls.

However, a constant '' written in the Pig script itself is not treated as null.

So:
'' is not null returns true.
'' is null does not return true.

If A is a relation immediately after a load, A.$0 == '' will never be true.

If you compose a field manually with GENERATE, it keeps exactly the value you wrote.

B = FOREACH A GENERATE $0, $1, ''; -- keeps the value as an empty string
C = FOREACH A GENERATE $0, $2, (chararray) null; -- keeps the value as null
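
The load-time behavior described above can be modeled in a few lines of Python. This is only a toy model of what the post describes, not PigStorage's actual code; the function name load_line and the tab delimiter are assumptions made for the illustration.

```python
def load_line(line, delimiter="\t"):
    """Toy model of a load: split a line and turn empty fields into None (null)."""
    return [field if field != "" else None for field in line.split(delimiter)]


# A line whose middle field is empty.
row = load_line("a\t\tc")

# After the "load", no field is an empty string, so a test like $0 == '' never matches.
has_empty_string = any(field == "" for field in row)

# A hand-written constant (like GENERATE $0, $1, '') stays an empty string.
generated = row[:2] + [""]
```

Here row holds a None where the empty field was, while generated ends with a real empty string, which mirrors the difference between relations B and C.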

Sorting for NULLs

NULL is always treated as the smallest value. With ORDER BY ... DESC it comes last; with ASC it comes first.
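
As a rough illustration of that ordering rule (a Python sketch, not Pig itself; the helper name order_by is hypothetical), None can be forced to sort as the smallest value with a two-part sort key:

```python
def order_by(values, descending=False):
    """Sort strings with None treated as the smallest value."""
    # (False, "") for None sorts before (True, value) for every real value.
    key = lambda v: (v is not None, v if v is not None else "")
    return sorted(values, key=key, reverse=descending)


ascending = order_by(["b", None, "a"])
descending = order_by(["b", None, "a"], descending=True)
```

With this key, ascending puts None first and descending puts it last, matching the rule above.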

Friday, November 15, 2013

hadoop 2.2.0 installation resource

General steps:

http://milinda.pathirage.org/hadoop/yarn/2013/09/29/how-to-setup-multi-node-hadoop-20xyarn-cluster.html#references

It refers to:
http://www.javacodegeeks.com/2013/06/setting-up-apache-hadoop-multi-node-cluster.html
http://raseshmori.wordpress.com/2012/10/14/install-hadoop-nextgen-yarn-multi-node-cluster/

One correction when following those guides with Hadoop 2.2.0. This setting:

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
</property>

should be:

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
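
A small check like the following could catch the old value before the node manager is started. It is a sketch under assumptions: the naming rule (letters, digits and underscores only, not starting with a digit) reflects my understanding of the Hadoop 2.2 validation, and the function name invalid_aux_services is made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# Assumed rule: only letters, digits and underscores, not starting with a digit.
VALID_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def invalid_aux_services(xml_text):
    """Return the aux-service names in a yarn-site.xml string that break the rule."""
    root = ET.fromstring(xml_text)
    bad = []
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() != "yarn.nodemanager.aux-services":
            continue
        for service in prop.findtext("value", "").split(","):
            if not VALID_NAME.match(service.strip()):
                bad.append(service.strip())
    return bad


OLD = (
    "<configuration><property>"
    "<name>yarn.nodemanager.aux-services</name>"
    "<value>mapreduce.shuffle</value>"
    "</property></configuration>"
)
NEW = OLD.replace("mapreduce.shuffle", "mapreduce_shuffle")
```

Calling invalid_aux_services(OLD) flags the dotted name, while the underscore form in NEW passes.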

To build the 64-bit native library yourself, or simply download a prebuilt one, see:
http://shaurong.blogspot.com/2013/11/hadoop-220-centos-64-x64.html

It walks through building from the Hadoop source in detail. These are the only reliable, working steps I have found so far.
The Hadoop build has already moved to Maven, but the Hadoop documentation still shows the old Ant steps (which of course won't work):
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/NativeLibraries.html

Really sucks.

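Another error you may hit:
$ hadoop fs -ls
ls: `.': No such file or directory

With no path given, ls lists your HDFS home directory (/user/$USER), which does not exist yet on a fresh install.

How to solve it:
hadoop fs -mkdir -p /user/$USER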

RHEL6 mail attachment with uuencode won't work anymore

The old trick of sending an attachment with
uuencode /tmp/myfile myfile | mail -s "Attachment" user1@xyz.nowhere
no longer works after migrating to RHEL6. The uuencoded data is displayed as the message body rather than as an attachment.

If you have a script like the one above, you will certainly be affected:
http://www.linuxquestions.org/questions/linux-newbie-8/uuencode-issue-with-rhel-6-3-a-4175450188/

Red Hat seems to have some kind of workaround, but it is only open to registered users.
https://access.redhat.com/site/solutions/104833

The fix is pretty simple, but you'll still need to change scripts that have worked well so far:
mail -a /tmp/myfile -s "Attachment" user1@xyz.nowhere

The problem seems to come from the headers the new version adds, something like:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Old version headers just have:
MIME-Version: 1.0

There might be options to disable the new headers. Let me know if anybody finds them.
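
If a script can call Python instead of mail, another option is to build the MIME message explicitly, so the attachment no longer depends on what headers the mail client adds. This is a sketch of an alternative, not the fix described above; the sender address and the in-memory payload are placeholders for illustration.

```python
from email.message import EmailMessage


def build_message(sender, recipient, subject, body, filename, payload):
    """Build a multipart message with one binary attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    # Adding an attachment converts the message to multipart/mixed.
    msg.add_attachment(
        payload,
        maintype="application",
        subtype="octet-stream",
        filename=filename,
    )
    return msg


# In a real script the payload would come from open("/tmp/myfile", "rb").read().
msg = build_message(
    "me@xyz.nowhere",        # hypothetical sender
    "user1@xyz.nowhere",
    "Attachment",
    "See the attached file.",
    "myfile",
    b"hello",
)
```

The resulting message could then be handed to smtplib or piped to sendmail; that part is left out here because it depends on the local mail setup.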

Tuesday, November 12, 2013

Oracle Byte Length of CLOB type

All functions in DBMS_LOB for the CLOB type operate at the character level. This is inconvenient when you want something like substrb to limit the byte length of a string (especially since Oracle SQL only allows 4000 bytes for the varchar2 type).

Here are two pure SQL queries that may help a bit.

Query to get as many complete multi-byte characters as will fit in a SQL varchar2.
It uses a recursive query that halves its step, binary-search style, to find the maximum character length:

with b4000 (id, e, s, len)
as
(
  -- anchor row: first 666 characters; e = characters taken so far, s = current step
  select rowid as id, 666 e, 666 s, lengthb(dbms_lob.substr(text, 666, 1)) len from clob_src
  union all
  -- try the next s characters, starting at offset e + 1 so character e is not counted twice
  select b1.id,
         case when b1.len + lengthb(dbms_lob.substr(text, s, e + 1)) <= 4000 then e + s else e end,
         case when b1.len + lengthb(dbms_lob.substr(text, s, e + 1)) <= 4000 then s else trunc(s/2) end,
         case when b1.len + lengthb(dbms_lob.substr(text, s, e + 1)) <= 4000 then len + lengthb(dbms_lob.substr(text, s, e + 1)) else len end
    from clob_src s1, b4000 b1
   where s1.rowid = b1.id
     and case when b1.len + lengthb(dbms_lob.substr(text, s, e + 1)) <= 4000 then s else trunc(s/2) end >= 1
)
select * from b4000

* Oracle SQL only holds 4000 bytes in a varchar2, and the query conservatively budgets up to 6 bytes per UTF-8 character. That is where the magic number 4000/6 ~= 666 comes from: it is the number of characters pulled out of the CLOB each time. When the next chunk would exceed the 4000-byte limit, the step is cut in half, recursively, until the step drops below 1.
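
The step-halving idea may be easier to follow outside SQL. The sketch below mirrors it in Python on an ordinary string; it is an illustration of the technique, not a translation of the query, and the function name max_chars_within is made up.

```python
def max_chars_within(text, limit=4000, step=666):
    """Return (characters, bytes) of the longest prefix whose UTF-8 size fits in limit."""
    taken = 0  # characters accepted so far
    used = 0   # UTF-8 bytes accepted so far
    while step >= 1:
        chunk = text[taken:taken + step]
        size = len(chunk.encode("utf-8"))
        if chunk and used + size <= limit:
            # The whole chunk fits: accept it and try another chunk of the same size.
            taken += len(chunk)
            used += size
        else:
            # The chunk is too big (or the text ran out): halve the step.
            step //= 2
    return taken, used


two_byte = max_chars_within("\u00e9" * 3000)         # 2 bytes per character
mixed = max_chars_within("a" + "\u6f22" * 2000)      # 1 byte, then 3 bytes per character
short = max_chars_within("abc")
```

Because the loop ends with single-character steps, the prefix it returns is the longest one that fits, and it never splits a multi-byte character.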

Query to get the byte length of your CLOB column:

with blen(id, i, len, lenb)
as (
  select rowid as id, 0, length(dbms_lob.substr(text, 666, 1)), lengthb(dbms_lob.substr(text, 666, 1)) from clob_tgt
  union all
  select b.id, i + 1,
         len + length(dbms_lob.substr(text, 666, (i+1)*666 + 1)),
         lenb + lengthb(dbms_lob.substr(text, 666, (i+1)*666 + 1))
    from clob_tgt s, blen b
   where s.rowid = b.id
     and len + length(dbms_lob.substr(text, 666, (i+1)*666 + 1)) <= dbms_lob.getlength(text)
)
select * from blen

* This one should be much simpler to understand: cut the CLOB into segments of 666 characters and sum up the byte lengths of the segments. The row with the highest i for each id carries the totals for the whole CLOB.
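
The same segment-and-sum idea, sketched in Python on a plain string (the function name byte_length_by_segments is hypothetical):

```python
def byte_length_by_segments(text, segment=666):
    """Sum the UTF-8 byte lengths of fixed-size character segments."""
    total = 0
    for start in range(0, len(text), segment):
        total += len(text[start:start + segment].encode("utf-8"))
    return total


sample = "a\u00e9\u6f22" * 1000  # 1-, 2- and 3-byte characters
total_bytes = byte_length_by_segments(sample)
```

Since segments are cut on character boundaries, the sum equals the byte length of the whole string.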





Monday, April 08, 2013

Easy way to generate Fibonacci number with oracle sql

You could use the MODEL clause, but a recursive WITH query is an easier way:


With FB(k1, k2)
AS
(
select 0 k1, 1 k2 from dual
union all
select k2 as k1, k1 + k2 as k2 from fb where greatest(k1, k2) < 100
)
select k1 from fb


K1
0
1
1
2
3
5
8
13
21
34
55
89
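
For comparison, here is the same recurrence and stop condition written as a Python loop. It is only a sketch to show what each recursive step does; the name fib_rows is made up.

```python
def fib_rows(limit=100):
    """Mimic the recursive query: emit k1, then step (k1, k2) -> (k2, k1 + k2)."""
    k1, k2 = 0, 1
    rows = [k1]  # the anchor row
    while max(k1, k2) < limit:  # same test as greatest(k1, k2) < 100
        k1, k2 = k2, k1 + k2
        rows.append(k1)
    return rows


rows = fib_rows()
```

The list it builds matches the K1 column above.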

The strange thing is that Oracle needs both columns in the where clause. If you only use one (like where k1 < 100), you will get an error.

Monday, January 23, 2012

simplest cgo program

I have been struggling to put together a simple cgo sample to get familiar with how to link C code into a Go library.

I finally got it working.

cgo_test.go
-------------------------------
package cgo_test

/*
#include <stdio.h>

void out() {
	printf("Hello world from c\n");
}
*/
import "C"

// Out is capitalized so that other packages can call it.
func Out() {
	C.out()
}

--------------------------------------------------------
main.go
--------------------------------------------------------
package main

import "cgo_test"

func main() {
	cgo_test.Out()
}

--------------------------------------------------------
Makefile
--------------------------------------------------------
include ../go/src/Make.inc

TARG=cgo_test
CGOFILES=cgo_test.go

CLEANFILES+=main
include ../go/src/Make.pkg

main: install main.go
	$(GC) main.go
	$(LD) -o $@ main.$O

The most annoying error message I got was something like:
gomake main
main.go:6: cannot refer to unexported name cgo_test.out
main.go:6: undefined: cgo_test.out

Go only exports names that start with an uppercase letter, so the wrapper has to be called Out rather than out.
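
The export rule itself is simple enough to state as a one-line check. The sketch below is in Python purely for illustration (the helper is_exported is made up); it follows the Go specification's wording that an identifier is exported when its first character is a Unicode uppercase letter.

```python
import unicodedata


def is_exported(name):
    """Return True if a Go identifier with this name would be exported."""
    # Unicode category "Lu" is "Letter, uppercase".
    return bool(name) and unicodedata.category(name[0]) == "Lu"


checks = {name: is_exported(name) for name in ("Out", "out", "_Out")}
```

Only Out passes, which is why calling cgo_test.out from another package fails.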

Saturday, January 21, 2012

VirtualBox Scientific Linux Issues and Resolution

* Change the default screen resolution (1024x768)
Resolution:
  1. Install the VirtualBox Guest Additions
  2. Define the new resolution with xrandr and cvt (cvt 1440 900 prints the modeline values)
  3. Add the new mode to /etc/gdm/Init/Default to make it permanent
xrandr --newmode "1440x900" 106.50 1440 1528 1672 1904 900 903 909 934 -hsync +vsync
xrandr --addmode VBOX0 1440x900
xrandr --output VBOX0 --mode 1440x900


* The new IME, ibus
On RH5.*, scim was the default IME. Now ibus takes its place. The problem is that ibus is not set up correctly after installation. I got "No input windows" in all applications.

Resolution:
  1. Enable the ibus daemon from "System" -> "Preferences" -> "Startup Applications" by adding an entry with the command "/usr/bin/ibus-daemon -d"
  2. Add these 3 lines to your bash profile (/etc/profile or .bashrc or .bash_profile)
export GTK_IM_MODULE=ibus
export XMODIFIERS=@im=ibus
export QT_IM_MODULE=ibus