[Notepad-plus-plus] [notepad-plus - Help] Other encoding formats?

Discussion:

SourceForge.net

2008-08-26 01:26:47 UTC

Read and respond to this message at:
https://sourceforge.net/forum/message.php?msg_id=5195342
By: osakawebbie

I love Notepad++, except for one thing. I work with code for Japan, so some
of the text strings in the code contain Japanese characters. These days more
and more of it is in UTF-8, but sometimes I have to work with something that
is Shift-JIS (an older Japanese character encoding set, the native encoding
on Japanese Windows). That is not a choice in the Format menu, and so the characters
are, of course, garbled. Is there anything I can do?

______________________________________________________________________
You are receiving this email because you elected to monitor this forum.
To stop monitoring this forum, login to SourceForge.net and visit:
https://sourceforge.net/forum/unmonitor.php?forum_id=331754

Idris Samawi Hamid ادريس سماوي حامد

2008-08-27 02:13:28 UTC

On Mon, 25 Aug 2008 19:26:47 -0600, SourceForge.net

Post by SourceForge.net
I love Notepad++, except for one thing. I work with code for Japan, so some
of the text strings in the code contain Japanese characters. These days more
and more of it is in UTF-8, but sometimes I have to work with something that
is Shift-JIS (an older Japanese character encoding set, the native encoding
on Japanese Windows). That is not a choice in the Format menu, and so the characters
are, of course, garbled. Is there anything I can do?

The ConvertExt plugin has support for external encodings. I'm not sure how
it works exactly but it may be of help. See the Options menu of the plugin.

Best wishes
Idris

SourceForge.net

2008-08-27 22:55:00 UTC

Read and respond to this message at:
https://sourceforge.net/forum/message.php?msg_id=5201479
By: ishamid2

Post by SourceForge.net
I love Notepad++, except for one thing. I work with code for Japan, so some
of the text strings in the code contain Japanese characters. These days more
and more of it is in UTF-8, but sometimes I have to work with something that
is Shift-JIS (an older Japanese character encoding set, the native encoding
on Japanese Windows). That is not a choice in the Format menu, and so the characters
are, of course, garbled. Is there anything I can do?

The ConvertExt plugin has support for external encodings. I'm not sure how
it works exactly but it may be of help. See the Options menu of the plugin.

Best wishes
Idris


SourceForge.net

2008-08-28 02:04:55 UTC

Read and respond to this message at:
https://sourceforge.net/forum/message.php?msg_id=5201770
By: osakawebbie

Thank you - that was a nice idea. Unfortunately, the ReadMe file revealed that
ConvertExt currently only supports single-byte character sets. I wrote to the
author requesting that he consider adding Shift-JIS, but I might have to wait
a long time...


SourceForge.net

2008-08-28 10:29:51 UTC

Read and respond to this message at:
https://sourceforge.net/forum/message.php?msg_id=5202658
By: harrybharry

Notepad++ uses the current system's ANSI codepage to decide which codepage to
use. If you have a Shift-JIS codepage set, Notepad++ should instruct Scintilla
to use that codepage to properly render MBCS/DBCS characters.

If your system's codepage is set to something else, then it will fail and parse
the text incorrectly.

ConvertExt might be a solution if it can handle DBCS, but you would still have
to convert every time you open a Shift-JIS file.
I don't know if it's nice to have to define Shift-JIS manually for ConvertExt; methinks
Windows should be perfectly able to do it for you :) If you badly need something
to convert the current text in Notepad++ to the current system codepage or to UTF-8 or
another Unicode encoding, I can maybe put something together quickly.
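
A minimal Python sketch of the effect described above, not Notepad++ or
Scintilla code. It assumes cp932 (the Windows variant of Shift-JIS) as the
file encoding and cp1252 as the "something else" system codepage; the sample
string is made up for illustration.

```python
# Hypothetical sample text; any Japanese string would do.
text = "日本語"

# The bytes as they would be stored in a Shift-JIS (cp932) file.
raw = text.encode("cp932")

# Decoded with the matching codepage, the text round-trips.
as_japanese = raw.decode("cp932")

# Decoded with a Western single-byte codepage (an assumed example),
# every byte is turned into its own unrelated character.
as_western = raw.decode("cp1252", errors="replace")

print(" ".join(f"{b:02x}" for b in raw))
print(as_japanese)
print(len(as_japanese), "characters versus", len(as_western))
```

The same six bytes give three Japanese characters under one codepage and six
Western characters under the other, which is why the codepage choice matters
before any rendering happens.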


SourceForge.net

2008-08-28 13:03:00 UTC

Read and respond to this message at:
https://sourceforge.net/forum/message.php?msg_id=5202990
By: osakawebbie

I don't think I have any ability to "set" my system's codepage - my understanding
is that Japanese Windows always operates in Shift-JIS except when presented
with something it has to render using something else (for example, regular Notepad
seems to correctly handle whatever I throw at it). But when I open an HTML document
that includes Shift-JIS text in N++, all the Japanese is garbled, so N++ apparently
doesn't render as seamlessly as you are thinking it should. You seem to feel
that I can somehow just manually "define" Shift-JIS for N++, but if so, I have
no idea how to do that - it is not in the list of choices on the Format menu.

ConvertExt does not handle any MBCS at all. I wrote to the author about whether
he might add Shift-JIS (or DBCS in general) to it, and he replied just now.
It turns out that he lost the source code for ConvertExt, so he started over
from scratch - the new one is called Encodings. But he is tired of working
on it and has given up on it at a "pre-Alpha" stage. The story is here:
http://sourceforge.net/forum/forum.php?thread_id=2115650&forum_id=672146
He is hoping that someone else will get interested in it and finish it, but
so far there has been no interest. I don't even have a C++ compiler (my C experience
was over 20 years ago), and I have negative spare time (I'm losing ground on
my obligations as it is), so I'm not the right person.

As for what conversion I would need, I need to convert from Shift-JIS to UTF-8,
but since Shift-JIS doesn't even display correctly in N++, I'm just going to
use other applications for that kind of code. It seems that the Windows clipboard
quietly does the necessary conversion when I copy/paste, so I can take advantage
of that. For example, I can view a Shift-JIS webpage in Firefox, highlight
and copy something, and paste it into an online HTML editor on a UTF-8 webpage
(which is at a basic level just a textarea in an HTML form), and it works fine.
So I guess I'll just do that, and avoid using N++ for editing things that have
Shift-JIS text (from old websites or text documents that were created on my
local computer in Notepad or such, because they are saved in Shift-JIS also).
If someone (like yourself?) decides to work on the Encodings plugin, that would
be great, but I'm not going to hold my breath.
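
For the Shift-JIS to UTF-8 conversion mentioned above, a small script outside
the editor may be enough. This is only a sketch: the function name and file
names are hypothetical, and it assumes the input really is cp932 (Windows
Shift-JIS).

```python
from pathlib import Path


def convert_sjis_to_utf8(src: str, dst: str) -> None:
    """Re-encode a Shift-JIS text file as UTF-8 without a BOM."""
    data = Path(src).read_bytes()
    # Strict decoding raises UnicodeDecodeError on bytes that are not
    # valid cp932, instead of silently writing garbage.
    text = data.decode("cp932")
    # Working on bytes keeps the original line endings untouched.
    Path(dst).write_bytes(text.encode("utf-8"))
```

Usage would be along the lines of
convert_sjis_to_utf8("old_page.html", "old_page_utf8.html"), with both names
standing in for real files.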


SourceForge.net

2008-08-28 15:12:31 UTC

Read and respond to this message at:
https://sourceforge.net/forum/message.php?msg_id=5203320
By: harrybharry

At least Windows XP should provide an option to change the ANSI codepage (if you
run 9x, well, it's not even worth trying :)). Since NT-based Windows is internally
Unicode, a switch shouldn't cause too many problems, except of course for ANSI
applications. It's hidden well away, and I think it requires admin privileges and
a reboot, so I'll leave that option out :).

Notepad++ not properly displaying Shift-JIS sounds like a serious bug to me;
AFAIK Don tried to get such codepages to work. Would you have an example file
somewhere that I could try it out on, if I can manage to do so?

Maybe I can pick Encodings up. Sometimes I feel the current options are a bit
too limited, though since the addition of the Convert To options that has been
remedied a lot. At least it sounds worthwhile. I'll see.

Post by SourceForge.net
for example, regular Notepad seem to correctly handle whatever I throw at it

I assume that does not include Russian ;)


SourceForge.net

2008-08-29 00:12:16 UTC

Read and respond to this message at:
https://sourceforge.net/forum/message.php?msg_id=5204488
By: osakawebbie

Post by SourceForge.net
At least Windows XP should provide an option to change the ANSI codepage (if
you run 9x, well, it's not even worth trying :))

In case you are wondering, yes, I'm running XP. But I would have no idea even
where to look for such an option. Plus, the only thing I'd be interested in
changing it to is UTF-8, but it doesn't seem that MS defines that as "ANSI".
And it would appear that what N++ calls ANSI is single-byte only. (I don't
really understand the parameters of the term ANSI these days - it's clearly
broader than just things approved as ANSI standards.
http://codesnipers.com/?q=node/34 highlights the confusion.)

Post by SourceForge.net
Would you have an example file somewhere to try it out on if I can manage
to do so?

Sure. Just go to http://budounoki.org, View Source, and copy/paste into N++.
(Or actually, you don't even need to view source - just copy some of the Japanese
straight off the page.)

But I just discovered something! :) When I just start a new document and paste
(or open a document on my hard disk that is Shift-JIS), N++ initially tries to
encode it in "ANSI", and it's garbled. When I tried this previously, I first
tried the other "Encode in..." options, which just garbled it in different
ways, and then I tried the "Convert to..." options, and none worked. This time,
I pasted it in and then immediately tried "Convert to UTF-8". That worked!
(Apparently trying other encodings first causes the underlying codes to change
in some way.) It also works on documents from my hard disk. Or, if I'm pasting,
I can start a new document and change the Format to either "Convert to UTF-8"
or "Encode in UTF-8" before pasting, and it will work. I have no idea whether
I should include the BOM or not, but that's a different question (for PHP and
HTML web pages, what do you think?).

Post by SourceForge.net
for example, regular Notepad seem to correctly handle whatever I throw at it

I assume that does not include Russian ;)

True - I don't throw anything but Japanese and English at it, and since my OS
is Japanese, it's not surprising that it works.


SourceForge.net

2008-08-29 16:23:08 UTC

Read and respond to this message at:
https://sourceforge.net/forum/message.php?msg_id=5206382
By: harrybharry

Yeah, by ANSI I mean anything that isn't UCS-2/UTF-16 or whatever other Unicode
encoding uses multiple bytes for each character. I know it's a common cause of
confusion, but I have yet to find a term that fits them all :)

AFAIK Windows is capable of internally converting clipboard data to Unicode,
and Scintilla makes use of that when encoding in UTF-8. I think that's why setting
the encoding to UTF-8 before pasting works so well (that is, if Windows is set to
the codepage you're copying from).
Copy-pasting on my computer doesn't work so well (I get question marks), but just
saving the file and then opening it clearly shows DBCS characters, which I can't
convert to UTF-8 properly.
By "garbled", what do you mean exactly (the wrong glyph for a single character, or
a character being split into two different unrelated characters)?

As for the BOM, I believe it's valid to add one in HTML, maybe in PHP too, and I
think it'll help many clients parse the data properly.

I tried looking at the Encodings plugin, but it's not exactly my coding style
and thus hard to read. I'm still going to write one that just does simple
conversions from installed codepages.


SourceForge.net

2008-09-13 02:45:21 UTC

Read and respond to this message at:
https://sourceforge.net/forum/message.php?msg_id=5265177
By: osakawebbie

HarryBHarry, sorry - I just now noticed that you had a question in your post
- I missed that before.

Post by SourceForge.net
With garbled, what do you mean exactly (wrong glyph for a single character
or just a character being split into two different unrelated characters)?

Each multibyte character being split into two singles and rendered accordingly
- upper 128 ASCII characters like European vowels, upside-down question marks,
etc. Then if I try to change to UTF-8 after the fact, they change to black
squares with the hex code for each byte in them. Cute, but not very useful.
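
The "black squares with hex codes" are consistent with Shift-JIS bytes simply
not being valid UTF-8. A short Python sketch of that, assuming cp932 input and
a made-up sample string; how Scintilla actually draws invalid bytes is not
shown here.

```python
# Hypothetical sample, stored as Shift-JIS (cp932) bytes.
raw = "日本語".encode("cp932")

# A strict UTF-8 decoder rejects these bytes outright.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# A lenient decoder can only show each undecodable byte as its hex value,
# roughly what an editor does when it draws a hex box per byte.
fallback = raw.decode("utf-8", errors="backslashreplace")

print(valid_utf8)
print(fallback)
```

Relabelling the buffer as UTF-8 does not convert anything; only an explicit
decode from the original codepage followed by a UTF-8 encode does.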

About half of what I use N++ for is related to Joomla, and I recently noticed
that they recommended that any edited files be in UTF-8 without BOM, so that
answers my BOM question. (The other half is my own applications, and for them
I don't care.)
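
For reference, the difference between UTF-8 with and without a BOM is three
bytes at the very start of the file. A small Python sketch, using a made-up
PHP one-liner as the file content:

```python
import codecs

# Hypothetical file content for illustration only.
text = "<?php echo 'こんにちは'; ?>"

without_bom = text.encode("utf-8")
with_bom = text.encode("utf-8-sig")  # Python's codec name for UTF-8 plus BOM

# The BOM is the byte sequence EF BB BF placed before everything else.
has_signature = with_bom[:3] == codecs.BOM_UTF8
same_payload = with_bom[3:] == without_bom

print(has_signature, same_payload)
print(len(with_bom) - len(without_bom), "extra bytes")
```

Because those bytes sit in front of the opening <?php tag, PHP treats them as
ordinary output, which is one plausible reason for a "without BOM"
recommendation like the Joomla one.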

Since I'm migrating any old Shift-JIS code to UTF-8 anyway, I'm okay if I remember
to change the new open file to UTF-8 before I paste or type. I just wish I
could make UTF-8 the default instead of ANSI - everything I do is UTF-8, and
as I said, Joomla wants me to save my files in UTF-8 sans BOM even if they don't
contain any multibyte characters. Is there the possibility of changing the
default encoding in N++ for new files and files that when opened are not detected
as being anything in particular? A registry entry, perhaps?


SourceForge.net

2008-09-13 09:54:33 UTC

Read and respond to this message at:
https://sourceforge.net/forum/message.php?msg_id=5266486
By: harrybharry

Notepad++ has no option to choose the default between ANSI and UTF-8 without BOM;
you have to manually select which one you want. I find it peculiar that it doesn't
render the text correctly but does know how to handle it when you convert it.
If that solves the problem, though, I guess there's not much more to do, since a
plugin would have to do the exact same thing.
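
As an illustration of what a "default for files not detected as anything in
particular" could mean, here is a hedged Python sketch of a fallback chain:
BOM first, then strict UTF-8, then a legacy codepage. The function name and
the cp932 default are assumptions, and this is not how Notepad++ detects
encodings.

```python
import codecs


def guess_encoding(data: bytes, legacy: str = "cp932") -> str:
    """Return a codec name for data: BOM, then strict UTF-8, then legacy."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    try:
        # Pure ASCII also passes this check, so such files default to UTF-8.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Heuristic only: the bytes are not UTF-8, so assume the legacy codepage.
        return legacy
```

This is a heuristic: a short legacy-encoded file can occasionally happen to be
valid UTF-8, so a manual override would still be needed.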
