Towards handling Import[xxx, "ZIP", member-name] #1846

Open: rocky wants to merge 18 commits into master from handle-import-zip

rocky (Member) commented Jun 27, 2026

Plumbing hooked up for Import zip with members.

Now we just gotta get ImportString[xxx, "JSON"] working on its own.

Also, other small fixes and improvements.

rocky added 9 commits June 26, 2026 14:42
Also go over a couple of SystemFiles/Formats

Better align "nffil" message with WMA
Align better with WMA, go over Formats/Image code

Small corrections.

In pytest stdout, if got == expected we don't need to print both; one is sufficient, along
with the fact that the test passed.
This simplifies deleting (temporary) files and picking out the file extension
in implementing Import as well as in other builtins.
This is too complicated because it handles too many disparate forms.
pylint thinks this function is too complex too.
Instead, divide it up, and improve it.
Now we just gotta get ImportString[xxx, "JSON"] working on its own.

Also, other small fixes and improvements.
rocky force-pushed the handle-import-zip branch from ef07a43 to 968613e on June 27, 2026 00:39
"""%(name)s[text_String, element_]"""
# FIXME?: right now we aren't using element. Things might be
# more efficient if we used element?
return self.eval(text, evaluation)

Member Author

@mmatera Is there a better way to combine this with the def eval() above?

Even if that is the case, it might be useful to have this broken out as a stub for when this is revised to be able to handle the element passed.

Contributor

This is not using the parameter element. In a proper implementation, this function should be more general than self.eval. Also, parse_html should have an extra attribute to filter a given element.

Member Author

> This is not using the parameter element.

That is exactly what FIXME says.

> In a proper implementation, this function should be more general than self.eval. Also, parse_html should have an extra attribute to filter a given element.

Yep. Revising HTML and XML is left for later. I will be happy when we are able to "Import" and extract a JSON file from a ZIP import, which is needed to install paclets from the public paclet server.

This is the main reason why any work on this is currently being done.
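For readers following the thread, the goal described above (pulling a single JSON member out of a ZIP archive) can be sketched with the Python standard library alone. This is only an illustration under stated assumptions, not the code in this PR: the function name `read_json_member`, the helper `make_demo_zip`, and the member name `manifest.json` are all hypothetical.

```python
import io
import json
import zipfile


def read_json_member(zip_bytes: bytes, member_name: str):
    """Parse one JSON member of a ZIP archive held in memory (hypothetical helper)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Only the requested member is decompressed; other members are not read.
        with archive.open(member_name) as member:
            return json.loads(member.read().decode("utf-8"))


def make_demo_zip() -> bytes:
    """Build a small in-memory archive so the sketch is self-contained."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # "manifest.json" is an invented member name, used only for this demo.
        archive.writestr("manifest.json", json.dumps({"Name": "Demo", "Version": "1.0"}))
        archive.writestr("README.txt", "this member is not JSON")
    return buffer.getvalue()


info = read_json_member(make_demo_zip(), "manifest.json")
print(info["Name"], info["Version"])
```

In Import terms, the member name plays the role of the third argument in Import[xxx, "ZIP", member-name], and the JSON decoding step corresponds to what ImportString[xxx, "JSON"] would need to do on its own.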

Contributor

OK, but the output with the second element produces something different.

In[8]:= Import["http://www2.fisica.unlp.edu.ar/index.html",{"Elements"}]        

Out[8]= {Data, FullData, Hyperlinks, ImageLinks, Images, Plaintext, Source, 
 
>    Title, XMLObject}

In[9]:= Import["http://www2.fisica.unlp.edu.ar/index.html",{"Title"}]           

Out[9]= Dto. de Fisica | Facultad de Cs. Exactas | UNLP

In[10]:= Import["http://www2.fisica.unlp.edu.ar/index.html",{"FullData"}]       

Out[10]= {{{}, {}, {}}, {{}, {}, {}, {}}, 
 
>    {{}, {{ }, { CC 67, 1900 La Plata, Argentina - Tel: +54-221-4246062 Fax:\
 
>        +54-221-4252006 - secre@fisica.unlp.edu.ar }}}, {{}}, 
 
>    {{}, {}, {}, {}, {}, {}, {}, {}, {}, {}}, {}}


In[11]:= Import["http://www2.fisica.unlp.edu.ar/index.html",{"Plaintext"}]      

Out[11]= Departamento de Física
           Facultad de Ciencias Exactas ,  Universidad Nacional de La Plata  
            
             
              CC 67, 1900 La Plata, Argentina - Tel: +54-221-4246062
          Fax: +54-221-4252006 - secre@fisica.unlp.edu.ar    
           Ciclo anual de charlas para alumnos  
           Ingreso a Webmail
           AFA Filial La Plata
           Biblioteca y Hemeroteca
          Museo de Física
           Instituto de Física La Plata

In[12]:= Import["http://www2.fisica.unlp.edu.ar/index.html",{"XMLObject"}]      

Out[12]= XMLObject[Document][{XMLObject[Declaration][Version -> 1.0, 
 
>      Standalone -> yes]}, XMLElement[html, 
 
>     {{http://www.w3.org/2000/xmlns/, xmlns} -> 
 
>       http://www.w3.org/1999/xhtml}, 
 
>     {XMLElement[head, {}, {XMLElement[link, 
 
>         {rel -> stylesheet, href -> dropdown.css, type -> text/css}, {}], 
 
>        XMLElement[script, {language -> JavaScript1.2, src -> menu_data.js}, 
 
>         {}], XMLElement[title, {}, 
 
>         {Dto. de Fisica | Facultad de Cs. Exactas | UNLP}]}], 
 
>      XMLElement[body, {link -> #428266, vlink -> #428266, 
 
>        bgcolor -> #ffffff}, {, 

 
>        XMLElement[font, {face -> verdana,arial,helvetica, size -> -1}, 
 
>         {, XMLElement[font, {size -> +2}, {Departamento de Física}], 


 
>          XMLElement[br, {clear -> none}, {}], , 

 
>          XMLElement[a, {shape -> rect, 
 
>            href -> http://www.exactas.unlp.edu.ar}, 
 
>           {Facultad de Ciencias Exactas}], ,, 

 
>          XMLElement[a, {shape -> rect, href -> http://www.unlp.edu.ar}, 
 
>           {Universidad Nacional de La Plata}], }], 

 
>        XMLElement[p, {}, {, XMLElement[script, 


 
>           {language -> JavaScript1.2, src -> menu_script.js}, {}], , 


 
>          XMLElement[table, {border -> 0}, 
 
>           {XMLElement[tr, {}, 
 
>             {XMLElement[td, {colspan -> 1, rowspan -> 1}, 
 
>               { , XMLElement[img, {src -> fisica.jpg}, {}],  }]}], 
                                                               
 
>            XMLElement[tr, {}, 
 
>             {XMLElement[td, {colspan -> 1, rowspan -> 1}, 
 
>               {, XMLElement[center, {}, 

 
>                 {, XMLElement[font, {size -> -3}, 

 
>                   {                                                      , 
                     CC 67, 1900 La Plata, Argentina - Tel: +54-221-4246062
 
>                    XMLElement[br, {clear -> none}, {}], 
 
>                                           , 
                     Fax: +54-221-4252006 - 
 
>                    XMLElement[a, 
 
>                     {shape -> rect, 
 
>                      href -> mailto:secre@fisica.unlp.edu.ar}, 
 
>                     {secre@fisica.unlp.edu.ar}], }]}]}]}]}], }], 


 
>        XMLElement[p, {}, {, XMLElement[a, 

 
>           {shape -> rect, href -> semin/alumnos}, 
 
>           {Ciclo anual de charlas para alumnos}], }], 


 
>        XMLElement[p, {}, {, XMLElement[a, 

 
>           {shape -> rect, href -> http://mail.fisica.unlp.edu.ar}, 
 
>           {Ingreso a Webmail}],  , XMLElement[br, {clear -> none}, {}], , 

 
>          XMLElement[a, {shape -> rect, 
 
>            href -> http://www2.fisica.unlp.edu.ar/filial/}, 
 
>           {AFA Filial La Plata}],  , XMLElement[br, {clear -> none}, {}], , 

 
>          XMLElement[a, {shape -> rect, 
 
>            href -> http://biblio.fisica.unlp.edu.ar/}, 
 
>           {Biblioteca y Hemeroteca}],  , 
 
>          XMLElement[br, {clear -> none}, {}],  , 

 
>          XMLElement[a, {shape -> rect, 
 
>            href -> http://museofisica.exactas.unlp.edu.ar/}, 
 
>           {Museo de Física}], XMLElement[br, {clear -> none}, {}], , 

 
>          XMLElement[a, {shape -> rect, 
 
>            href -> http://iflp.fisica.unlp.edu.ar/}, 
 
>           {Instituto de Física La Plata}], 
 
>          XMLElement[br, {clear -> none}, {}], }], XMLElement[p, {}, {}]}]}], 
 
>    {}]

In the Mathics3 master branch, this seems to work:

In[1]:= Import["http://www2.fisica.unlp.edu.ar/index.html",{"Title"}]
Out[1]= "Dto. de Fisica | Facultad de Cs. Exactas | UNLP"

In[2]:= Import["http://www2.fisica.unlp.edu.ar/index.html",{"Plaintext"}]
Out[2]= "Dto. de Fisica | Facultad de Cs. Exactas | UNLP
        Departamento de Física
        Facultad de Ciencias Exactas
        ,
        Universidad Nacional de La Plata
        CC 67, 1900 La Plata, Argentina - Tel: +54-221-4246062
        Fax: +54-221-4252006 -
        secre@fisica.unlp.edu.ar
        Ciclo anual de charlas para alumnos
        Ingreso a Webmail
        AFA Filial La Plata
        Biblioteca y Hemeroteca
        Museo de Física
        Instituto de Física La Plata"

So I do not see what this new method provides.

rocky (Member Author), Jun 27, 2026

> OK, but the output with the second element produces something different.

I don't see that. Below, I see lots of output, and I believe it is the same as what you'd get from Wolframscript. The formatting of the text is different, but that's to be expected until we match up StandardForm output better.

If there is a specific difference, exactly what's different?

(Please try to give a small example of a difference.)

> So I do not see what this new method provides.

It is a placeholder function (marked with a FIXME). It is there to indicate that it should be filled out to remove the gross inefficiency that can arise from reading in lots of stuff and then throwing away or filtering most of it.

Instead, that code should be filled out to pass information to other eval routines that handle element retrieval in a better way.
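To make the efficiency argument concrete, here is a minimal sketch of element-aware extraction. It is not Mathics3's parse_html and not the eval_with_element in this PR: it swaps in the standard library's html.parser, and the names `_TitleParser`, `extract_title`, `ELEMENT_EXTRACTORS`, and `import_html_element` are hypothetical. The point is only that knowing the requested element up front lets the reader stop early instead of building everything and filtering afterwards.

```python
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collect the text inside <title> and note when it has been fully read."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.title_parts.append(data)


def extract_title(text: str, chunk_size: int = 1024) -> str:
    """Feed the document in chunks and stop once the title is complete."""
    parser = _TitleParser()
    for start in range(0, len(text), chunk_size):
        parser.feed(text[start:start + chunk_size])
        if parser.done:
            break  # the rest of the document is never parsed
    return "".join(parser.title_parts).strip()


# Hypothetical dispatch table: element name -> specialised extractor.
ELEMENT_EXTRACTORS = {"Title": extract_title}


def import_html_element(text: str, element: str):
    """Route a request for one element to an extractor that handles only it."""
    extractor = ELEMENT_EXTRACTORS.get(element)
    if extractor is None:
        raise NotImplementedError(f"no specialised extractor for {element!r}")
    return extractor(text)


page = "<html><head><title>Dto. de Fisica | UNLP</title></head><body><p>...</p></body></html>"
print(import_html_element(page, "Title"))
```

A stub that currently delegates to the full parse, as the placeholder in this PR does, leaves room to add entries like these one element at a time.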


return parse_html(source, text, evaluation)

def eval_with_element(self, text, element, evaluation: Evaluation):

rocky changed the title from "Handle Import[xxx, "ZIP", member-name]" to "Towards handling Import[xxx, "ZIP", member-name]" on Jun 27, 2026
rocky force-pushed the handle-import-zip branch 3 times, most recently from aaca4bb to 196e170 on June 27, 2026 21:55
(I wasn't aware of this work until late on.)
rocky force-pushed the handle-import-zip branch from 196e170 to 5dbe72d on June 27, 2026 22:17
By renaming the file, we make it possible to syntax check via running
"python jsonformat.py". "python json.py" gives a module ambiguity warning because
"import json" is used in the file.
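One way to see the ambiguity that commit message refers to is to ask Python's import machinery which file `import json` would resolve to when a script's own directory comes first on the search path. The sketch below is an illustration only; the helper name `local_file_shadows_json` is hypothetical, and it inspects module resolution rather than reproducing the exact warning mentioned above.

```python
import importlib.machinery
import os
import sys
import tempfile


def local_file_shadows_json(script_name: str) -> bool:
    """Return True if a file called script_name would be picked up by
    `import json` when its directory comes first on the search path."""
    with tempfile.TemporaryDirectory() as script_dir:
        with open(os.path.join(script_dir, script_name), "w") as handle:
            handle.write("import json\n")
        # By default, "python some_script.py" puts the script's directory
        # ahead of the standard library on sys.path; mimic that ordering.
        search_path = [script_dir] + sys.path
        spec = importlib.machinery.PathFinder.find_spec("json", search_path)
        origin = os.path.realpath(spec.origin)
        return origin.startswith(os.path.realpath(script_dir) + os.sep)


print(local_file_shadows_json("json.py"))
print(local_file_shadows_json("jsonformat.py"))
```

With the file named json.py, the lookup should land on the local file instead of the standard library package; with jsonformat.py it should not, which is the effect the rename is after.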
rocky force-pushed the handle-import-zip branch from 693c37c to d67cf1d on June 27, 2026 22:33
rocky force-pushed the handle-import-zip branch from 4634acd to 1a4a319 on June 27, 2026 22:43

2 participants